Distinguishing different modes of growth using single-cell data

Elife. 2021 Dec 2:10:e72565. doi: 10.7554/eLife.72565.

Abstract

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and hypothesize underlying biological mechanisms based on it. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where what appears to be ordinary use of these statistical methods leads to incorrect conclusions such as growth being non-exponential as opposed to exponential and vice versa. We propose that the data analysis and its interpretation should be done in the context of a generative model, if possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated via the use of the model, leading to a consistent method for inferring biological mechanisms from data. On applying the validated methods of data analysis to infer cellular growth on our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.

Keywords: E. coli; data analysis; infectious disease; linear regression; mathematical model; microbial growth; microbiology; physics of living systems.

Plain language summary

All cells – from bacteria to humans – tightly control their size as they grow and divide. Cells can also change the speed at which they grow, and the pattern of how fast a cell grows with time is called ‘mode of growth’. Mode of growth can be ‘linear’, when cells increase their size at a constant rate, or ‘exponential’, when cells increase their size at a rate proportional to their current size. A cell’s mode of growth influences its inner workings, so identifying how a cell grows can reveal information about how a cell will behave. Scientists can measure the size of cells as they age and identify their mode of growth using single cell imaging techniques. Unfortunately, the statistical methods available to analyze the large amounts of data generated in these experiments can lead to incorrect conclusions. Specifically, Kar et al. found that scientists had been using specific types of plots to analyze growth data that were prone to these errors, and may lead to misinterpreting exponential growth as linear and vice versa. This discrepancy can be resolved by ensuring that the plots used to determine the mode of growth are adequate for this analysis. But how can the adequacy of a plot be tested? One way to do this is to generate synthetic data from a known model, which can have a specific and known mode of growth, and using this data to test the different plots. Kar et al. developed such a ‘generative model’ to produce synthetic data similar to the experimental data, and used these data to determine which plots are best suited to determine growth mode. Once they had validated the best statistical methods for studying mode of growth, Kar et al. applied these methods to growth data from the bacterium Escherichia coli. This showed that these cells have a form of growth called ‘super-exponential growth’. These findings identify a strategy to validate statistical methods used to analyze cell growth data. Furthermore, this strategy – the use of generative models to produce synthetic data to test the accuracy of statistical methods – could be used in other areas of biology to validate statistical approaches.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cell Cycle / physiology*
  • Cell Division / physiology*
  • Cell Enlargement*
  • Cell Proliferation / physiology*
  • Data Interpretation, Statistical
  • Escherichia coli / growth & development*
  • Models, Theoretical*

Associated data

  • figshare/10.6084/m9.figshare.c.3493548.v1