The cell cycle dependent transcriptome and proteome
The cell cycle is an ordered and tightly regulated series of events, over which the cell grows and divides into two daughter cells. It consists of four stages, during which the cell increases in size (G1), replicates its genome (S), increases further in size and prepares for mitosis (G2), and finally goes through mitosis as well as cytokinesis (M). Depending on external and internal signals, the cell may also exit the replicative cell cycle from G1 and enter a non-replicative resting state (G0). Dysregulation of the cell cycle is known to have devastating consequences, such as uncontrolled cell proliferation, genomic instability (Malumbres M et al. (2009)), and cancer (Massagué J. (2004); Hartwell LH et al. (1994)). Therefore, the cell cycle needs to be tightly controlled, while at the same time remaining responsive to various intracellular and extracellular signals (Barnum KJ et al. (2014)). The cell cycle control system involves an intricate network of proteins that are tightly regulated by mechanisms such as transcriptional regulation (Weinberg RA. (1995)), protein post-translational modifications (PTMs) (Morgan DO. (1995)), and protein degradation (Teixeira LK et al. (2013); King RW et al. (1996)).
In asynchronous cell cultures, the cell cycle is a fundamental source of cell-to-cell variation in both transcript and protein abundances (Cho RJ et al. (2001); Whitfield ML et al. (2002); Boström J et al. (2017); Lane KR et al. (2013); Ohta S et al. (2010); Ly T et al. (2014); Pagliuca FW et al. (2011); Ly T et al. (2015)). The Cell Atlas provides a resource to explore protein heterogeneity at the single cell level in unperturbed log-phase growing cells. Among the 12813 genes in the Cell Atlas, a quarter (25%, n=3141) show cell-to-cell variation in terms of expression level and/or spatial distribution of the encoded protein(s) in at least one cell line.
For 1149 of these, the temporal protein and RNA expression patterns have been further characterized in individual cells using the Fluorescent Ubiquitination-based Cell Cycle Indicator (FUCCI) U-2 OS cell line (Mahdessian D, Cesnik AJ et al. (2021)). In total, there are 529 genes encoding proteins identified to be cell cycle dependent (CCD), including 222 in mitotic structures, 318 in interphase and 11 in both. Furthermore, among the 13450 genes found to be expressed in FUCCI U-2 OS, there are 401 genes encoding CCD transcripts. This spatially resolved proteomic map of the cell cycle has been integrated into the Cell Atlas in order to provide a resource for molecular insights into the human cell cycle and cellular proliferation.
Single-cell variation in the Cell Atlas
Genetically identical cells may exhibit differences in their patterns of gene and protein expression. This phenomenon is often referred to as cell-to-cell variation or single-cell variation (SCV). While it is hypothesized that there is an underlying functional importance to this variability, the scale and significance of variations at the single-cell level remain poorly understood (Dueck H et al. (2016)). Environmental changes, DNA damage, cell cycle progression, and stochasticity are examples of factors that may cause changes in RNA and protein expression within isogenic cell populations, and thus serve as sources of single-cell heterogeneity (Snijder B et al. (2011)). This may create different phenotypic characteristics within individual cells and provide them with a molecular and phenotypic fingerprint. Identification of all human proteins that display single-cell variation lays a foundation for characterizing the driving forces of single-cell heterogeneity, and for understanding the functional consequences. In an immunofluorescence (IF) image, single-cell protein variations can be observed as differences in the staining intensity or spatial distribution between cells, as exemplified in Figure 1.
Interestingly, as many as 25% (n=3141) of all human proteins localized in the Cell Atlas show single-cell variations (Thul PJ et al. (2017)). Of these, 2959 proteins show variations in expression level (staining intensity), and 211 proteins show variations in spatial distribution.
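The counts above can be cross-checked with a few lines of arithmetic. The sketch below assumes that every one of the 3141 variable proteins falls into at least one of the two categories (intensity or spatial variation), in which case inclusion-exclusion gives the number of proteins annotated with both. The variable names are illustrative only, and the overlap is a derived figure, not one stated in the text.

```python
# Counts reported in the text
genes_in_cell_atlas = 12813
variable_total = 3141        # proteins with any single-cell variation
intensity_variable = 2959    # variation in expression level
spatial_variable = 211       # variation in spatial distribution

# Assumption: each variable protein has at least one of the two annotations,
# so the overlap follows from inclusion-exclusion.
both_types = intensity_variable + spatial_variable - variable_total

# Share of Cell Atlas genes with variable proteins, as a rounded percentage
percent_variable = round(100 * variable_total / genes_in_cell_atlas)

print(both_types, percent_variable)
```

Under that assumption, a small number of proteins would vary in both intensity and spatial distribution, and the rounded percentage agrees with the quarter quoted above.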
Figure 1. Examples of proteins showing single-cell variation. GTPBP8 is a GTP binding protein (detected in U-2 OS cells). CLCN6 is a chloride transport protein (detected in U-2 OS cells). INCENP is a component of the chromosomal passenger complex (CPC) that is a key regulator of mitosis (detected in MCF7 cells). RACGAP1 plays key roles in controlling cell growth and cell division (detected in U-2 OS cells). RRM2 provides precursors necessary for DNA synthesis (detected in U-2 OS cells). KIF20A is a mitotic kinesin required for cytokinesis (detected in U-2 OS cells). DUSP18 and DUSP19 are phosphatases (detected in A-431 and SK-MEL-30 cells, respectively). CCNB1 is a key regulator of the cell cycle at the G2/M transition (detected in U-2 OS cells). The target protein is shown in green, microtubules in red, and the nucleus in blue.
Single-cell variation is most commonly observed for proteins localized to the nucleus, cytosol, nucleoli and mitochondria (Figure 2). Gene Ontology (GO)-based enrichment analysis of genes encoding proteins with single-cell variation at protein level reveals an enrichment of GO terms describing processes associated with cellular responses to various extracellular stimuli, apoptosis, cell differentiation, cell cycle progression and metabolism (Figure 3).
Figure 2. Localizations of proteins showing single-cell variations to the different organelles, grouped by meta-compartments.
Figure 3. Gene Ontology-based enrichment analysis for genes encoding proteins with single-cell variations, showing the significantly enriched terms for the GO domain Biological Process. Each bar is clickable and gives a search result of proteins that belong to the selected category.
Interphase proteogenomics in single cells
Previous studies of transcript and protein abundance in different phases of the human cell cycle have revealed variations in the expression of 400-1,200 genes (Cho RJ et al. (2001); Whitfield ML et al. (2002); Boström J et al. (2017)) and 300-700 proteins (Lane KR et al. (2013); Ohta S et al. (2010); Ly T et al. (2014); Pagliuca FW et al. (2011); Ly T et al. (2015)). However, cell synchronization is known to alter gene expression (Cooper S et al. (2007)), cell morphology and metabolism (Davis PK et al. (2001)), and precludes the discovery of expression changes within cell cycle phases. The use of single-cell RNA sequencing has allowed the analysis of transcriptional changes without the need for synchronization and has enabled the discovery of additional cell cycle regulated genes (Domenighetti G et al. (1988); Scialdone A et al. (2015)). However, studies of cell cycle dependent (CCD) variations in protein expression at single-cell level have been lacking due to technological limitations.
The HPA Cell Atlas now includes a targeted single-cell transcriptomic analysis, as well as proteomic imaging (i.e., imaging proteogenomics, Figure 4) of 1149 proteins that show single-cell variability in the Cell Atlas and that are expressed in FUCCI U-2 OS cells (Sakaue-Sawano A et al. (2008)). This cell line expresses a pair of fluorescently tagged marker proteins, Cdt1 tagged with red fluorescent protein (RFP) and Geminin tagged with green fluorescent protein (GFP), which enable visualization of interphase progression in individual cells.
The intensities of the RFP- and GFP-tagged cell cycle markers can be used to create a linear representation of cell cycle pseudotime, enabling protein and RNA expression in individual cells to be plotted along an axis representing progression through interphase.
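To make the idea of a pseudotime axis concrete, the sketch below converts red and green marker intensities into an angle around the centre of the log-intensity point cloud and rescales that angle to the range 0 to 1. This is a minimal illustration under stated assumptions, not the published pipeline: the function name `polar_pseudotime`, the use of the mean as the centre, the log transform, and the default `start_angle` (placing time zero where both markers are low) are all choices made here for illustration.

```python
import numpy as np

def polar_pseudotime(red, green, start_angle=-3 * np.pi / 4):
    """Map per-cell red/green marker intensities to a pseudotime value.

    Assumptions (not taken from the source): intensities are log-transformed,
    the point cloud is centred on its mean, and cells progress clockwise in
    the (green, red) plane starting from the low-red, low-green corner.
    """
    green_log = np.log10(np.asarray(green, dtype=float) + 1.0)
    red_log = np.log10(np.asarray(red, dtype=float) + 1.0)
    # Offsets from the cloud centre: x axis is green, y axis is red
    dx = green_log - green_log.mean()
    dy = red_log - red_log.mean()
    theta = np.arctan2(dy, dx)  # polar angle of each cell
    # Clockwise angular distance from the start angle, scaled to one full turn
    return np.mod(start_angle - theta, 2 * np.pi) / (2 * np.pi)

# Toy cells: G1 (red only), G1-S transition (both), S/G2 (green only), both low
red = [1000, 1000, 10, 10]
green = [10, 1000, 1000, 10]
pseudotime = polar_pseudotime(red, green)
print(pseudotime)
```

In this toy example the red-only cell is placed before the double-positive cell, which in turn precedes the green-only cell, matching the G1, G1-S, S/G2 order described for the FUCCI markers.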
Figure 4. Schematic overview of the single-cell imaging proteogenomic workflow. U-2 OS FUCCI cells express two fluorescently tagged cell cycle markers, CDT1 during G1 phase (red, RFP-tagged) and Geminin during S and G2 phases (green, GFP-tagged); these markers are co-expressed during the G1-S transition (yellow). By fitting a polar model to the red and green fluorescence intensities, a linear representation of cell cycle pseudotime is obtained. Independent measurements of RNA and protein expression are compared after pseudotime alignment of individual cells.
The single-cell RNA-sequencing data from the FUCCI U-2 OS cells enable analysis of RNA abundance in relation to cell cycle progression. Upon analysis of 13,450 protein-coding genes expressed in FUCCI U-2 OS, 401 genes (3%) show variance in RNA expression levels that correlates with cell cycle progression. For the single-cell proteomic imaging analysis, 318 proteins display variation in protein expression levels that temporally correlates with interphase progression through G1, S and G2. The cell cycle dependent (CCD) proteins include known cell cycle regulators, such as the cyclin CCNB1 and ANLN, which is required for cytokinesis, but also novel CCD proteins, such as SCIN and DUSP18 (Figure 5). However, most proteins (831) show cell-to-cell variations that are largely unexplained by cell cycle progression (non-CCD). This opens up intriguing avenues for further exploration of the stochastic or deterministic factors that govern these variations, as well as the role of spatiotemporal proteome dynamics in regulating other cellular states and functions.
Figure 5. Examples of temporal expression profiles for single cell protein (blue) and RNA (orange) expression. The boxplot shows a mock-up bulk proteomic experiment.
Proteins in mitotic structures
In addition to proteins that show single-cell variations due to progression through interphase, there are 222 cell cycle dependent (CCD) proteins in the Cell Atlas that localize to mitotic structures, including mitotic chromosomes (55), mitotic spindle (38), kinetochores (4), cytokinetic bridge (81), midbody (34), midbody ring (14) and cleavage furrow (2) (Figure 6).
Figure 6. Example images of proteins localized to mitotic substructures: KIF20A to cleavage furrow, TAF1D, TACC3, KIF11 and CKAP2L to mitotic spindle, BIRC5 to cytokinetic bridge, DVL3 and CTTNBP2 to midbody ring, and SGO1 to kinetochores.
Localizations of the cell cycle dependent proteome
In total, there are 529 genes encoding variable proteins that have been identified as cell cycle dependent (CCD) and 831 genes encoding variable proteins that have been identified as cell cycle independent (non-CCD) in the Cell Atlas. The high resolution of the HPA Cell Atlas dataset allows us to look at the subcellular localizations of proteins showing CCD and non-CCD variability in protein expression. One notable observation is that both categories are enriched for proteins localizing to the nucleoli (Figure 7). Non-CCD variable proteins are also enriched for proteins localizing to intermediate filaments, nuclear bodies, and mitochondria, while CCD variable proteins are enriched in mitotic structures. Almost half of the CCD variable proteins reside in the nuclear meta-compartment, including the nucleus, nuclear speckles, nuclear bodies, and nucleoli. This is in agreement with one of the main functions performed in the nucleus being replication and separation of DNA during the cell cycle.
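The CCD and non-CCD totals quoted in this section follow from the counts given earlier in the article, as the short check below illustrates. The variable names are illustrative, and the only inputs are numbers already stated in the text (222 mitotic, 318 interphase, 11 in both, 1149 proteins characterized in FUCCI cells).

```python
# Counts reported in the text
mitotic_ccd = 222        # CCD proteins in mitotic structures
interphase_ccd = 318     # CCD proteins varying through interphase
in_both = 11             # proteins counted in both groups
characterized = 1149     # variable proteins analyzed in FUCCI U-2 OS cells

# Proteins in both groups are counted once in the overall CCD total
total_ccd = mitotic_ccd + interphase_ccd - in_both

# Variable proteins whose variation is not explained by interphase progression
non_ccd = characterized - interphase_ccd

print(total_ccd, non_ccd)
```

Both results match the 529 CCD and 831 non-CCD genes reported for the Cell Atlas.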
Figure 7. Bar plot showing the subcellular localizations enriched for CCD proteins (blue) and non-CCD proteins (grey) relative to the proteome mapped in the HPA.
Temporal delay between RNA and protein
Previous studies have shown that many RNAs peak in expression in the G1 phase, which is also the longest period of the cell cycle (Boström J et al. (2017); Grant GD et al. (2013)). Among the 401 genes for which RNA expression is correlated to the cell cycle in FUCCI U-2 OS cells, 44% (175) peak in G1. However, most proteins that show cell cycle dependent expression (247) peak towards the end of the cell cycle, corresponding to late S and G2 (Figure 8).
Figure 8. The number of proteins peaking in each phase (interactive blue text) and the number of transcripts peaking in each phase (interactive orange text).
Interestingly, only 73 (14%) of the genes encoding proteins identified as CCD proteins also display cell cycle dependent variations in RNA expression (Figure 9). For these genes, we observed an average delay of 7.7 hours between the peaks of RNA and protein expression. However, a large majority of the CCD proteins have non-CCD transcripts, and their variation in protein expression thus cannot be attributed to transcript cycling. The small overlap of CCD proteins and transcripts is corroborated by external RNA datasets (Grant GD et al. (2013); Semple JW et al. (2006)) and indicates that the temporal dynamics of proteome regulation may be largely maintained at a post-transcriptional level.
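A delay between RNA and protein peaks can be estimated once both measurements are placed on the same pseudotime axis. The sketch below is one possible way to do this and is not the method used in the study: it assumes pseudotime values in the range 0 to 1, finds the peak as the pseudotime bin with the highest mean expression, and converts the difference to hours using a caller-supplied duration. The function names, the bin count, and the `interphase_hours` parameter are hypothetical.

```python
import numpy as np

def peak_pseudotime(pseudotime, expression, n_bins=20):
    """Return the centre of the pseudotime bin with the highest mean expression."""
    pt = np.asarray(pseudotime, dtype=float)
    ex = np.asarray(expression, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each cell to a bin; clip so that pseudotime == 1.0 lands in the last bin
    idx = np.clip(np.digitize(pt, edges) - 1, 0, n_bins - 1)
    means = np.full(n_bins, -np.inf)  # empty bins can never be the peak
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            means[b] = ex[in_bin].mean()
    best = int(np.argmax(means))
    return 0.5 * (edges[best] + edges[best + 1])

def peak_delay_hours(rna_peak, protein_peak, interphase_hours):
    """Signed delay from RNA peak to protein peak, in hours.

    interphase_hours is an assumed duration covered by the pseudotime axis.
    """
    return (protein_peak - rna_peak) * interphase_hours
```

As a usage example, calling `peak_pseudotime` separately on the RNA and protein measurements of one gene and passing the two results to `peak_delay_hours` gives a positive value when the protein peaks after its transcript. Binning is a deliberately simple smoothing choice; a moving average or a fitted curve would serve the same purpose.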
Figure 9. The numbers of cell cycle dependent proteins and transcripts, displayed as an interactive bar plot on the left. On the right, the overlap of these categories is highlighted as transcriptionally regulated and non-transcriptionally regulated cell cycle dependent proteins in an interactive bar plot.
Functional role of novel cell cycle proteins in proliferation
Analysis of RNA expression of the CCD proteins across normal human tissues and tumor tissues reveals a significantly higher expression in proliferative tissues compared to non-proliferative tissues (Figure 10). This indicates that, while the majority of the CCD proteins are not accompanied by cycling transcripts, overall transcription levels of these proteins could be important for cell proliferation.
Figure 10. A) Hierarchical clustering of bulk transcript expression (log-transformed TPM values) for CCD proteins derived from RNA sequencing of various normal and cancer tissue types. The expression levels of the proliferation markers MCM6, CDK1, PCNA, MCM2 and KI67 are highlighted on top as a general measure of the proliferative activity of the tissues. Four clusters are identified: (1) contains normal tissues with low proliferative activity, (2) contains cerebral tissues with testis, (3) contains mostly normal tissues with midrange expression level of the proliferation markers and (4) contains tissues with high expression of the proliferation markers, including tumors. B) Box plots of the average transcript level for known and novel CCD proteins, respectively, for the four different clusters from A.
To confirm a functional role in proliferation, we performed siRNA-mediated gene silencing for a few selected novel CCD proteins. Silencing of DUSP18, KLHL38, CD2BP2 and SOX12 decreased cell proliferation rate relative to a control, whereas silencing of JPH3 increased cellular proliferation (Figure 11).
Figure 11. Silencing of CCD proteins SOX12, DUSP18, CD2BP2, KLHL38 and JPH3. Immunofluorescence images of the control and siRNA samples observed with LUT show variation in protein expression from low (blue) to high intensity (white). Bar plots show the differences in cell counts for control (Ctrl) and siRNA (si) samples, and boxplots show the significant decrease of the measured intensity (too few cells were observed in DUSP18 and KLHL38 siRNA samples to make this comparison).
Cellular proliferation also plays an important role in tumorigenesis. The Pathology Atlas of the HPA is a comprehensive resource for studying the correlation between RNA expression for human protein-coding genes in cancer tissues and the clinical outcomes for almost 8000 cancer patients. Prognostic associations are significantly overrepresented among the genes encoding CCD proteins (370, 70%), corroborating the functional role of CCD proteins in proliferation. The novel CCD proteins, such as FAM50B and CD2BP2 (Figure 12), include both inhibitors and enhancers of proliferation, with potential anti-oncogenic or oncogenic functions. Thus, some of the novel CCD proteins may have potential as diagnostic or therapeutic targets for human cancers.
Figure 12. Kaplan-Meier plots showing the correlation between survival and gene expression (FPKM) for FAM50B (top panel) and CD2BP2 (bottom panel). Higher expression of FAM50B was associated with longer survival (favorable) in renal cancer, and higher expression of CD2BP2 was associated with shorter survival (unfavorable) in liver cancer. Immunohistochemistry images (target protein: brown, nuclei: blue) show lower expression of FAM50B in renal cancer than in normal kidney and higher expression of CD2BP2 in liver cancer than in normal liver.
Relevant links and publications
Mahdessian D, Cesnik AJ et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature (2021)
Parikh K et al. Colonic epithelial cell diversity in health and inflammatory bowel disease. Nature (2019)