The human tissue specific proteome
All of the approximately 20000 human genes are classified according to their expression across all major organs and tissue types in the human body. Few genes are strictly tissue specific; however, genes with elevated expression in particular tissues are an interesting starting point for understanding their biology and function, and the underlying mechanisms of disease.
A total of 11069 genes are elevated in at least one of the analyzed tissues, of which:
- 2845 are tissue enriched genes
- 1637 are group enriched genes
- 6587 are tissue enhanced genes
Transcriptome analysis of all major organs and tissue types in the human body can be visualized with regard to the specificity and distribution of transcribed mRNA molecules across all 19670 putative protein coding genes (Figure 1). Specificity illustrates the number of genes with elevated or non-elevated expression in a particular tissue compared to other tissues. The analysis includes 11069 elevated genes and 8385 genes with low tissue specificity (read more in The housekeeping proteome). Elevated expression is divided into three subcategories:
- Tissue enriched: At least four-fold higher mRNA level in a particular tissue compared to any other tissue.
- Group enriched: At least four-fold higher average mRNA level in a group of 2-5 tissues compared to any other tissue.
- Tissue enhanced: At least four-fold higher mRNA level in a particular tissue compared to the average level in all other tissues.
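The three definitions above can be read as a small decision procedure. The following Python sketch is one possible interpretation and not the atlas pipeline itself: the function name, the order in which the categories are tested, and the "not detected" fallback (using the NX≥1 cutoff mentioned in this section) are all assumptions made for illustration.

```python
from statistics import mean


def classify_specificity(levels, fold=4.0, max_group=5, detect=1.0):
    """Assign a specificity category to one gene.

    levels: dict mapping tissue name -> normalized expression level.
    The precedence (enriched, then group enriched, then enhanced) is an
    assumption; the text only gives the three definitions.
    """
    ranked = sorted(levels.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need expression levels for at least two tissues")

    # Assumed extra category: no tissue reaches the detection cutoff.
    if ranked[0] < detect:
        return "not detected"

    # Tissue enriched: top tissue is >= fold x every other tissue,
    # i.e. >= fold x the second-highest tissue.
    if ranked[0] >= fold * ranked[1]:
        return "tissue enriched"

    # Group enriched: the k highest tissues are the best candidate group of
    # size k (they maximize the group mean and minimize the highest level
    # outside the group), so only those groups need to be tested.
    for k in range(2, min(max_group, len(ranked) - 1) + 1):
        if mean(ranked[:k]) >= fold * ranked[k]:
            return "group enriched"

    # Tissue enhanced: top tissue is >= fold x the mean of all other tissues.
    if ranked[0] >= fold * mean(ranked[1:]):
        return "tissue enhanced"

    return "low tissue specificity"


# Toy profile (made-up values): 100 is at least 4 x 5, so the gene is enriched.
example = classify_specificity({"liver": 100.0, "kidney": 5.0, "lung": 2.0})
print(example)  # tissue enriched
```

Testing the categories from strictest to loosest means a gene that satisfies several definitions is reported under the most specific one, which matches how the three counts add up to the total number of elevated genes.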
Distribution, on the other hand, visualizes how many genes have, or do not have, detectable levels (NX≥1) of transcribed mRNA molecules. As shown in Table 1, all elevated genes are categorized as:
- Detected in single: Detected in a single tissue
- Detected in some: Detected in more than one but less than one third of tissues
- Detected in many: Detected in at least a third but not all tissues
- Detected in all: Detected in all tissues
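The detection categories listed above depend only on how many tissues reach the NX≥1 cutoff. A minimal Python sketch, assuming a hypothetical function name and an extra "not detected" label for genes below the cutoff in every tissue (that label is not part of the list above):

```python
def classify_distribution(levels, detect=1.0):
    """Assign a detection category to one gene.

    levels: dict mapping tissue name -> normalized expression level (NX).
    A tissue counts as detected when its level is >= detect.
    """
    n_total = len(levels)
    n_detected = sum(1 for value in levels.values() if value >= detect)

    if n_detected == 0:
        return "not detected"  # assumed label, not in the list above
    if n_detected == n_total:
        return "detected in all"
    if n_detected == 1:
        return "detected in single"
    # "Less than one third of tissues", written without division
    # to avoid floating point rounding.
    if 3 * n_detected < n_total:
        return "detected in some"
    return "detected in many"
```

With 37 analyzed tissues, one third is about 12.3, so under this reading 2 to 12 detected tissues give "detected in some" and 13 to 36 give "detected in many".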
Figure 1. (A) The distribution of all genes across the five categories based on transcript specificity in all 37 analyzed tissues. (B) The distribution of all genes across the six categories based on transcript detection (NX≥1) in all 37 analyzed tissues.
Table 1. Number of genes in the subdivided categories of elevated expression in all 37 analyzed tissues.
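The gene counts quoted in this section can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text; the interpretation of the remainder as genes not detected in any tissue is an assumption, since the text does not say so explicitly.

```python
# Counts as stated in the text.
tissue_enriched = 2845
group_enriched = 1637
tissue_enhanced = 6587
low_specificity = 8385
protein_coding = 19670

# The three elevated subcategories should add up to the elevated total.
elevated = tissue_enriched + group_enriched + tissue_enhanced
assert elevated == 11069

# Genes that are neither elevated nor of low tissue specificity.
# Assumption: these are the genes not detected in any analyzed tissue.
remainder = protein_coding - elevated - low_specificity

print(elevated, remainder)  # 11069 216
```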
The number of tissue elevated genes varies widely between the analyzed tissue types (see Table 2 below). Testis shows the highest number of tissue enriched genes (n=950), followed by the brain (n=488) and liver (n=242). When all tissue elevated genes are taken into consideration, however, the brain has a slightly higher number than the testis. The large number of enriched genes in testis is considered to be due to the highly specialized processes occurring during spermatogenesis. Many of these genes likely have a shared expression with oocytes in the female ovaries. Oocytes are, however, difficult to analyze because of the complex kinetics of female germ cell development, including the first rounds of meiosis, which in females occur at the embryonic stage. As expected, tissues that have similar functions and morphology often have higher numbers of shared group enriched genes.
In addition to previously known proteins, the analysis also identified a large number of genes with tissue elevated expression patterns that were previously poorly characterized and with no or only scarce evidence of existence at protein level. The combined RNA and antibody-based profiling can thus be used to confirm the physiological functions of such protein coding genes lacking previous annotation. These proteins are interesting starting points for further in-depth studies to gain a better understanding of the molecular mechanisms of the various cellular phenotypes that define the function of each respective tissue and organ.
Table 2. Tissue elevated genes.
Tissue elevated genes
The comprehensive analysis presented here has identified 11069 human genes that display a tissue elevated expression pattern across the human body. By combining the analysis with antibody-based protein profiling using immunohistochemistry, the exact location of the corresponding protein expression pattern at a cellular and subcellular level can be provided. Examples of protein expression patterns of tissue elevated genes are presented below.
Brain
- GFAP (Glial fibrillary acidic protein) - astrocyte intermediate filament protein
- MBP (Myelin basic protein) - a major constituent of the myelin sheath
- ELAVL3 (ELAV like RNA binding protein 3) - neural-specific RNA-binding protein
GFAP - cerebral cortex
MBP - hippocampus
ELAVL3 - cerebral cortex
Retina
- RHO (Rhodopsin) - involved in phototransduction in rod photoreceptors
- ARR3 (Arrestin 3) - involved in phototransduction in cone photoreceptors
RHO - retina
ARR3 - retina
Endocrine tissues
- FSHB (Follicle stimulating hormone beta subunit) - hormone inducing egg and sperm production
- TG (Thyroglobulin) - substrate for the synthesis of thyroid hormones
- HSD3B2 (Hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 2) - involved in the biosynthesis of hormonal steroids
FSHB - pituitary gland
TG - thyroid gland
HSD3B2 - adrenal gland
Lung
- SFTPA1 (Surfactant protein A1) - involved in surfactant homeostasis and the defense against respiratory pathogens
- SFTPB (Surfactant protein B) - involved in surfactant homeostasis and the defense against respiratory pathogens
SFTPA1 - lung
SFTPB - lung
Proximal digestive tract
- STATH (Statherin) - inhibits precipitation of calcium phosphate salts in the saliva
- KRT4 (Keratin 4) - expressed in differentiated layers of mucosal and esophageal epithelia
STATH - salivary gland
KRT4 - esophagus
Gastrointestinal tract
- PGA4 (Pepsinogen 4, group I (pepsinogen A)) - enzyme for digestion of dietary proteins
- DEFA5 (Defensin alpha 5) - antimicrobial and cytotoxic peptide involved in host defense
- KRT20 (Keratin 20) - maintains keratin filament organization in intestinal epithelia
PGA4 - stomach
DEFA5 - duodenum
KRT20 - colon
Liver & gallbladder
- ALB (Albumin) - plasma protein
- CYP2A13 (Cytochrome P450 member) - involved in drug metabolism, cholesterol and steroid synthesis
- CHST4 (Carbohydrate sulfotransferase 4) - enzyme involved in the modification of glycan structures
ALB - liver
CYP2A13 - liver
CHST4 - gallbladder
Pancreas
- AMY2A (Amylase, alpha 2A) - an enzyme that digests carbohydrates, secreted by exocrine cells
- INS (Insulin) - involved in lowering of blood glucose, secreted by beta cells
- GCG (Glucagon) - involved in the elevation of blood glucose, secreted by alpha cells
AMY2A - pancreas
INS - pancreas
GCG - pancreas
Kidney & urinary bladder
- SLC22A13 (Solute carrier family 22 member 13) - membrane-bound organic anion transporter
- NPHS2 (Podocin) - involved in the regulation of glomerular permeability
- UPK2 (Uroplakin 2) - membrane protein preventing cell rupture during bladder distention
SLC22A13 - kidney
NPHS2 - kidney
UPK2 - urinary bladder
Male tissues
- DMRT1 (Doublesex- and mab-3-related transcription factor 1) - involved in meiosis
- SEMG1 (Semenogelin I) - predominant protein in semen
- KLK3 (Kallikrein related peptidase 3) - also called PSA, used clinically to diagnose prostate cancer
DMRT1 - testis
SEMG1 - seminal vesicle
KLK3 - prostate
Female tissues
- CSH1 (Chorionic somatomammotropin hormone 1) - hormone important for growth control during pregnancy
- OVGP1 (Oviductal glycoprotein 1) - mucus protein important in mucociliary transport of the fertilized ovum
- MUM1L1 (MUM1 like 1) - a protein with a mutated melanoma-associated antigen 1 domain, associated with cancer
CSH1 - placenta
OVGP1 - fallopian tube
MUM1L1 - ovary
Muscle tissues
- TNNI3 (Troponin I3, cardiac type) - mediates muscle relaxation
- TNNT2 (Troponin T2, cardiac type) - mediates muscle contraction
- MYH7 (Myosin heavy chain 7) - expressed in slow type I muscle fibers
TNNI3 - heart muscle
TNNT2 - heart muscle
MYH7 - skeletal muscle
Adipose & soft tissue
- FABP4 (Fatty acid binding protein 4) - involved in fatty acid uptake, transport, and metabolism
- PLIN1 (Perilipin 1) - coats lipid storage droplets in adipocytes
FABP4 - adipose tissue (soft tissue)
PLIN1 - adipose tissue (breast)
Skin
- KRT1 (Keratin 1) - involved in squamous differentiation and skin barrier function
- KRT27 (Keratin 27) - plays a role in hair formation
- CASP14 (Caspase 14) - involved in keratinocyte differentiation and cornification
KRT1 - skin
KRT27 - hair
CASP14 - skin
Bone marrow & lymphoid tissues
- MPO (Myeloperoxidase) - major component of neutrophil azurophilic granules
- CD8B (CD8b molecule) - plays a critical role in thymic selection of CD8+ T-cells
- CD22 (CD22 molecule) - mediates interactions between B-cells
MPO - bone marrow
CD8B - thymus
CD22 - lymph node
Group enriched proteins
The 1637 genes identified as group enriched reflect genes with shared expression in 2-5 tissues. Many of these genes encode proteins that are expressed in cell types that have similar functions across several tissues, such as proteins expressed in immune cells (present in many organs but especially lymphoid tissues and the gastrointestinal tract), proteins involved in squamous cell differentiation (e.g. cervix, esophagus and skin), glandular cell function in the gastrointestinal tract (duodenum, small intestine and colon) or cilia movement (testis and fallopian tube). The schematic network plot below shows the distribution of group enriched genes among different tissues.
Figure 2. An interactive network plot of the tissue enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of tissue enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 3 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.
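The group enriched part of such a network can be derived from a simple mapping of genes to the tissues they are enriched in. The Python sketch below is illustrative only: the function name is hypothetical, tissue enriched genes (the red nodes) are left out, and the gene-to-tissue assignments are toy data loosely based on examples from this section, not actual atlas results.

```python
from collections import Counter, defaultdict


def build_network(group_enriched, max_combo=3):
    """Summarize group enriched genes for a network plot.

    group_enriched: dict mapping gene name -> iterable of enriched tissues.
    Returns (combo_counts, genes_by_tissue):
      combo_counts    - number of genes per tissue combination, restricted
                        to combinations of 2..max_combo tissues
      genes_by_tissue - complete set of group enriched genes per tissue,
                        regardless of combination size
    """
    combo_counts = Counter()
    genes_by_tissue = defaultdict(set)
    for gene, tissues in group_enriched.items():
        combo = frozenset(tissues)
        for tissue in combo:
            genes_by_tissue[tissue].add(gene)
        # Only small combinations become nodes in the plot.
        if 2 <= len(combo) <= max_combo:
            combo_counts[combo] += 1
    return combo_counts, genes_by_tissue


# Toy assignments; GENE_A and GENE_B are hypothetical placeholders.
toy_assignments = {
    "DSC3": {"esophagus", "skin"},
    "MUC16": {"salivary gland", "cervix"},
    "DNAI2": {"fallopian tube", "testis"},
    "GENE_A": {"fallopian tube", "testis"},
    "GENE_B": {"esophagus", "skin", "cervix", "tonsil"},
}

combo_counts, genes_by_tissue = build_network(toy_assignments)
```

Keeping the per-tissue gene sets separate from the combination counts mirrors the caption: the plotted nodes are limited to combinations of up to 3 tissues, while the list for a given tissue still contains every group enriched gene, such as the four-tissue GENE_B in this toy data.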
Immune cells can be found in both lymphoid organs and organs infiltrated by immune cells, such as the intestine. Consequently, genes important for immune cell function are often enriched in both lymphoid tissues and the intestine. One such gene is MS4A1, which encodes CD20, an activated-glycosylated phosphoprotein expressed on the surface of B-cells beginning at the pro-B phase with progressively increasing concentrations until maturity.
MS4A1 - lymph node
MS4A1 - appendix
MS4A1 - small intestine
Squamous epithelia are found in many parts of the body as dry skin or wet mucosa, acting as a robust barrier against various chemical and mechanical stresses. DSC3, which encodes desmocollin 3, a protein important in cell-cell junctions and cellular adhesion, is group enriched in squamous epithelia such as the esophagus and skin, as exemplified below.
DSC3 - esophagus
DSC3 - skin
Mucus has several functions in the body related to transportation and barrier functions. The function of the mucus in the salivary gland is related to food and pathogens, while the mucus in the cervix is involved in, for example, the transportation and blockage of sperm during sexual reproduction. MUC16 is a mucus component and is group enriched in both the mucus-producing salivary gland and cervix.
MUC16 - salivary gland
MUC16 - cervix
The fallopian tube shares many elevated genes with testis. The common denominator is the utilization of cilia, or the structurally similar flagellum, for essential organ functions. DNAI2, a dynein protein, constitutes a motor protein component of motile cilia of multiciliated cells as well as the flagellum (tail) of the sperm. By pulling on the microtubule structure of the cilium/flagellum, the motor protein creates motion and in the case of the sperm, sperm motility. In the immunohistochemistry images below, expression of DNAI2 can be seen in a subset of cilia in the fallopian tube (left and middle image), as well as in the flagellum of spermatids and cytoplasm of differentiating spermatocytes (right image).
DNAI2 - fallopian tube
DNAI2 - fallopian tube ciliated cells
DNAI2 - testis