The human brain proteome

The function of the brain, defined as the central nervous system, is to receive, process and execute the coordinated higher functions of perception, motion and cognition that signify human life. The cellular components of the underlying and highly complex network of transmitted signals include neurons and supportive glial cells. Brain tissue includes different cell types as well as the space between the cell bodies, often referred to as neuropil: the meshwork of axons, dendrites, synapses and extracellular matrix that embeds the central nervous system cells.

Protein-coding genes are classified based on RNA expression in brain from two different perspectives:

  1. A whole body perspective, comparing gene expression in the brain to peripheral organ and tissue types
  2. A brain-centric point of view comparing gene expression in the various regions of the brain


Brain expression is compared to other organs and tissues by using the highest expression value of all brain regions. For the regional classification the brain is divided into 10 anatomically defined regions, color coded in Figure 1. The transcriptome analysis shows that 82% (n=16227) of all human proteins (n=19670) are expressed in the brain (based on 10 brain regions, spinal cord and corpus callosum). The regional classification was based on the 15157 genes that are detected in the brain and included in all external datasets used. Out of the genes with a regional expression classification, 1059 are categorized as genes with regionally elevated expression. Region-specific summary pages, including lists of regionally elevated genes, can be found here: olfactory bulb, cerebral cortex, hippocampal formation, amygdala, basal ganglia, hypothalamus, thalamus, midbrain, pons and medulla as well as cerebellum.
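
As a minimal sketch of the aggregation described above, the brain can be represented by the highest NX value among its regions before it is compared to other tissues. The function name and the example NX values are hypothetical and only illustrate the rule; the gene counts are taken from the text.

```python
from typing import Dict


def brain_level_nx(regional_nx: Dict[str, float]) -> float:
    """Represent the brain by the highest NX value among its regions."""
    return max(regional_nx.values())


# Hypothetical NX values for a single gene (illustration only).
example_gene = {"cerebral cortex": 12.4, "cerebellum": 30.1, "hypothalamus": 8.7}
brain_nx = brain_level_nx(example_gene)

# Share of protein-coding genes detected in the brain, from the counts in the text.
detected_share = 16227 / 19670

print(f"brain-level NX: {brain_nx}")
print(f"detected in brain: {detected_share:.0%}")
```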

Figure 1. Midsagittal schematic drawing of the different regions of the human brain, color coded according to the 10 regions.

In addition to the basic regional distribution of gene expression in the human brain, a more detailed overview of gene expression in the prefrontal cortex is available. This stand-alone dataset is based on RNAseq analysis of 165 samples from 3 male and 3 female donors providing a detailed overview of protein expression in 17 subregions of the prefrontal cortex and 3 reference cortical regions further described here. Gene expression in each subregion can be explored on the gene summary page.

The brain elevated genes, comparing brain to other organs and tissue types

Out of the 16227 genes detected above cut off in the brain, 2587 genes have an elevated expression in the brain compared to other tissue types. Genes with elevated expression levels in the brain are defined by the tissue specificity category, while the tissue distribution category highlights genes based on whether the gene is detected or not (above cut off NX≥1). The subdivided expression categories are summarized in pie charts (Figure 2A and B) and Table 1. An analysis of the proteins with elevated expression in the brain shows various patterns of expression, with proteins localized to different neurons and glial cells as well as to neuropil.


  • 488 brain enriched genes
  • Most of the enriched genes encode proteins involved in transport and signaling
  • 2587 genes defined as elevated in the brain
  • 33 genes are only detected in the brain

Figure 2. (A) The distribution of all genes across the five categories based on transcript abundance in brain as well as in all other tissues. (B) The distribution of all genes across the six categories, based on transcript detection (NX≥1) in brain as well as in all other tissues.

Transcriptome analysis of the brain can be visualized with regard to abundance and distribution of transcribed mRNA molecules (Figure 2). Abundance illustrates the number of genes with elevated or non-elevated expression in the brain compared to other tissues. In total, 2587 genes show some level of elevated expression in the brain compared to other tissues. Elevated expression in brain compared to other tissue types is divided into three different categories:

  • Tissue enriched: At least four-fold higher mRNA level in brain compared to any other tissues.
  • Group enriched: At least four-fold higher average mRNA level in a group of 2-5 tissues compared to any other tissue.
  • Tissue enhanced: At least four-fold higher mRNA level in brain compared to the average level in all other tissues.
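
The three rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the pipeline used to produce the atlas: the function name is hypothetical, candidate groups are taken as the k highest-expressing tissues (k = 2 to 5) and must include the brain, and genes below the NX≥1 cut off in brain are treated as not elevated.

```python
from statistics import mean
from typing import Dict

FOLD = 4.0              # four-fold threshold from the definitions above
DETECTION_CUTOFF = 1.0  # NX >= 1 counts as detected


def specificity_category(nx: Dict[str, float], target: str = "brain") -> str:
    """Classify one gene from its per-tissue NX values (hypothetical helper)."""
    target_nx = nx[target]
    others = {t: v for t, v in nx.items() if t != target}
    if target_nx < DETECTION_CUTOFF:
        return "not elevated"

    # Tissue enriched: at least four-fold above every other tissue.
    if target_nx >= FOLD * max(others.values()):
        return "tissue enriched"

    # Group enriched: mean of a group of 2-5 tissues (assumed here to be the
    # top-k tissues, target included) is four-fold above any tissue outside it.
    ranked = sorted(nx, key=nx.get, reverse=True)
    for k in range(2, 6):
        group, rest = ranked[:k], ranked[k:]
        if target in group and rest:
            if mean(nx[t] for t in group) >= FOLD * max(nx[t] for t in rest):
                return "group enriched"

    # Tissue enhanced: at least four-fold above the mean of all other tissues.
    if target_nx >= FOLD * mean(others.values()):
        return "tissue enhanced"
    return "not elevated"


# Hypothetical NX profiles, one per category.
enriched = {"brain": 100, "liver": 10, "testis": 5, "lung": 2}
grouped = {"brain": 50, "testis": 40, "liver": 5, "lung": 2}
enhanced = {"brain": 30, "a": 10, "b": 8, "c": 6, "d": 4, "e": 3, "f": 2}
flat = {"brain": 5, "liver": 5, "lung": 5}

for profile in (enriched, grouped, enhanced, flat):
    print(specificity_category(profile))
```

The checks are ordered from the strictest rule to the most permissive one, so a gene that satisfies several rules is assigned to the most specific category.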

Distribution, on the other hand, visualizes how many genes have, or do not have, detectable levels (above cut off, NX≥1) of transcribed mRNA molecules in the brain compared to other tissues.

  • Detected in single: Detected in a single tissue
  • Detected in some: Detected in more than one but less than one third of tissues
  • Detected in many: Detected in at least a third but not all tissues
  • Detected in all: Detected in all tissues
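
The distribution categories above depend only on how many tissues reach the NX≥1 cut off. A minimal sketch, assuming a dictionary of per-tissue NX values; the function name and the synthetic profiles are hypothetical, and a "not detected" label is added for genes below cut off everywhere.

```python
from typing import Dict


def distribution_category(nx: Dict[str, float], cutoff: float = 1.0) -> str:
    """Assign a distribution category from the number of tissues above cut off."""
    n_total = len(nx)
    n_detected = sum(1 for value in nx.values() if value >= cutoff)
    if n_detected == 0:
        return "not detected"
    if n_detected == 1:
        return "detected in single"
    if n_detected == n_total:
        return "detected in all"
    # "Some": more than one but fewer than one third of the tissues.
    if 3 * n_detected < n_total:
        return "detected in some"
    # "Many": at least a third of the tissues, but not all of them.
    return "detected in many"


def synthetic_profile(n_detected: int, n_total: int = 37) -> Dict[str, float]:
    """Build a made-up profile with n_detected tissues above cut off."""
    return {f"tissue_{i}": (5.0 if i < n_detected else 0.0) for i in range(n_total)}


for n in (1, 12, 13, 37):
    print(n, distribution_category(synthetic_profile(n)))
```

With 37 tissues, 12 detected tissues fall just below one third and 13 fall just above it, which is why those two values are used in the example.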

In total, 33 genes expressed in the brain are detected only in the brain and in no other tissue. The majority of these (n=19) are also classified as enriched in the brain, and the remaining 14 genes are classified as enhanced. The number of genes in each category is shown in Table 1. Table 2 lists the 12 genes with the highest level of tissue specificity among the 488 enriched genes. The list of tissue enriched genes is well in line with the function of the brain.

Table 1. Number of genes in the subdivided categories of elevated expression in the brain (based on transcript abundance) and the tissue distribution (based on expression above cut off) in the brain.

Distribution in the 37 tissues
Specificity        Detected in single   Detected in some   Detected in many   Detected in all   Total
Tissue enriched    19                   135                313                21                488
Group enriched     0                    222                261                13                496
Tissue enhanced    14                   328                956                305               1603
Total              33                   685                1530               339               2587
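
The counts in Table 1 can be cross-checked against the stated totals. The sketch below re-enters the counts by hand (columns in the order single, some, many, all) and recomputes the row, column and grand totals; the variable names are illustrative.

```python
# Counts from Table 1: detected in single, some, many, all.
table1 = {
    "Tissue enriched": [19, 135, 313, 21],
    "Group enriched": [0, 222, 261, 13],
    "Tissue enhanced": [14, 328, 956, 305],
}

row_totals = {name: sum(counts) for name, counts in table1.items()}
column_totals = [sum(column) for column in zip(*table1.values())]
grand_total = sum(row_totals.values())

print(row_totals)     # per specificity category
print(column_totals)  # per distribution category
print(grand_total)    # all brain elevated genes
```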

Table 2. The 12 genes with the highest level of enriched expression in the brain and the tissue distribution category for the gene. "mRNA (tissue)" shows the transcript level as NX values, TS-score (Tissue Specificity score) corresponds to the score calculated as the fold change to the second highest tissue.

Gene     Description                                               Tissue distribution  mRNA (tissue)   Tissue specificity score
GFAP     glial fibrillary acidic protein                           Detected in many     595.5           61
TLX3     T cell leukemia homeobox 3                                Detected in single   46.6            59
NEUROD2  neuronal differentiation 2                                Detected in single   51.4            54
NEUROD6  neuronal differentiation 6                                Detected in single   30.6            51
NCAN     neurocan                                                  Detected in many     88.4            48
MOG      myelin oligodendrocyte glycoprotein                       Detected in many     106.4           47
HPCA     hippocalcin                                               Detected in many     118.9           46
AVP      arginine vasopressin                                      Detected in some     270.1           44
BARHL1   BarH like homeobox 1                                      Detected in single   40.1            44
MBP      myelin basic protein                                      Detected in many     1297.7          43
OPALIN   oligodendrocytic myelin paranodal and inner loop protein  Detected in many     69.7            43
MEPE     matrix extracellular phosphoglycoprotein                  Detected in single   34.2            43
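
The TS-score described in the caption of Table 2 is a fold change between the brain and the highest-expressing other tissue. A minimal sketch with hypothetical NX values follows; the function name is illustrative, and returning infinity when no other tissue shows any expression is an assumption rather than a documented rule.

```python
from typing import Dict


def specificity_score(nx: Dict[str, float], target: str = "brain") -> float:
    """Fold change between the target tissue and the highest other tissue."""
    highest_other = max(value for tissue, value in nx.items() if tissue != target)
    if highest_other == 0:
        # Assumption: no expression elsewhere is reported as an infinite score.
        return float("inf")
    return nx[target] / highest_other


# Hypothetical NX values (not taken from Table 2).
hypothetical_gene = {"brain": 120.0, "pituitary gland": 2.0, "testis": 1.5}
score = specificity_score(hypothetical_gene)
print(score)
```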

Protein localization of genes with elevated expression in the brain compared to other tissues

In-depth analysis of the elevated genes in the brain, using antibody-based protein profiling, allowed us to understand the distribution of the brain specific genes and their protein location. Proteins expressed by the different cell types of the brain were identified among the genes with elevated expression.

Proteins specifically detected in neurons

Neurons are the functional entities of the brain and were originally divided, based on morphology and neurotransmitter phenotype, into two main classes: excitatory, glutamatergic pyramidal projection neurons (~75%) and inhibitory, mostly GABAergic interneurons (~25%). The protein ELAV-like protein 3 (ELAVL3) is expressed in all neurons. Glutamate decarboxylase 1 (GAD1), on the other hand, is an essential enzyme in the biosynthesis of GABA and is known to be expressed in the majority of cortical GABAergic interneurons. Protocadherin alpha-1 (PCDHA1) is expressed in cerebral cortex and can be detected in a few sparsely distributed interneuron-like neurons.


ELAVL3

GAD1

PCDHA1

Detailed immunohistochemical analysis of proteins with known molecular functions shows that many brain-elevated proteins are involved in synaptic signaling, such as docking of synaptic vesicles (e.g. synaptophysin (SYP)). Various known post-synaptic proteins, including the GABA B receptor subunit 2 (GABBR2), and proteins involved in organizing and maintaining synaptic connections, such as cell adhesion molecule 2 (CADM2), are also encountered. These data underline that events associated with synaptic transmission require specialized proteins, most often with an enriched expression level in the brain compared to peripheral tissue types.


SYP

GABBR2

CADM2

Proteins specifically detected in glial cells

Glial cells can generally be subdivided into astrocytes, oligodendrocytes and microglia based on morphology and function. Many of the proteins localized to glial cells in our analysis have an astrocyte-like staining pattern present in both grey and white matter structures. However, variation in distribution, morphology and cell density is observed. The well-known astrocyte marker GFAP as well as the unexplored gene FAM19A1 are detected in astrocytes of both the white and grey matter. In contrast, the water transporter AQP4 is mainly detected in the grey matter and reveals a neuropil-like staining pattern due to the localization of the protein in numerous glial endfeet.


GFAP

FAM19A1

AQP4

Several genes expressed in oligodendrocytes are involved in myelination, such as the compact myelin proteins myelin basic protein (MBP) and proteolipid protein 1 (PLP1). In contrast to the oligodendrocyte transcription factor OLIG2, none of the other investigated myelin sheath components are brain specific. MBP and PLP1 are enriched, but this is mainly due to the sample composition containing 25% densely myelinated white matter. Expression above cut off is found in several peripheral tissue types, and immunohistochemical analysis reveals that this expression mainly represents Schwann cells in peripheral nerves.


MBP

PLP1

OLIG2

The third type of glial cell populating the brain is microglia. These cells are derived from hematopoietic stem cells that invade the brain during embryonic development, or from macrophages that enter the brain from the bloodstream later in life. The well-known microglia genes integrin alpha M chain (ITGAM) and allograft inflammatory factor 1 (AIF1) are neither specific to nor enriched in the brain; they are also expressed, for example, in cells populating the lymph node and bone marrow, the main site of hematopoiesis. Based on our immunohistochemistry analysis we can identify only one microglia gene, the purinoceptor P2RY12, that is enhanced in brain tissue, with low expression in lymph node and bone marrow. These data show the close relationship of microglia and hematopoietic cells, reflecting the common developmental origin of microglia and macrophages.


ITGAM

AIF1

P2RY12


Regional expression within the brain

The regional organization of the brain anatomy separates the brain into regions, sub regions, nuclei and layers of specialized cells, enabling the specific function of each individual region. Transcriptomic data from the different regions facilitates additional classification of the expression within the brain. The same strategy as used for the tissue type classification was applied to the regional data, resulting in regionally elevated genes (separated into regionally enriched, group enriched and regionally enhanced).

  • 1059 genes classified as regionally elevated
  • 520 genes are brain elevated as well as regionally elevated
  • Cerebellum has the most regionally enriched genes (n=214)
  • 483 regionally elevated genes are elevated in other tissues than brain

Figure 3. An interactive network plot of the regionally enriched and group enriched genes connected to their respective enriched region (black circles). Red nodes represent the number of regionally enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations up to 4 genes and 5 regions, but the resulting lists show the complete set of group enriched genes in the particular region.

For more information and examples of regionally elevated expression in the different regions, please visit the individual summary pages: olfactory bulb, cerebral cortex, hippocampal formation, amygdala, basal ganglia, hypothalamus, thalamus, midbrain, pons and medulla as well as cerebellum.

Table 3. The 10 regions of the brain and the number of genes detected above cut off, indicating expression in that brain region, as well as the number of genes classified as elevated in each region compared to the others, based on transcript abundance in the individual regions (the maximum NX of the sub regions is used to represent each region). The same classification rules are used for the regional classification as for the tissue specificity classification based on tissue types.

Table 4. The 12 genes with the highest level of regional enriched expression within the brain and the regional distribution category. "mRNA (region)" shows the transcript level as NX values, RS-score (Regional Specificity score) corresponds to the score calculated as the fold change to the second highest region.

Gene     Description                                                                   Predicted location       RS-score
TGM4     Transglutaminase 4                                                            Intracellular            254
HSD3B2   Hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 2  Intracellular, Membrane  198
CYP17A1  Cytochrome P450 family 17 subfamily A member 1                                Intracellular            103
PHOX2B   Paired like homeobox 2b                                                       Intracellular            99
PHOX2A   Paired like homeobox 2a                                                       Intracellular            96
NPVF     Neuropeptide VF precursor                                                     Secreted                 95
HOXB8    Homeobox B8                                                                   Intracellular            82
SPINK6   Serine peptidase inhibitor, Kazal type 6                                      Secreted                 78
RLN2     Relaxin 2                                                                     Intracellular, Secreted  64
OXT      Oxytocin/neurophysin I prepropeptide                                          Secreted                 59
UPK3B    Uroplakin 3B                                                                  Intracellular, Membrane  58
HDC      Histidine decarboxylase                                                       Intracellular            52


Proteins specifically expressed in one region of the brain


PNOC - Cerebral cortex

ADORA2A - Caudate (basal ganglia)

HDC - Hypothalamus


SLC6A3 - Substantia nigra (midbrain)

TPH2 - Dorsal raphe (midbrain)

ARHGEF33 - Cerebellum


Comparing tissue classification with regional expression in the brain

The majority of brain elevated genes are classified as low regional specificity (n=1776), while 520 genes are brain elevated as well as regionally elevated. Among the genes classified as brain elevated with low regional specificity, several glial specific proteins are found, for example GFAP, AQP4 and MBP. In contrast, neuronal proteins are more often found among the regionally elevated genes, such as ADORA2A, AVP and ARHGEF33. Interestingly, many proteins of interest for the brain are classified as elevated in tissues other than brain, such as the cerebellum elevated ANK1 (elevated in skeletal muscle) and TFAP2B (elevated in epididymis), as well as the more regionally general CRYAB, localized to oligodendrocytes, which is elevated in heart and skeletal muscle. This highlights the importance of mapping expression and localization from multiple perspectives to better understand the proteins important for brain functions.


ANK1

TFAP2B

CRYAB

Table 5. Overlap between tissue classification, indicating elevated expression in the brain or not, with the regional specificity within the brain. The regional classification of human brain expression is limited by available external data and thus does not cover all human protein-coding genes.

Tissue classification                             Regionally elevated   Low regional specificity   Missing regional classification   Total
Elevated in brain                                 520                   1776                       291                               2587
Elevated in other tissue but expressed in brain   483                   4295                       520                               5298
Low tissue specificity                            56                    8027                       302                               8385
Total                                             1059                  14098                      1113                              16270
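
The overlap in Table 5 can be summarized with a few lines of arithmetic. The sketch below re-enters the counts by hand, recomputes the totals, and derives the share of brain elevated genes that are also regionally elevated among those with a regional classification; the variable names are illustrative.

```python
# Counts from Table 5: regionally elevated, low regional specificity, missing.
table5 = {
    "Elevated in brain": [520, 1776, 291],
    "Elevated in other tissue but expressed in brain": [483, 4295, 520],
    "Low tissue specificity": [56, 8027, 302],
}

row_totals = {name: sum(counts) for name, counts in table5.items()}
column_totals = [sum(column) for column in zip(*table5.values())]
grand_total = sum(row_totals.values())

# Brain elevated genes that have a regional classification at all.
regional, low, _missing = table5["Elevated in brain"]
share_regionally_elevated = regional / (regional + low)

print(row_totals)
print(column_totals, grand_total)
print(f"{share_regionally_elevated:.1%} of classified brain elevated genes")
```

The first two column totals add up to 15157, the number of genes with a regional classification given earlier in the text.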


Additional structures considered part of the brain

The regional classification is, as mentioned above, based on the 10 main regions of the brain. However, when the brain is compared to the peripheral tissue types for the tissue classification, the brain as a tissue type also includes spinal cord and corpus callosum.

Spinal cord

The spinal cord is an elongation of the brain, composed of white and gray matter. The dorsal part is involved in processing of sensory information received by sensory neurons located in the dorsal root ganglia. The ventral part contains motor neurons. The spinal cord is organized in cervical (C1-C7), thoracic (T1-T12) and lumbar (L1-L5) segments. Each segment receives sensory information from, and generates motor output to, its corresponding body segment.


Corpus callosum

The corpus callosum is the largest nerve tract in the brain, providing communication between the left and right hemispheres. This flat bundle situated beneath the cortex contains roughly 200-300 million axonal projections. The neuronal fibers vary in density and amount of myelination, reflecting their functionality. The structure is divided into sub regions based on the target areas they connect to. Studying the brains of patients with a severed corpus callosum has brought answers on how each isolated hemisphere works.

Figure 4. Schematic drawing of the anatomical location of corpus callosum in the human brain, in dark gray.


Gene expression shared between brain and other tissues

There are 496 group enriched genes expressed in the brain. Group enriched genes are defined as genes showing a 4-fold higher average level of mRNA expression in a group of 2-5 tissues, including brain, compared to all other tissues.

In order to illustrate the relation of brain tissue to other tissue types, a network plot was generated, displaying the number of genes shared between different tissue types. The common neuroectodermal origin is a plausible reason for the relatively high number of genes connecting brain with adrenal gland and pancreas. The network plot reveals that most group enriched genes are shared with the testis (n=131). However, a clear explanation for the large number of genes shared between testis and brain could not be found, neither by gene ontology analysis nor by immunohistochemical analysis, and further investigations are needed. The large number of group enriched genes related to brain and skeletal muscle is possibly due to shared signaling functions. The group enriched genes shared with the pituitary gland are expected, since half of the pituitary gland (the posterior lobe) originates from the brain and both neuronal and glial cells are located in the gland. Several group enriched genes are shared with the fallopian tube; these are mainly related to ciliated cells, which are also found among the ependymal cells of the ventricle walls.

Figure 4. An interactive network plot of the brain enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of brain enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 3 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.


SH3GL3 is implicated in neuronal endocytosis, and group enriched in brain and testis. Immunohistochemical analysis shows the encoded protein to be expressed in sperm cells and neuropil.


SH3GL3 - testis

SH3GL3 - cerebral cortex

AQP4 is also group enriched and shows the highest expression values in brain and lung.


AQP4 - cerebral cortex

AQP4 - lung

SPTB is group enriched and shows high expression values in brain and skeletal muscle, where the protein is localized to the membrane of neurons as well as of skeletal muscle cells.

SPTB - skeletal muscle

SPTB - cerebellum

Ciliated cells in fallopian tube and respiratory epithelium share several proteins with the ciliated ependymal cells in the brain, resulting in several genes classified as group enriched, such as FOXJ1 and RSPH1.


FOXJ1 - caudate

FOXJ1 - fallopian tube

FOXJ1 - bronchus


RSPH1 - caudate

RSPH1 - fallopian tube

RSPH1 - bronchus


Brain function and histology

The nervous system represents the major communication network and consists of the central nervous system (CNS) and peripheral nervous system (PNS). The intracranial cerebrum and cerebellum together with the spinal cord constitute the CNS. The brain is covered by layers of membranes, the meninges, and submerged in cerebrospinal fluid, which also fills the intracerebral ventricles. The brain can grossly be divided into different neuroanatomical functional regions such as the frontal, parietal, temporal and occipital lobes and central gray matter structures. Anatomically and histologically the brain can further be stratified into the cerebral cortex, representing the outermost gray matter, the underlying white matter and the innermost deep gray matter components. The hippocampus, containing the neuron rich dentate fascia, is closely associated with the cerebral cortex and is located in the medial temporal lobe. The cerebral cortex incorporates neurons (nerve cells) and glial cells (supportive cells), whereas the white matter incorporates primarily oligodendrocytes and axons from cortical and subcortical projection neurons.



Figure 5. Schematic image of a normal human brain that visualizes brain structures in a sagittal plane, showing the right half of the brain (upper), and in a frontal plane, showing the posterior half of the brain (lower). The cerebral cortex represents the outer layers of the brain and consists of neuron-rich grey matter.

The brain is composed of neurons embedded in a framework of glial cells (astrocytes and oligodendrocytes) as well as microglia and blood vessels. In addition to the cell bodies that can be defined in the microscope, cell processes from neurons and glial cells form a synaptically rich "background substance" often referred to as neuropil.

The neurons are a morphologically and functionally heterogeneous family of cells that can transmit information through chemical and electrical signaling. Neurons vary in size from the small round cells that populate the internal granular layer of the cerebellum to the large pyramidal neurons of the primary motor cortex and the Purkinje cells of the cerebellum. Astrocytes represent the major glial cell type in the brain and are characterized by their cellular cytoplasmic processes reaching both synapses and capillary walls. The astrocyte is a star-shaped cell involved in the maintenance of the microenvironment surrounding neurons and is also important for the blood-brain barrier function. Oligodendrocytes are the main producers of myelin and are characterized by their small, rounded and lymphocyte-like nuclei.

The histology of human brain including detailed images and information about the different cell types can be viewed in the Protein Atlas Histology Dictionary.


Background

Here, the protein-coding genes expressed in brain are described and characterized, together with examples of immunohistochemically stained tissue sections that visualize corresponding protein expression patterns of genes with elevated expression in brain.


Transcript profiling was based on a combination of three transcriptomics datasets (HPA, GTEx and FANTOM5), corresponding to a total of 9332 samples from 113 different human normal tissue types. The final consensus normalized expression (NX) value for each tissue type was used for classification of all genes according to the tissue specific expression into two different categories, based on specificity or distribution.


Relevant links and publications

Sjöstedt E et al., An atlas of the protein-coding genes in the human, pig, and mouse brain. Science. (2020)
PubMed: 32139519 DOI: 10.1126/science.aay5947

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Yu NY et al., Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res. (2015)
PubMed: 26117540 DOI: 10.1093/nar/gkv608

Sjöstedt E et al., Defining the Human Brain Proteome Using Transcriptomics and Antibody-Based Profiling with a Focus on the Cerebral Cortex. PLoS One. (2015)
PubMed: 26076492 DOI: 10.1371/journal.pone.0130028

Fagerberg L et al., Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics. (2014)
PubMed: 24309898 DOI: 10.1074/mcp.M113.035600

An anatomically comprehensive atlas of the adult human brain transcriptome

Allen brain atlas

Histology dictionary - the brain