In 2006, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) began a cancer genomics program that culminated in the creation of “The Cancer Genome Atlas” (TCGA). TCGA spans several “omics”: genomics, epigenomics, transcriptomics, and proteomics. So, let’s start by learning what each of these omics means…
- Genomics: Genomics is the study of an organism’s complete set of DNA, including all of its genes, known as the genome. Genomic studies examine how genes interact with each other and with their environment.
- Epigenomics: The epigenome consists of chemical compounds that attach to DNA and regulate gene activity without changing the DNA sequence itself. Epigenomics focuses on understanding the specific functions and effects of these modifications.
- Transcriptomics: Genes provide instructions for generating the proteins necessary to maintain cells throughout the body. Transcriptomics is the study of the RNA copies of these instructions, called transcripts, that are present within a cell.
- Proteomics: The proteins produced or altered by an individual’s genes make up the proteome. These proteins are responsible for almost all cellular functions, and studying the precise role they play is known as proteomics.
TCGA began as a three-year pilot program, backed by a $100 million federal investment, to obtain a comprehensive understanding of the molecular basis of cancer. The pilot aimed to characterize the molecular makeup of lung, brain (specifically glioblastoma), and ovarian cancer. Its overarching goal was to assess the feasibility of gene mapping to detect the genetic alterations that lead to cancer. Seven institutions received awards to establish Cancer Genome Characterization Centers (CGCCs), which used advanced genome analysis techniques to characterize the genomes associated with these cancers and to identify genetic alterations in the donated specimens.
The generation of TCGA involved major collaborations among basic and clinical researchers, doctors, and nurses. Arguably, the most critical contributors to TCGA were the cancer patients who volunteered to participate. The process began when a cancer patient consented to donate specimens, including tumor tissue and blood. Researchers at a centralized facility processed the specimens and provided the genetic material to the CGCCs. Centralized processing limited sample-to-sample variation, which helped ensure high-quality data. Finally, all genetic information generated by TCGA was made available to researchers and doctors worldwide.
In 2016, the NCI launched the Genomic Data Commons (GDC) to house the data generated by TCGA, with the aim of encouraging data sharing, advancing cancer research, and assisting with the diagnosis and treatment of cancer. By 2018, TCGA had expanded to cover 33 cancer types, including ten rare cancers. In total, 20 collaborating institutes participated in developing TCGA. Almost 11,000 patients donated specimens to this landmark project, which generated over 2.5 petabytes of data. Cancers chosen for TCGA analysis met specific criteria, such as poor prognosis and overall public health impact. A complete list of TCGA Cancers Selected for Study was originally published by the National Cancer Institute.
Sources: Microbiol Mol Biol Rev, NCI Press Office, TCGA Cancers Selected for Study