The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
The overarching goal of TCGA is to improve our ability to diagnose, treat and prevent cancer. To achieve this goal in a scientifically rigorous manner, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) used a phased-in strategy to launch TCGA. A pilot project developed and tested the research framework needed to systematically explore the entire spectrum of genomic changes involved in more than 20 types of human cancer.
TCGA began as a three-year pilot in 2006 with an investment of $50 million each from the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI). The TCGA pilot project confirmed that an atlas of changes could be created for specific cancer types. It also showed that a national network of research and technology teams working on distinct but related projects could pool the results of their efforts, create an economy of scale and develop an infrastructure for making the data publicly accessible. Importantly, it proved that making the data freely available would enable researchers anywhere around the world to make and validate important discoveries. The success of the pilot with three initial tumor types led the National Institutes of Health to commit major resources to TCGA to collect and characterize more than 20 additional tumor types.
Currently, more than 20 types of cancer having the most significant impact on individual and public health are being analyzed and compared with normal tissue on multiple levels (nucleotide variation, gene copy number variation, gene expression level, and others) using state-of-the-art high-throughput techniques, by a large consortium of federal laboratories, universities and institutes. Acquisition and analysis of DNA and RNA sequence data is a major effort of TCGA, and it will grow as more tumor types come under scrutiny and as biotechnology continues to advance.
TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), two of the 27 Institutes and Centers of the National Institutes of Health (NIH), U.S. Department of Health and Human Services.
TCGA is coordinated by a project team composed of individuals from both the National Cancer Institute and the National Human Genome Research Institute. An External Scientific Committee, whose membership includes participant advocates, senior scientists and clinicians with relevant expertise in cancer, genomics and ethics, advises TCGA.
See TCGA People and Contacts for more details on project personnel and contact information.
Data Generation and Data Flow
The following steps, illustrated in the accompanying figure, summarize data flow through the TCGA pipeline:
- Tissue samples along with clinical data are collected by Tissue Source Sites (TSS) and sent to the Biospecimen Core Resources (BCRs).
- The BCRs submit clinical data and metadata to the Data Coordinating Center (DCC) and send analytes to the Genome Characterization Centers (GCCs) and Genome Sequencing Centers (GSCs), where mutation calls are generated and then submitted to the DCC.
- GSCs submit trace files, sequences and alignment mappings to the Cancer Genomics Hub (CGHub) as well.
- Data submitted to the DCC and CGHub are made available to the research community and Genome Data Analysis Centers (GDACs).
- Analysis pipelines and data results produced by GDACs are served to the research community via the DCC.
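The steps above can be sketched as a small directed graph. This is only an illustrative model under stated assumptions, not part of any TCGA software: the short node names, the `FLOWS` list, and the `build_graph` and `downstream` helpers are hypothetical, and the payload labels paraphrase the list.

```python
from collections import defaultdict, deque

# Hypothetical edge list transcribed from the steps above:
# (sender, receiver, what is sent). "Community" stands for the
# research community; all names are illustrative only.
FLOWS = [
    ("TSS", "BCR", "tissue samples and clinical data"),
    ("BCR", "DCC", "clinical data and metadata"),
    ("BCR", "GCC", "analytes"),
    ("BCR", "GSC", "analytes"),
    ("GCC", "DCC", "results such as mutation calls"),
    ("GSC", "DCC", "results such as mutation calls"),
    ("GSC", "CGHub", "trace files, sequences and alignment mappings"),
    ("DCC", "Community", "submitted data"),
    ("DCC", "GDAC", "submitted data"),
    ("CGHub", "Community", "submitted data"),
    ("CGHub", "GDAC", "submitted data"),
    ("GDAC", "DCC", "analysis pipelines and data results"),
]


def build_graph(flows):
    """Map each sender to the list of its direct receivers."""
    graph = defaultdict(list)
    for sender, receiver, _payload in flows:
        graph[sender].append(receiver)
    return graph


def downstream(graph, start):
    """Return every node reachable from `start` (breadth-first search).

    The DCC -> GDAC -> DCC loop is handled by the `seen` set.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


GRAPH = build_graph(FLOWS)

if __name__ == "__main__":
    for sender, receiver, payload in FLOWS:
        print(f"{sender} -> {receiver}: {payload}")
    print(sorted(downstream(GRAPH, "TSS")))
```

Calling `downstream(GRAPH, "TSS")` in this sketch shows that material collected at a Tissue Source Site can eventually reach every other node, while `downstream(GRAPH, "GCC")` never includes CGHub, because only the GSCs submit there.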
Participants are asked to donate a portion of tumor tissue that has been removed as part of their cancer treatment along with a sample of normal tissue, usually blood. Tissue and fluid used for analysis are called biospecimens.
Biospecimen samples used for genomic research need to meet a stringent set of criteria so that the genetic material (DNA and RNA) removed from them can be used by advanced genomic analysis and sequencing technologies.
The BCR laboratories process samples to ensure they meet the TCGA biospecimen criteria and prepare them for analysis. Part of the process includes coding the biospecimens to remove any information that might connect a sample with a participant’s private information.
Research and Discovery
TCGA researchers analyze tumor and normal tissue from hundreds of participants for each cancer selected for study. This provides the statistical power needed to produce a complete genomic profile of each cancer, which is crucial to identifying those genomic changes that offer the greatest opportunities for therapeutic development.
GCCs analyze many of the genetic changes involved in cancer including how the genome is rearranged or how gene expression changes in tumors compared to normal cells.
High-throughput GSCs identify the changes in DNA sequence associated with specific types of cancer. Emerging sequencing technologies are used to increase the scope of DNA sequencing efforts on TCGA samples.
Immense amounts of data from characterization and sequencing platforms are integrated across thousands of samples. The GDACs provide new information-processing, analysis and visualization tools to the entire research community to facilitate broader use of TCGA data.
The information that is generated by the TCGA Research Network is centrally managed at the DCC and entered into public databases as it becomes available, allowing scientists to continually access the information.
Scientists search, download and analyze datasets generated by the TCGA Research Network through the TCGA Data Portal. Essentially, the Data Portal contains the genetic profiles of specific cancer types.
Community Research and Discovery
TCGA's comprehensive and robust data are enabling research by the broad cancer community that would not be possible without them. TCGA data will continue to have a multiplier effect on the scope and quality of research.
The ultimate goal of TCGA is to enable the cancer community to find new ways to better care for patients and significantly reduce the suffering and death due to cancer.
TCGA Data Portal
All TCGA data are available through the TCGA Data Portal except for lower levels of sequence data (trace files, sequence reads and alignments). Trace files produced by older sequencing technologies are stored in NCBI's Trace Archive, while sequence reads and alignments from newer sequencing technologies are available at CGHub. Higher-level sequence data (variation calls and abundance measures) are available at the TCGA Data Portal.
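The routing described above amounts to a simple lookup from data kind to repository. The sketch below is hypothetical: the key names and the `repository_for` function are invented for illustration and do not correspond to any real TCGA, NCBI or CGHub API.

```python
# Hypothetical lookup table: the keys are illustrative labels for the
# lower-level sequence data named in the paragraph above.
LOW_LEVEL_REPOSITORIES = {
    "trace_file": "NCBI Trace Archive",  # from older sequencing technologies
    "sequence_read": "CGHub",            # from newer sequencing technologies
    "alignment": "CGHub",
}

DEFAULT_REPOSITORY = "TCGA Data Portal"


def repository_for(data_kind):
    """Return where a given kind of TCGA data is served from.

    Anything not listed as lower-level sequence data (for example
    variation calls or abundance measures) falls through to the
    TCGA Data Portal, as the paragraph above describes.
    """
    return LOW_LEVEL_REPOSITORIES.get(data_kind, DEFAULT_REPOSITORY)


if __name__ == "__main__":
    for kind in ("trace_file", "alignment", "variation_call"):
        print(f"{kind}: {repository_for(kind)}")
```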