Genome Sequencing Centers (GSCs) perform large-scale DNA sequencing using the latest sequencing technologies. Supported by the NHGRI large-scale sequencing program, the GSCs generate the enormous volume of data required by TCGA, while continually improving existing technologies and methods to expand the frontier of what can be achieved in cancer genome sequencing.
Two DNA samples from every TCGA cancer case – one from the tumor specimen and the second from either blood or non-malignant tissue – are sent from a TCGA Biospecimen Core Resource site to a GSC. The non-tumor DNA serves as a control to confirm that mutations discovered in the tumor DNA are unique to the tumor and not normal genetic variations within the individual. All samples are analyzed by whole exome sequencing using second-generation sequencing instruments. Such instruments can generate the exome data from 8 to 16 samples in a single run in 8 to 14 days.
Next, more than 10 percent of the samples from each TCGA tumor project undergo whole genome sequencing to reveal mutations that lie outside of the exome regions.
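The tumor/normal comparison described above can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a list of called variants. The `Variant` record, the `classify_variants` function, and the coordinates are hypothetical and for illustration only. Production pipelines at the GSCs would also weigh read depth, allele fraction, and sequencing error, none of which is modeled here.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one called variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_variants(
    tumor: Iterable[Variant], normal: Iterable[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Split tumor variants into putative somatic and germline sets.

    A tumor variant also seen in the matched normal sample is treated as
    germline (inherited variation); one seen only in the tumor is treated
    as putatively somatic.
    """
    tumor_set = set(tumor)
    normal_set = set(normal)
    somatic = tumor_set - normal_set   # present only in the tumor DNA
    germline = tumor_set & normal_set  # shared with the non-tumor control
    return somatic, germline


# Illustrative, made-up calls for one cancer case.
tumor_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr7", 2000, "C", "T"),
]
normal_calls = [
    Variant("chr1", 1000, "A", "G"),
]

somatic, germline = classify_variants(tumor_calls, normal_calls)
print("putative somatic:", sorted(somatic))
print("putative germline:", sorted(germline))
```

In this toy case the variant shared by both samples is set aside as normal genetic variation, and only the tumor-specific variant remains as a candidate mutation.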
Throughout the TCGA program, the GSCs have continued to evolve their approaches, as seen in this brief timeline:
The TCGA publication on the glioblastoma multiforme genome reports sequencing of 601 target genes using polymerase chain reaction (PCR) amplification and the Sanger dideoxy method. At the same time, GSCs are validating protocols for new second-generation sequencing instruments.
GSCs introduce a hybrid-capture procedure and second-generation sequencing instruments (Illumina and ABI SOLiD), enabling analysis of more than 6,000 known cancer-associated target genes at production scale.
GSCs submit the first of 24 whole genome sequence datasets (i.e., the entire 6 billion nucleotides from both the tumor and blood specimens of a cancer case) from the glioblastoma multiforme and ovarian tumor projects.
GSCs validate whole exome capture methods, thereby expanding analysis of each tumor sample from 6,000 genes to all protein-coding and RNA genes.
GSCs sequence analytes provided by the Biospecimen Core Resources (BCRs) and analyze them for putative somatic and germline mutations. Sequencing results are sent to the Cancer Genomics Hub (CGHub), and mutation results are sent to the TCGA Data Coordinating Center (DCC).
GSC data for the TCGA project is also known as Sequence-Based Data.
CGHub Data Submissions