About Sequence-Based Data
TCGA sequence data is generated by data-generating centers using various platforms that target, for example, the whole genome, the exome, and micro-RNA (miRNA). These centers use the sequence data to identify variants in genes or the genome by comparing tumor-sample results with normal-sample results and a reference. Variant types include:
- germline and somatic mutations
- single nucleotide variants/polymorphisms
- insertions and deletions (collectively known as in-dels)
- copy number variations
- translocations
- inversions
In addition to identifying variants, RNASeq and miRNASeq produce quantification data, such as gene or miRNA expression. For more information, see RNASeq.
Data File Submissions
The data-generating centers that use sequencing platforms generate the following sequence-derived data files:
CGHub Deposit Site
| File Type | File Suffix | Data Level | Description |
|---|---|---|---|
| Sequence Read Files | various or fastq | | Sequence read data in their native platform formats (e.g. AB SOLiD, Illumina) or FASTQ format. |
| | bam | 1 | A Binary Alignment/Map (BAM) file is the compressed binary version of the Sequence Alignment/Map (SAM), a compact and indexable representation of nucleotide sequence alignments. |
| | scf | 1 | Sequence trace files contain the raw data output from automated sequencing instruments. |
DCC Deposit Site
| File Type | File Suffix | Data Level | Description |
|---|---|---|---|
| | wig | 2 | The wiggle (WIG) format describes dense, continuous data such as sequence coverage, GC percent, and probability scores. |
| | maf | 2 or 3 | A Mutation Annotation Format (MAF) file is a tab-delimited file containing somatic and/or germline mutation annotations. MAF files containing any germline mutation annotations are kept in the controlled access portion of the Data Portal; MAF files containing only somatic mutations are kept in the open access portion of the Data Portal. MAF files are considered Level 2 files. |
| | vcf | 2 or 3 | The Variant Call Format (VCF) is a standardized format for storing and reporting genomic sequence variations. |
| | tr | 1 | Trace ID-to-Sample Relationship File. |
| | vcf | 2 | A verbose coverage file (VCF) provides sequence depth at a mutation locus described in a MAF file. Note: Verbose Coverage Files are no longer accepted by TCGA, but are maintained as a historic data type. The VCF extension now refers to Variant Call Format files. |
| | quantification.txt | 3 | A quantification file provides calculated values for a particular data type based on sequence data. The current data types and quantification formats are based on RNA sequencing results. |
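As a rough illustration, the suffix, deposit site, and data level information in the two tables can be held in a small lookup structure. The sketch below is not part of any TCGA tool; the names `SUFFIX_TABLE` and `describe_suffix` are hypothetical. It assumes `vcf` carries its current meaning (Variant Call Format) and omits sequence read files, whose data level is not stated in the table.

```python
# Suffix -> (deposit site, data levels), transcribed from the tables above.
# The structure and names are illustrative assumptions, not a TCGA API.
SUFFIX_TABLE = {
    "bam": ("CGHub", (1,)),
    "scf": ("CGHub", (1,)),
    "wig": ("DCC", (2,)),
    "maf": ("DCC", (2, 3)),
    "vcf": ("DCC", (2, 3)),  # current meaning: Variant Call Format
    "tr": ("DCC", (1,)),
    "quantification.txt": ("DCC", (3,)),
}


def describe_suffix(filename):
    """Return (site, levels) for the longest matching suffix, or None."""
    name = filename.lower()
    # Try longer suffixes first so "quantification.txt" wins over shorter ones.
    for suffix in sorted(SUFFIX_TABLE, key=len, reverse=True):
        if name.endswith("." + suffix):
            return SUFFIX_TABLE[suffix]
    return None
```

A file name that matches no known suffix returns `None`, so callers can decide how to handle unlisted file types.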
Mapping Sequence-Based Data
Currently, there is no equivalent to an SDRF file for sequence-based (GSC) data. However, GSC Mutation Annotation Format (MAF) files do provide the relationship between aliquot barcodes and the associated called variants.
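Because a MAF file is tab-delimited, the aliquot-to-variant relationship can be recovered with a few lines of Python. The following is a minimal sketch, not an official parser. It assumes the column names `Hugo_Symbol`, `Chromosome`, `Variant_Classification`, and `Tumor_Sample_Barcode` from the MAF specification, and that header comment lines begin with `#`. The function name `variants_by_aliquot` and the sample rows (including the barcodes) are invented for illustration.

```python
import csv
from collections import defaultdict


def variants_by_aliquot(maf_lines):
    """Group called variants by tumor aliquot barcode.

    `maf_lines` is any iterable of text lines from a MAF file.
    """
    # Skip blank lines and comment lines such as "#version 2.4".
    rows = (ln for ln in maf_lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["Tumor_Sample_Barcode"]].append(
            (row["Hugo_Symbol"], row["Chromosome"], row["Variant_Classification"])
        )
    return dict(grouped)


# Made-up MAF fragment; real files carry many more columns.
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tVariant_Classification\tTumor_Sample_Barcode\n"
    "GENE_A\t1\tMissense_Mutation\tALIQUOT-A\n"
    "GENE_B\t7\tSilent\tALIQUOT-A\n"
    "GENE_C\t17\tNonsense_Mutation\tALIQUOT-B\n"
)

result = variants_by_aliquot(EXAMPLE_MAF.splitlines())
for barcode, variants in sorted(result.items()):
    print(barcode, len(variants))
```

For a file on disk, the same function can be given an open file handle, since it only needs an iterable of lines.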



