Array-based data (within the scope of TCGA) are characterization data produced by GCCs using array-based platforms.

About Array-Based Data

Array-based platforms use molecular probes or targets organized in rows and columns on an array. TCGA array-based data are created by data-generating centers using various platforms that target, for example:

  • gene, exon, miRNA, and protein expression
  • copy number variation
  • single nucleotide polymorphisms (SNPs)
  • loss of heterozygosity (LOH)
  • DNA methylation

Data File Submissions

The data-generating centers that use array-based platforms generate the following characterization data files:

Data Type | File Extension | Data Level | Description
Copy Number Results | .CBS.txt | 3 |
Copy Number Results | .data.txt | 2 |
Copy Number Results | .hla.CBS.Rdata | 3 |
Copy Number Results | .hla.CBS.txt | 3 |
Copy Number Results | .mat | 2 |
Copy Number Results | .tsv | 2 |
Copy Number Results | .tsv | 3 |
Copy Number Results | .txt | 1 |
Copy Number Results | .txt | 3 |
Copy Number Results | .txt | 4 |
DNA Methylation | .adf.txt | 1 |
DNA Methylation | .beta-value.txt | 2 |
DNA Methylation | .bk-subtract.data.txt | 2 |
DNA Methylation | .data.txt | 2 |
DNA Methylation | .detection-p-value.txt | 2 |
DNA Methylation | .nbk-subtract.data.txt | 2 |
DNA Methylation | .seg.xls | 3 |
DNA Methylation | .txt | 1 |
DNA Methylation | .txt | 2 |
DNA Methylation | .txt | 3 |
Expression-Exon | .CEL | 1 |
Expression-Exon | .FIRMA.txt | 3 |
Expression-Exon | .ROI.txt | 4 |
Expression-Exon | .data.txt | 2 |
Expression-Exon | .exon.txt | 2 |
Expression-Exon | .gene.txt | 2 |
Expression-Exon | .gene.txt | 3 |
Expression-Genes | .CEL | 1 |
Expression-Genes | .CEL.README | 1 |
Expression-Genes | .data.txt | 2 |
Expression-Genes | .data.txt | 3 |
Expression-Genes | .gene.matrix.txt | 3 |
Expression-Genes | .probe.matrix.txt | 2 |
Expression-Genes | .roi.txt | 4 |
Expression-Genes | .tsv | 2 |
Expression-Genes | .tsv | 3 |
Expression-Genes | .txt | 1 |
Expression-Genes | .txt | 2 |
Expression-Genes | .txt | 3 |
Expression-miRNA | .data.txt | 2 |
Expression-miRNA | .data.txt | 3 |
Expression-miRNA | .txt | 1 |
Expression-miRNA | .txt | 2 |
Quantification-Exon | .trimmed.annotated.exon.quantification.txt | 3 |
Quantification-Exon | .trimmed.annotated.gene.quantification.txt | 3 |
Quantification-Exon | .trimmed.annotated.spljxn.quantification.txt | 3 |
SNP | .B_Allele_Freq.txt | 2 |
SNP | .B_allele_freq.txt | 2 |
SNP | .CEL | 1 |
SNP | .Delta_B_Allele_Freq.txt | 2 |
SNP | .Genotypes.txt | 2 |
SNP | .Normal_LogR.txt | 2 |
SNP | .Paired_LogR.txt | 2 |
SNP | .Unpaired_LogR.txt | 2 |
SNP | .XandYintensity.txt | 1 |
SNP | .XandYintensity.txt | 2 |
SNP | .birdseed.data.txt | 2 |
SNP | .birdseed.txt | 2 |
SNP | .byallele.copynumber.data.txt | 2 |
SNP | .cbs.seg.txt | 3 |
SNP | .cel | 1 |
SNP | .copynumber.byallele.data.txt | 2 |
SNP | .copynumber.byallele.txt | 2 |
SNP | .copynumber.data.txt | 2 |
SNP | .copynumber.txt | 2 |
SNP | .data.txt | 4 |
SNP | .gistic.roi.txt | 4 |
SNP | .idat | 1 |
SNP | .ismpolish.data.txt | 2 |
SNP | .ismpolish.txt | 2 |
SNP | .loh.data.txt | 2 |
SNP | .loh.txt | 2 |
SNP | .loh.txt | 3 |
SNP | .maf | 2 |
SNP | .no_outlier.copynumber.data.txt | 2 |
SNP | .no_outlier.copynumber.txt | 2 |
SNP | .raw.copynumber.data.txt | 2 |
SNP | .roi.txt | 4 |
SNP | .seg.data.txt | 3 |
SNP | .seg.txt | 3 |
SNP | .segnormal.txt | 3 |
SNP | .txt | 2 |
SNP | .txt | 3 |

About Raw and Processed Data Files

 

Array analysis software generates files that contain the raw, or unprocessed, data from the assay. Each analysis package writes these array data files in its own format. For example, the analysis software from Affymetrix generates raw files in its native .cel file format.
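Because the file extension table above lists multi-part extensions next to shorter ones (for example .ismpolish.data.txt, .data.txt, and .txt), recognizing a file's format from its name means choosing the longest extension that matches. The following is a minimal sketch of that idea, not a TCGA tool: the helper name match_extension and the dictionary SNP_EXTENSION_LEVELS are hypothetical, the dictionary holds only a few SNP rows copied from the table, and case-insensitive matching is an assumption made because the table lists both .CEL and .cel.

```python
# Excerpt of SNP rows from the file extension table: extension -> data levels.
# Hypothetical structure for illustration; the full table has many more rows.
SNP_EXTENSION_LEVELS = {
    ".cel": {1},
    ".idat": {1},
    ".ismpolish.data.txt": {2},
    ".seg.data.txt": {3},
    ".data.txt": {4},
    ".txt": {2, 3},
}


def match_extension(file_name, table=SNP_EXTENSION_LEVELS):
    """Return (extension, levels) for the longest table extension ending file_name.

    Returns None when no extension in the table matches.
    Matching ignores case (an assumption, see the lead-in).
    """
    lowered = file_name.lower()
    candidates = [ext for ext in table if lowered.endswith(ext)]
    if not candidates:
        return None
    best = max(candidates, key=len)  # the longest suffix is the most specific
    return best, table[best]


result = match_extension(
    "broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt"
)
print(result)
```

Note that an extension alone does not always determine the data level: .txt appears at more than one level in the table, so the sketch returns a set of candidate levels rather than a single value.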

Raw data can be normalized during the course of an experiment. Normalized data, and other data that result from processing, are known as derived array data; the files that contain these data are derived array data files. Subsequent downstream processing produces more derived data, and therefore more derived array data files.

Files that contain summary data from multiple samples are data matrix files, whose formats are dictated by the MAGE-TAB specification. Array data matrix files summarize raw files from multiple samples; derived array data matrix files summarize processed files from multiple samples.

Raw and processed data files can be ASCII or binary files. Alternatively, data may be provided in the MAGE-TAB Data Matrix format. Data file names are listed in the File columns (for example, Array Data File, Image File) in an SDRF file.

Mapping Array-Based Data

Characterization centers' experiments produce many different data files, each with a particular data type and data level. Mapping between aliquot barcodes and their associated assay data files relies on MAGE-TAB SDRF (Sample and Data Relationship Format) files.

The MAGE-TAB SDRF provides the mapping between aliquot barcodes and assay result files, and indicates each result file's data type and data level.

The SDRF file is like a database describing the relationships between samples and their results. For example, in the following table, the Extract Name column contains the BCR aliquot barcode for each sample and the Derived Array Data Matrix File column lists the result file associated with each barcode. In fact, any file listed in the SDRF in the same row as a barcode is associated with that aliquot's assay result.

Extract Name (Sample ID) | Derived Array Data Matrix File (MAGE-TAB SDRF)
TCGA-02-0001-01C-01D-00182-01 | broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt
TCGA-02-0001-10A-01D-00182-01 | broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt
TCGA-02-0002-01A-01D-00182-01 | broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt
TCGA-02-0002-10A-01D-00182-01 | broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt

This example from a MAGE-TAB SDRF demonstrates Data Levels 1-2 Sample ID-to-Result File mapping for CGCC data
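The barcode-to-file lookup described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference parser: it assumes the SDRF is tab-delimited text, that the Extract Name and Derived Array Data Matrix File headers appear as shown in the table, and that the first column with each header is the one wanted. The function name files_by_barcode and the embedded SDRF_TEXT sample are hypothetical; a real SDRF has many more columns.

```python
import csv
import io
from collections import defaultdict

MATRIX_FILE = "broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt"

# Two-column sample built from the table above (hypothetical, heavily trimmed).
SDRF_TEXT = (
    "Extract Name\tDerived Array Data Matrix File\n"
    f"TCGA-02-0001-01C-01D-00182-01\t{MATRIX_FILE}\n"
    f"TCGA-02-0001-10A-01D-00182-01\t{MATRIX_FILE}\n"
    f"TCGA-02-0002-01A-01D-00182-01\t{MATRIX_FILE}\n"
    f"TCGA-02-0002-10A-01D-00182-01\t{MATRIX_FILE}\n"
)


def files_by_barcode(sdrf_handle, file_column="Derived Array Data Matrix File"):
    """Map each aliquot barcode to the set of files listed in its SDRF rows."""
    reader = csv.reader(sdrf_handle, delimiter="\t")
    header = next(reader)
    # list.index returns the first matching column; repeated headers are ignored.
    barcode_idx = header.index("Extract Name")
    file_idx = header.index(file_column)
    mapping = defaultdict(set)
    for row in reader:
        if len(row) <= max(barcode_idx, file_idx):
            continue  # skip blank or truncated rows
        barcode, file_name = row[barcode_idx], row[file_idx]
        if barcode and file_name:
            mapping[barcode].add(file_name)
    return dict(mapping)


mapping = files_by_barcode(io.StringIO(SDRF_TEXT))
for barcode, files in sorted(mapping.items()):
    print(barcode, sorted(files))
```

Storing a set of files per barcode, rather than a single file name, keeps the sketch usable when a barcode appears in more than one row.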

Note

One-to-many and many-to-many relationships between Extract Names and [file type] File columns can occur. Usually, however, the relationships are one-to-one or many-to-one. The preceding table is an example of a many-to-one relationship: data for multiple aliquots are found in a single Derived Array Data Matrix File.
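To make the four relationship kinds in this note concrete, the sketch below classifies a list of (Extract Name, file name) pairs. The function name relationship_kind is hypothetical, and the pairs are assumed to have already been read from an SDRF; the direction follows the note, so many-to-one means several aliquots share one file.

```python
from collections import defaultdict


def relationship_kind(pairs):
    """Classify (barcode, file name) pairs by their relationship cardinality."""
    files_per_barcode = defaultdict(set)
    barcodes_per_file = defaultdict(set)
    for barcode, file_name in pairs:
        files_per_barcode[barcode].add(file_name)
        barcodes_per_file[file_name].add(barcode)

    barcode_has_many_files = any(len(f) > 1 for f in files_per_barcode.values())
    file_has_many_barcodes = any(len(b) > 1 for b in barcodes_per_file.values())

    if barcode_has_many_files and file_has_many_barcodes:
        return "many-to-many"
    if barcode_has_many_files:
        return "one-to-many"
    if file_has_many_barcodes:
        return "many-to-one"
    return "one-to-one"


# Pairs taken from the preceding table: four aliquots, one matrix file.
matrix_file = "broad.mit.edu_GBM.Genome_Wide_SNP_6.1.ismpolish.data.txt"
table_pairs = [
    ("TCGA-02-0001-01C-01D-00182-01", matrix_file),
    ("TCGA-02-0001-10A-01D-00182-01", matrix_file),
    ("TCGA-02-0002-01A-01D-00182-01", matrix_file),
    ("TCGA-02-0002-10A-01D-00182-01", matrix_file),
]
print(relationship_kind(table_pairs))
```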
