NIH | National Cancer Institute | NCI Wiki

A description file (DESCRIPTION.txt) is a plain-text file that gives researchers useful background information about the data in an archive. This includes, but is not limited to, detailed descriptions of the procedures, protocols, and calculations that were applied to produce the archive data.

A description file is packaged in an archive together with the data files it describes, and it is created by the data submission center that submits the archive to the Data Coordinating Center (DCC). Each archive contains exactly one description file, named DESCRIPTION.txt.

A description file contains information that may be useful to researchers who download the archives for their research. Topics covered in description files often include, but are not limited to:

  • Experiment background information
  • Procedures/methods used
  • Algorithms/protocols applied
  • Useful links to papers or resources
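Because every archive carries exactly one DESCRIPTION.txt alongside its data files, a researcher can read it straight out of a downloaded archive. The sketch below assumes the archive is a gzip-compressed tar file and that the description is UTF-8 text; both are assumptions, since this page does not state the packaging format. The function name `read_description` is hypothetical.

```python
import tarfile


def read_description(archive_path: str) -> str:
    """Return the text of the single DESCRIPTION.txt in a .tar.gz archive.

    Assumes gzip-compressed tar packaging and UTF-8 text (both hypothetical;
    change the "r:gz" mode string for other formats).
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Match DESCRIPTION.txt at any directory depth inside the archive.
        matches = [
            m for m in tar.getmembers()
            if m.isfile() and m.name.rsplit("/", 1)[-1] == "DESCRIPTION.txt"
        ]
        # Each archive is expected to contain exactly one description file.
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one DESCRIPTION.txt, found {len(matches)}"
            )
        with tar.extractfile(matches[0]) as fh:
            return fh.read().decode("utf-8")
```

Raising an error when zero or several files match makes a malformed archive fail loudly instead of silently returning the wrong text.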

Example

Below is an example of a description file produced by the Broad Institute:

DESCRIPTION.txt for Broad SNP 6 data
	BROAD TCGA ALGORITHM DESCRIPTION


For the latest description of the algorithms, please see the supplementary
information of the paper at:

 http://www.nature.com/nature/journal/vaop/ncurrent/suppinfo/nature07385.html


1) Invariant Set Median-Polish Values

Protocol Name:    broad.mit.edu:invariantset_medianpolish:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.ismpolish.txt, *.ismpolish.data.txt

Invariant Set Median-Polish results are the probe sets' normalized
intensity values. First, the probes' raw intensity values were
brightness-corrected using Invariant Set Normalization, as described in
Li and Wong's dChip paper. The probe sets were then summarized with
median polish, a robust median-based method described in Bolstad
et al.'s RMA paper. Both steps were executed by a GenePattern module
called SNPFileCreator.
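To illustrate the summarization step, here is a minimal sketch of Tukey's median polish on one probe set, with rows as probes and columns as arrays. It is modelled on the iteration order and stopping rule of R's `medpolish` (defaults `max_iter=10`, `eps=0.01`) and assumes log2-scale input as in RMA; it is not the SNPFileCreator implementation, and the invariant-set brightness correction is not shown. The function names are hypothetical.

```python
from statistics import median


def median_polish(x, max_iter=10, eps=0.01):
    """Tukey median polish of a probes-by-arrays matrix (list of rows).

    Returns (overall, row_eff, col_eff, resid) such that
    x[i][j] == overall + row_eff[i] + col_eff[j] + resid[i][j].
    """
    n_rows, n_cols = len(x), len(x[0])
    resid = [list(map(float, row)) for row in x]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep each row's median into the probe (row) effect.
        for i in range(n_rows):
            d = median(resid[i])
            resid[i] = [v - d for v in resid[i]]
            row_eff[i] += d
        # Keep the column effects centred on zero.
        d = median(col_eff)
        col_eff = [c - d for c in col_eff]
        overall += d
        # Sweep each column's median into the array (column) effect.
        for j in range(n_cols):
            d = median(resid[i][j] for i in range(n_rows))
            for i in range(n_rows):
                resid[i][j] -= d
            col_eff[j] += d
        # Keep the row effects centred on zero.
        d = median(row_eff)
        row_eff = [r - d for r in row_eff]
        overall += d
        # Stop once the total absolute residual stops changing.
        new_sum = sum(abs(v) for row in resid for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, resid


def summarize_probe_set(log2_intensities):
    """RMA-style summary: one value per array (overall + column effect)."""
    overall, _, col_eff, _ = median_polish(log2_intensities)
    return [overall + c for c in col_eff]
```

Using medians rather than means is what makes the summary robust: a single aberrant probe shifts its own residuals but barely moves the per-array estimate.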


2) Allele-Specific Copy-Numbers

Protocol Name:    broad.mit.edu:copynumber_byallele:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.copynumber.byallele.txt, *.copynumber.byallele.data.txt

Allele-specific copy numbers were estimated at each SNP marker by
subtracting a background term and dividing by a scaling factor, with
the calculation done separately for each allele. The background term
for each allele is estimated from the center of the Birdseed cluster
associated with the homozygous call of the other allele (for example,
for allele A we use the A coordinate of the center of the BB cluster).
The scaling factor is set to half the distance between the AA cluster
and the BB cluster along the relevant coordinate.
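The background-and-scale rule above can be written out for a single SNP in one sample as the sketch below. It assumes the normalized intensities and the Birdseed AA/BB cluster centres are available as `(A, B)` coordinate pairs; the function name and data layout are hypothetical.

```python
def allele_specific_cn(intensity, centres):
    """Allele-specific copy numbers (cn_a, cn_b) for one SNP in one sample.

    intensity: (A, B) normalized intensities (hypothetical layout).
    centres:   {"AA": (a, b), "BB": (a, b)} cluster centres for this SNP.
    """
    aa, bb = centres["AA"], centres["BB"]
    # Allele A: background is the A coordinate of the BB centre,
    # scale is half the AA-to-BB distance along the A coordinate.
    cn_a = (intensity[0] - bb[0]) / ((aa[0] - bb[0]) / 2)
    # Allele B: the mirror image, using the AA centre as background.
    cn_b = (intensity[1] - aa[1]) / ((bb[1] - aa[1]) / 2)
    return cn_a, cn_b
```

With this choice of scale, a sample sitting on the AA centre gets 2 copies of A and 0 of B, and one halfway between the clusters gets 1 copy of each.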

3) Copy-Numbers

Protocol Name:    broad.mit.edu:copy_number:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.copynumber.txt, *.no_outlier.copynumber, after_5NN.copynumber, *.copynumber.data.txt

Raw copy numbers were estimated at each SNP and copy-number (CN)
marker by subtracting a background term and dividing by a scaling
factor. The total copy number at SNP markers was calculated by summing
the allele-specific values. For CN probes, we built a model based on an
X-dosage experiment that estimates the background and scaling factor as
a function of the probe's median intensity across normal samples.
Finally, we divided the total copy number by the average of all normals
and multiplied by 2.
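The last two steps for a SNP marker, summing the allele-specific values and rescaling against the normals, can be sketched as follows. The X-dosage model for CN probes is not reproduced because its details are not given here; the function names are hypothetical.

```python
from statistics import mean


def total_copy_number(cn_a, cn_b):
    """Total copy at a SNP marker: the sum of the allele-specific values."""
    return cn_a + cn_b


def normalize_to_normals(raw_total, normal_totals):
    """Divide by the average over the normal samples at this marker, times 2.

    A value equal to the normals' average therefore maps to 2 (diploid).
    """
    return 2.0 * raw_total / mean(normal_totals)
```

For example, a tumor with a raw total of 3.0 at a marker where the normals average 2.0 stays at 3.0, while the same raw total against normals averaging 2.4 would be pulled down to 2.5.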

4) Segmentation

Protocol Name:    broad.mit.edu:segmented_cna:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    3
Data File:     *.seg.txt, *.seg.data.txt

Circular binary segmentation (CBS) was used to segment the data after outliers were removed.
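The core idea of CBS is to look for an arc of consecutive markers whose mean differs most from the rest of the segment, split there, and recurse. The sketch below is a simplified illustration only: it uses a fixed cutoff on t / sigma where the full algorithm uses a significance test (for example, permutation-based), it assumes the noise level `sigma` is known, it omits the outlier-removal step, and it is O(n²) per level. The names `segment` and `_best_arc` are hypothetical.

```python
import math


def _best_arc(x):
    """Find the arc x[i:j] whose mean differs most from the rest of x.

    Returns (t, i, j), where t = |mean_in - mean_out| / sqrt(1/k + 1/(n-k)).
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:  # the whole segment has no complement to compare
                continue
            inside = prefix[j] - prefix[i]
            diff = inside / k - (total - inside) / (n - k)
            t = abs(diff) / math.sqrt(1.0 / k + 1.0 / (n - k))
            if t > best[0]:
                best = (t, i, j)
    return best


def segment(x, sigma, threshold=3.0, start=0):
    """Recursively split x while the best arc's t / sigma reaches threshold.

    Returns a list of (start, end, mean) tuples covering x.
    """
    n = len(x)
    t, i, j = _best_arc(x) if n > 1 else (0.0, 0, n)
    if t / sigma < threshold:
        return [(start, start + n, sum(x) / n)]
    segments = []
    # An arc cuts x into up to three pieces; empty pieces are skipped.
    for a, b in ((0, i), (i, j), (j, n)):
        if b > a:
            segments.extend(segment(x[a:b], sigma, threshold, start + a))
    return segments
```

Because the statistic is symmetric in "inside" and "outside", scanning only contiguous intervals also covers the wrap-around arcs that give CBS its "circular" name.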

5) Birdseed Genotypes

Protocol Name:    broad.mit.edu:birdseed_genotype:Genome_Wide_SNP_6:01
Link:        https://www.affymetrix.com/support/developer/powertools/index.affx
Data Level:    2
Data File:     *.birdseed.txt, *.birdseed.data.txt

Birdseed results are genotype calls produced by the Birdseed algorithm
from the probe sets' intensity values, normalized by the Invariant Set
Median-Polish algorithm. Initially, the normalized values of the SNP
probe sets from the normal samples were passed as input to Birdseed,
along with the 6.0 priors file and the special SNPs file. This generated
the clusters, confidences, and calls files. Birdseed was then run
again, this time with the '--clusters' option, on the SNP probe sets
from all samples, using the clusters file from the previous
normals-only run.
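The two-pass structure (learn clusters from normals only, then call every sample against those fixed clusters) can be illustrated with a deliberately simple stand-in. Birdseed itself fits a statistical cluster model that this sketch does not reproduce: here plain k-means seeded at prior centres plays the role of pass 1, and nearest-centre assignment plays the role of pass 2. The `(A, B)` point layout, prior values, and function names are all hypothetical.

```python
import math


def nearest_cluster(point, centres):
    """Return the genotype label whose cluster centre is closest to point."""
    return min(centres, key=lambda g: math.dist(point, centres[g]))


def fit_clusters(normal_points, prior_centres, iterations=10):
    """Pass 1: refine AA/AB/BB centres using normal samples only.

    Stand-in for Birdseed's model fit: k-means seeded at the priors.
    """
    centres = dict(prior_centres)
    for _ in range(iterations):
        members = {g: [] for g in centres}
        for p in normal_points:
            members[nearest_cluster(p, centres)].append(p)
        for g, pts in members.items():
            if pts:  # an empty cluster keeps its previous centre
                centres[g] = (
                    sum(a for a, _ in pts) / len(pts),
                    sum(b for _, b in pts) / len(pts),
                )
    return centres


def call_genotypes(points, centres):
    """Pass 2: call every sample against the fixed clusters from pass 1."""
    return [nearest_cluster(p, centres) for p in points]
```

Fitting the clusters on normals only keeps tumor copy-number changes from distorting the cluster centres that later serve as the genotype reference.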

6) Loss-of-Heterozygosity

Protocol Name:    broad.mit.edu:loss_of_heterozygosity:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.loh.txt, *.loh.data.txt

We compare the genotypes of each tumor to its matched normal. For SNPs
that are heterozygous in the matched normal, we flag the SNP in the
tumor as R (retention) if it is still heterozygous, or L (LOH) if it is
homozygous. For SNPs that are homozygous in the normal, we flag the SNP
in the tumor as U (uninformative) if it is also homozygous, or C
(conflict: gain of heterozygosity) if it is heterozygous.
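The four flags follow from a simple two-by-two decision on whether the normal and the tumor are heterozygous. The sketch assumes genotypes are given as the strings "AA", "AB", or "BB", with `None` for a missing call; that encoding and the function name are hypothetical.

```python
def loh_flag(normal_gt, tumor_gt):
    """Flag one SNP in the tumor relative to its matched normal.

    Genotypes are "AA", "AB" or "BB" (hypothetical encoding);
    returns None if either call is missing.
    """
    if normal_gt is None or tumor_gt is None:
        return None
    tumor_het = tumor_gt == "AB"
    if normal_gt == "AB":
        # Informative SNP: heterozygosity is either retained or lost.
        return "R" if tumor_het else "L"
    # Homozygous normal: only a heterozygous tumor call is notable.
    return "C" if tumor_het else "U"
```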

7) LOH Segmentation

Protocol Name:    broad.mit.edu:loh_segmented_cna:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    3
Data File:     *.loh.seg.txt, *.loh.seg.data.txt

TO BE SUBMITTED


8) Region of Interest

Protocol Name:    broad.mit.edu:regions_of_interest_cna:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    4
Data File:     *.ROI

We apply the GISTIC (Genomic Identification of Significant Targets in Cancer) algorithm.
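GISTIC scores each genomic position by combining how often it is altered across samples with how large those alterations are. The sketch below computes only an amplification score in that spirit, as the frequency of gains times their mean amplitude, from segmented log2 copy ratios. The 0.1 threshold is an assumed value, and GISTIC's deletion scores, significance testing, and peak calling are omitted; the function name is hypothetical.

```python
def amplification_scores(log2_ratios, threshold=0.1):
    """Per-marker score: (fraction of samples gained) * (mean gain amplitude).

    log2_ratios[i][j] is the segmented log2 copy ratio of marker i in
    sample j; threshold is an assumed gain cutoff.
    """
    scores = []
    for row in log2_ratios:
        gains = [v for v in row if v > threshold]
        freq = len(gains) / len(row)
        amplitude = sum(gains) / len(gains) if gains else 0.0
        scores.append(freq * amplitude)
    return scores
```

Multiplying frequency by amplitude means a region gained strongly in a few samples and one gained weakly in many can both score highly, which is why a separate significance step is needed in the real algorithm.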