Genome Sequencing Center (GSC) Level 1 and Level 2 data are DNA sequencing results generated on sequencers such as the Illumina Genome Analyzer and the ABI SOLiD DNA System.

Synopsis of GSC Level 1 and 2 (Data Types)

Data from Genome Sequencing Centers (GSCs) represent sequencing of tumor and normal samples using next-generation sequencing, followed by alignment to a reference human genome. Level 1 data are the sequences aligned to the reference genome; these Binary Alignment/Map (BAM) files are not archived at the DCC. NCI's Cancer Genomics Hub (CGHub) is the secure repository for storing, cataloging, and accessing BAM files and metadata for sequencing data.

Level 2 data are the genome variants identified in the tumor and normal samples. They may include MAF, VCF, and WIG files.

GSC archive naming convention

  1. The archive platform name uses only the terms "automated" or "curated" to indicate the type of archive.
  2. The serial index number should not be used to differentiate types of MAF archives.
  3. The revision number, not the serial index, should be used to indicate different archive revisions.
  4. Publication MAF archives are preserved on publication freeze pages.
  5. MAF file names should include the token "somatic" or "protected" to indicate the required file access level.
  6. The term "Cont" is added to the archive platform name to indicate archives with protected access, including those with VCF and protected MAF files.
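
The naming rules above can be illustrated with a small parser. This is a minimal sketch, not an official DCC tool: the pattern `<sequencer>_dnaseq_[Cont_]<automated|curated>` is inferred from the platform codes listed on this page, and the names `PlatformCode` and `parse_platform_code` are hypothetical.

```python
import re
from typing import NamedTuple


class PlatformCode(NamedTuple):
    sequencer: str     # e.g. "illuminaga", "illuminahiseq", "solid", "mixed"
    controlled: bool   # True when the "Cont" token is present
    archive_type: str  # "automated" or "curated"


# Assumed layout, inferred from the codes on this page:
#   <sequencer>_dnaseq_[Cont_]<automated|curated>
_PLATFORM_RE = re.compile(
    r"^(?P<sequencer>[a-z0-9]+)_dnaseq_(?P<cont>Cont_)?(?P<kind>automated|curated)$"
)


def parse_platform_code(code: str) -> PlatformCode:
    """Split a GSC archive platform code into its naming-convention parts."""
    match = _PLATFORM_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognized GSC platform code: {code!r}")
    return PlatformCode(
        sequencer=match["sequencer"],
        controlled=match["cont"] is not None,
        archive_type=match["kind"],
    )
```

For example, `parse_platform_code("illuminaga_dnaseq_Cont_automated")` yields a record with `controlled=True` and `archive_type="automated"`, while a code that uses any other archive-type term raises `ValueError`.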

Platforms covered in GSC data

| Platform Code | Platform Alias | Platform Name | Data Level | Example | HTTP Location |
| --- | --- | --- | --- | --- | --- |
| illuminaga_dnaseq_automated | Automated Mutation Calling | IlluminaGA automated DNA sequencing | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/<disease_study>/gsc/<domain>/illuminaga_dnaseq_automated/mutations |
| illuminaga_dnaseq_Cont_automated | Automated Mutation Calling | IlluminaGA automated DNA sequencing - controlled | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/tcga4yeo/tumor/<disease_study>/gsc/<domain>/illuminaga_dnaseq_Cont_automated/mutations |
| illuminaga_dnaseq_curated | Curated Mutation Calling | IlluminaGA curated DNA sequencing | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/<disease_study>/gsc/<domain>/illuminaga_dnaseq_curated/mutations |
| illuminaga_dnaseq_Cont_curated | Curated Mutation Calling | IlluminaGA curated DNA sequencing - controlled | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/tcga4yeo/tumor/<disease_study>/gsc/<domain>/illuminaga_dnaseq_Cont_curated/mutations |
| illuminahiseq_dnaseq_automated | Automated Mutation Calling | Illumina HiSeq 2000 DNA Sequencing | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/<disease_study>/gsc/<domain>/illuminahiseq_dnaseq_automated/mutations |
| illuminahiseq_dnaseq_Cont_automated | Automated Mutation Calling | Illumina HiSeq 2000 DNA Sequencing - controlled | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/tcga4yeo/tumor/<disease_study>/gsc/<domain>/illuminahiseq_dnaseq_Cont_automated/mutations |
| solid_dnaseq_curated | Curated Mutation Calling | SOLiD curated DNA sequencing | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/<disease_study>/gsc/<domain>/solid_dnaseq_curated/mutations |
| mixed_dnaseq_curated | Curated Mutation Calling | Mixed curated DNA sequencing | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/<disease_study>/gsc/<domain>/mixed_dnaseq_curated/mutations |
| mixed_dnaseq_Cont_curated | Curated Mutation Calling | Mixed curated DNA sequencing - controlled | Level_2 | Example | tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/<disease_study>/gsc/<domain>/mixed_dnaseq_Cont_curated/mutations |
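
The HTTP locations listed for the platforms share one path template, which can be sketched as follows. This is an illustration under stated assumptions: the function name `http_location` is hypothetical, the account segment (`anonymous` or `tcga4yeo` on this page) is passed in explicitly rather than derived from the platform code, and the disease study and domain used in the example call are placeholder values.

```python
def http_location(platform_code: str, disease_study: str, domain: str,
                  account: str = "anonymous") -> str:
    """Assemble a mutations directory path following the template on this page.

    Template (taken from the listed locations):
      tcgafiles/ftp_auth/distro_ftpusers/<account>/tumor/
      <disease_study>/gsc/<domain>/<platform_code>/mutations
    """
    segments = [
        "tcgafiles", "ftp_auth", "distro_ftpusers",
        account,          # account segment as listed for the platform
        "tumor",
        disease_study,    # fills the <disease_study> placeholder
        "gsc",
        domain,           # fills the <domain> placeholder
        platform_code,
        "mutations",
    ]
    for segment in segments:
        # Reject empty or slash-containing values so the path stays well formed.
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return "/".join(segments)


# Placeholder values for illustration only.
example = http_location("solid_dnaseq_curated", "gbm", "example.org")
```

Keeping the account as a parameter avoids hard-coding a mapping from platform code to account, since the listed locations are the authority for which account each platform uses.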

MAF/VCF spec

GSC archives may include MAF and VCF files. Only archives that contain somatic MAF files should be publicly accessible. Archives that contain VCF files or protected MAF files should be under controlled access.
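
The access rule can be expressed as a short check over an archive's file names. This is a rough sketch, not the DCC's validation logic: the function name `required_access` is hypothetical, and it assumes MAF and VCF files can be recognized by `.maf`, `.vcf`, and `.vcf.gz` extensions and that the "somatic" and "protected" tokens appear literally in MAF file names.

```python
from typing import Iterable


def required_access(file_names: Iterable[str]) -> str:
    """Return "public" or "controlled" for an archive, given its file names."""
    has_somatic_maf = False
    for name in file_names:
        lower = name.lower()
        if lower.endswith((".vcf", ".vcf.gz")):
            # Any VCF file puts the archive under controlled access.
            return "controlled"
        if lower.endswith(".maf"):
            if "protected" in lower:
                return "controlled"
            if "somatic" in lower:
                has_somatic_maf = True
            else:
                # MAF names are expected to carry one of the two tokens.
                raise ValueError(f"MAF file lacks an access token: {name!r}")
    if has_somatic_maf:
        return "public"
    # Only archives with somatic MAF files qualify for public access.
    raise ValueError("archive contains no MAF or VCF files to classify")
```

Files of other types (for example a README) are ignored by this check, so a single protected MAF or VCF file is enough to classify the whole archive as controlled.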

Current MAF specification: See MAF Specification

Current VCF specification: See VCF Specification

Validation

GSC archives undergo the standard validation for Genome Sequencing Centers described on the Archive Validation page.

 

 

 

 

 
