NIH | National Cancer Institute | NCI Wiki
Document Information

Specification for RNASeq Data Format
Version 1.0
March 24, 2011

RNASeq/miRNASeq Data Archives Specification

Synopsis

RNASeq data contains information about both nucleotide sequence and gene expression. The NCBI dbGaP database is the official repository for the actual sequence data, produced in the form of BAM files. RNASeq data archives submitted to the DCC include the gene expression information that is inferred from the sequence data, sequence coverage data in the form of wiggle files, and eventually mutation data in the form of MAF files. RNASeq submissions also include MAGE-TAB archives, with some non-standard modifications to the MAGE-TAB format. These archives contain files that describe the RNASeq experiment and the relationship between the experimental samples and the data files produced by assaying those samples.

This specification covers the validation and disposition of RNASeq submissions to the DCC. The DCC currently accepts only Level 3 (expression and coverage) and MAGE-TAB archives. When centers begin submitting variant calls, a section on RNASeq MAF (mutation annotation format) files will be added.

Design

Centers

We expect to receive RNASeq data from

  • UNC (unc.edu) and
  • Canada's Michael Smith Genome Sciences Centre (bcgsc.ca)

Data Types and Data Levels

[Figure: data types and data levels expected for RNASeq and miRNASeq data]

This image models the data types and data levels we expect to get for RNASeq data. Red rectangles indicate Data Types. Blue rectangles indicate Data Levels. Data Types for RNASeq are on the left and miRNASeq are on the right.

Data Levels

Level 0 (not currently a real level in the DCC's database) is reserved for annotation files. Annotation files are in the Generic Annotation File (GAF) format. The GAF file used for a particular experiment is listed in that experiment's SDRF file under the column "Annotation REF". The GAF files listed in that column, and their specification, are stored at http://tcga-data.nci.nih.gov/docs/GAF/.

We do not intend to receive actual Level 1 (BAM) files. These files are submitted to dbGaP, so we will have telemetry on them.

We are not currently receiving Level 2 data, although we anticipate it will be submitted during the last half of 2011. Variant files will follow the VCF format standard. Coverage files use the Wiggle (WIG) format specification. Submitted wig files will not be validated; however, we expect to accept BigWig (.bw) files late in 2011, and those will be validated. When a BigWig specification is available, we will provide it here. The bracketed elements to the sides of the RNA- and miRNA-Seq Variant data types indicate the types of variants we expect to receive.

We are currently receiving Level 3 data. Sometime in the future we may receive an additional Level 3 data type for RNASeq (Expression-Splice Variants), but that analysis has not been developed yet.

Data Level monikers do not include the words "file" or "files" as depicted in the image.

Data Types

Annotations

Annotations are a data type reserved for metadata about biospecimens (e.g. annotations available in the Annotations Manager), or genomic mappings (i.e. coordinates of a probe/sequence read map to a transcript that maps to a genomic region; e.g. Platform Design Files (ADFs)). For RNASeq/miRNASeq we are concerned with Generic Annotation Files (GAFs).

Alignments

RNASeq/miRNASeq read sequences are aligned to a target (i.e. transcript or miRNA) database. The alignments take the form of BAM files (Level 1).

Variants

Variants are differences between an RNASeq/miRNASeq read sequence and the reference target sequence. Variants will be recorded in Variant Call Format (VCF) files and submitted to the DCC. Variants will be validated by comparing them against the DNASeq variants. Variants can be classified as Single Nucleotide Variants (SNVs), INDELs (insertions or deletions), or Fusion Genes.

Sequence Coverage

The density of sequence reads or other kinds of data calculated from alignment of reads covering a reference sequence. Sequence coverage uses the Wiggle (WIG) format specification.

Expression-Gene

The calculated expression signal of a gene. The calculations are described in the DESCRIPTION.txt file in the mage-tab archive of an experiment. RNA/miRNA Sequence-based data have a data type alias of Quantification-Gene.

Expression-Exon

The calculated expression signal of a particular composite exon of a gene. The calculations are described in the DESCRIPTION.txt file in the mage-tab archive of an experiment. RNA/miRNA Sequence-based data have a data type alias of Quantification-Exon.

Expression-Junction

The calculated expression signal of a particular composite splice junction of a gene. The calculations are described in the DESCRIPTION.txt file in the mage-tab archive of an experiment. RNA/miRNA Sequence-based data have a data type alias of Quantification-Junction.

Expression-miRNA

The calculated expression for all reads aligning to a particular miRNA. The calculations are described in the DESCRIPTION.txt file in the mage-tab archive of an experiment. RNA/miRNA Sequence-based data have a data type alias of Quantification-miRNA.

Expression-miRNA Isoform

The calculated expression for each individual miRNA sequence isoform observed. The calculations are described in the DESCRIPTION.txt file in the mage-tab archive of an experiment. RNA/miRNA Sequence-based data have a data type alias of Quantification-miRNA Isoform.

Archive structure

There are two archives for RNA-seq submissions: the data archive and the MAGE-TAB (metadata) archive. Naming conventions are similar to typical archives.

<domain>_<disease_abbrev>.<platform>.Level_3.<index>.<revision>.<series>
<domain>_<disease_abbrev>.<platform>.mage-tab.1.<revision>.<series>

unc.edu_COAD.IlluminaGA_RNASeq.Level_3.1.0.0
unc.edu_COAD.IlluminaGA_RNASeq.mage-tab.1.0.0
  • Archives are gzipped tars
  • Archives are flat, below a directory named after the archive
  • Archives contain a MANIFEST.txt file; each file in the archive (with the optional exception of the MANIFEST.txt file itself) is represented in MANIFEST.txt by the output of md5sum run against that file.
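
The naming convention and MANIFEST.txt rule above can be sketched in Python. This is a minimal illustration, not the DCC's validator: the function names `parse_archive_name` and `build_manifest` are hypothetical, and the character classes allowed for each name token are assumptions made for the example.

```python
import hashlib
import re
from pathlib import Path

# Token character classes below are illustrative assumptions, not DCC rules.
ARCHIVE_NAME = re.compile(
    r"^(?P<domain>[^_]+)_(?P<disease>[A-Za-z0-9]+)"
    r"\.(?P<platform>[A-Za-z0-9_-]+)"
    r"\.(?P<kind>Level_\d+|mage-tab)"
    r"\.(?P<index>\d+)\.(?P<revision>\d+)\.(?P<series>\d+)$"
)


def parse_archive_name(name):
    """Split an archive name into its tokens, or return None if it does not fit."""
    match = ARCHIVE_NAME.match(name)
    return match.groupdict() if match else None


def build_manifest(archive_dir):
    """Return MANIFEST.txt content using md5sum's '<digest>  <name>' line format."""
    lines = []
    for path in sorted(Path(archive_dir).iterdir()):
        # The MANIFEST.txt file itself may be left out of the listing.
        if path.is_file() and path.name != "MANIFEST.txt":
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.name}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(parse_archive_name("unc.edu_COAD.IlluminaGA_RNASeq.Level_3.1.0.0"))
```

Because archives are flat, the sketch only lists files directly inside the archive directory.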

Archive files

Data archives

In addition to the MANIFEST.txt file, a complete RNASeq data archive contains

  • exon, gene, and splice junction data files for each aliquot (required)
  • a single DESCRIPTION.txt file (optional)
  • wiggle format coverage files for each aliquot (which can be added by the submitting center after the initial upload of expression data)

Expression and coverage data in RNASeq archives is considered Level 3 data.

RNASeq submissions also include separate MAGE-TAB archives, which contain files that describe the RNASeq experiment and the relationship between the experimental samples and the data files produced by assaying those samples.

Data files

Data file names

Data file names within the Level 3 archives contain the TCGA barcode for the aliquot associated with the data. Names may also include center-specific id tokens, separated by periods. Center tokens are not required; when present, they may consist only of the following characters: [a-zA-Z0-9_-]. Extensions indicate the RNASeq data type:

Data type        File name                                                                                                      Validation
Exon             <center_token>.<TCGA barcode/UUID>.<center_token>.<index_integer>.trimmed.annotated.exon.quantification.txt    Filename must end with "exon.quantification.txt"
Gene             <center_token>.<TCGA barcode/UUID>.<center_token>.<index_integer>.trimmed.annotated.gene.quantification.txt    Filename must end with "gene.quantification.txt"
Splice junction  <center_token>.<TCGA barcode/UUID>.<center_token>.<index_integer>.trimmed.annotated.spljxn.quantification.txt  Filename must end with "spljxn.quantification.txt"
Wig (coverage)   <center_token>.<TCGA barcode/UUID>.<center_token>.<index_integer>.wig                                          Filename must end with ".wig"
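
As a rough sketch of the suffix rules above, the following Python maps a file name to its data type and pulls out the aliquot barcode. The helper names are hypothetical, and the barcode regular expression is an assumption inferred from the example barcodes in this document rather than an official definition; UUID-based names are not handled.

```python
import re

# Required file name endings, taken from the validation rules above.
SUFFIX_TO_TYPE = {
    "exon.quantification.txt": "Exon",
    "gene.quantification.txt": "Gene",
    "spljxn.quantification.txt": "Splice junction",
    ".wig": "Wig (coverage)",
}

# Assumed aliquot barcode shape, e.g. TCGA-AA-3812-01A-01R-0905-07.
BARCODE = re.compile(
    r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-\d{2}[A-Z]-\d{2}[A-Z]-[A-Z0-9]{4}-\d{2}"
)


def classify_data_file(name):
    """Return the data type implied by the file name ending, or None."""
    for suffix, data_type in SUFFIX_TO_TYPE.items():
        if name.endswith(suffix):
            return data_type
    return None


def find_barcode(name):
    """Return the first aliquot barcode found in the file name, or None."""
    match = BARCODE.search(name)
    return match.group(0) if match else None
```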

Data file formats

All files are tab-delimited text, header line plus one record per line.

Exon file records have the following fields in this order: exon, raw_counts, median_length_normalized, RPKM.
Gene file records have the following fields in this order: gene, raw_counts, median_length_normalized, RPKM.
Splice junction file records have the following fields in this order: junction, raw_counts.
Wiggle files are described at Wiggle Format Specification.

Note: RPKM stands for Reads Per Kilobase of exon model per million mapped reads (Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Nature Methods, 2008 Jul;5(7):621-8. Epub 2008 May 30).
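
For orientation, the textbook RPKM definition can be written as a short function. This is only a sketch of the general formula: the calculation a center actually used is the one described in its DESCRIPTION.txt, and the function and parameter names here are hypothetical.

```python
def rpkm(raw_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    Equivalent to raw_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return raw_count * 1e9 / (feature_length_bp * total_mapped_reads)
```

For instance, 10 reads on a 1,000 bp feature in a library of one million mapped reads gives an RPKM of 10.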

Data field formats:

The following table describes the content of data fields:

Field                     Description
gene                      valid HUGO name|GENEID (if there is no HUGO name, '?' is present in its place), e.g. A1BG|1 or ?|8225
exon                      standard chromosome token (chr1-chr22, chrX, chrY, chrM) followed by a coordinate range and strand (+/-), e.g. chr1:11874-12227:+
junction                  a coordinate pair, strand indicated with +/-, e.g. chr1:12227:+,chr1:12595:+
raw_counts                positive floating point or zero
median_length_normalized  positive float or zero
RPKM                      positive float or zero (Reads Per Kilobase of exon model per million mapped reads)

For wig files, see Wiggle Format Specification.
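
The record and field formats above could be checked along the following lines. This is a sketch under stated assumptions, not the DCC validator: `validate_quantification` is a hypothetical name, GENEID is assumed to be numeric (as in the examples), and HUGO names are not checked against any registry.

```python
import csv
import io
import re

CHROM = r"chr(?:[1-9]|1\d|2[0-2]|X|Y|M)"

# Identifier patterns inferred from the field descriptions and examples.
ID_PATTERNS = {
    "gene": re.compile(r"^(?:\?|[^|\s]+)\|\d+$"),
    "exon": re.compile(rf"^{CHROM}:\d+-\d+:[+-]$"),
    "junction": re.compile(rf"^{CHROM}:\d+:[+-],{CHROM}:\d+:[+-]$"),
}

EXPECTED_HEADERS = {
    "gene": ["gene", "raw_counts", "median_length_normalized", "RPKM"],
    "exon": ["exon", "raw_counts", "median_length_normalized", "RPKM"],
    "junction": ["junction", "raw_counts"],
}


def validate_quantification(text, kind):
    """Return a list of error messages for one tab-delimited data file."""
    errors = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header != EXPECTED_HEADERS[kind]:
        return [f"line 1: unexpected header {header!r}"]
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines in this sketch
        if len(row) != len(header):
            errors.append(f"line {line_no}: expected {len(header)} fields")
            continue
        if not ID_PATTERNS[kind].match(row[0]):
            errors.append(f"line {line_no}: bad {kind} identifier {row[0]!r}")
        for name, value in zip(header[1:], row[1:]):
            try:
                number = float(value)
            except ValueError:
                errors.append(f"line {line_no}: {name} is not a number")
                continue
            if not number >= 0:  # also rejects NaN
                errors.append(f"line {line_no}: {name} must be zero or positive")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a submitter see every issue in a file at once.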

Examples

Exon file:

UNCID_62878.TCGA-AA-3812-01A-01R-0905-07.100810_UNC6-RDR300211_00022_FC_629L9AAXX.6.trimmed.annotated.exon.quantification.txt

exon    raw_counts      median_length_normalized        RPKM
chr1:11874-12227:+      76      0.214689265536723       0.17296397791866
chr1:12595-12721:+      0       0       0
chr1:12613-12721:+      0       0       0
chr1:12646-12697:+      0       0       0
chr1:13221-14409:+      39      0.0328006728343146      0.0264257965466892
chr1:13403-14409:+      39      0.0387288977159881      0.0312018590804503
chr1:16765-14363:-      36074   15.0120682480233        12.0944427960716
chr1:17055-16854:-      7567    37.460396039604 30.179893238817
chr1:18061-17233:-      12453   15.021712907117 12.1022129964044
chr1:18379-18268:-      2222    19.8392857142857        15.9834809049675
chr1:18554-18497:-      139     2.39655172413793        1.93077710922541
chr1:19759-18913:-      7569    8.93624557260921        7.19946839462319
chr1:24901-24738:-      4159    25.359756097561 20.4310368416201
chr1:29370-29321:-      551     11.02   8.87824098656483
chr1:29961-29824:-      0       0       0
chr1:35174-34612:-      0       0       0
chr1:35481-35277:-      0       0       0
chr1:36081-35721:-      0       0       0
chr1:69091-70008:+      0       0       0

Gene file:

UNCID_62852.TCGA-AA-3812-01A-01R-0905-07.100810_UNC6-RDR300211_00022_FC_629L9AAXX.6.trimmed.annotated.gene.quantification.txt

gene    raw_counts      median_length_normalized        RPKM
?|729884	1	0.0355805243445693	0.015144773474878
?|8225	1390	72.2901447277739	30.9892751466931
?|90288	1	0.0350391885661595	0.0149143550679296
A1BG|1	47	1.3790259230165	0.597177572148448
A1CF|29974	189	6.65030106530801	2.83186921301628
A2BP1|54715	41	0.76283009466866	0.330423189296441
A2LD1|87769	181	10.7692913385827	4.61040294627041
A2ML1|144568	11	0.23030303030303	0.0980279883101193
A2M|2	6416	107.962831070554	46.1946804115845
A4GALT|53947	255	9.07971698113207	3.89106378127195
A4GNT|51146	1	0.0429136081309994	0.0182660847782831

Splice junction file:

UNCID_62877.TCGA-AA-3812-01A-01R-0905-07.100810_UNC6-RDR300211_00022_FC_629L9AAXX.6.trimmed.annotated.spljxn.quantification.txt

junction        raw_counts
chr1:12227:+,chr1:12595:+       0
chr1:12227:+,chr1:12613:+       0
chr1:12227:+,chr1:12646:+       0
chr1:12697:+,chr1:13221:+       0
chr1:12721:+,chr1:13221:+       0
chr1:12721:+,chr1:13403:+       0
chr1:14970:-,chr1:14829:-       26
chr1:15796:-,chr1:14829:-       2
chr1:15796:-,chr1:15038:-       14
chr1:16607:-,chr1:15942:-       0
chr1:16607:-,chr1:15947:-       6
chr1:16854:-,chr1:16765:-       0
chr1:16858:-,chr1:16765:-       12
chr1:17233:-,chr1:17055:-       45
chr1:17259:-,chr1:17055:-       0
chr1:17606:-,chr1:17055:-       2
chr1:17606:-,chr1:17368:-       6
chr1:17915:-,chr1:17368:-       0
chr1:17915:-,chr1:17742:-       15

Metadata (MAGE-TAB) archives

In addition to the MANIFEST.txt file, RNASeq MAGE-TAB archives contain

  • a DESCRIPTION.txt file
  • an IDF
  • an SDRF

Metadata (MAGE-TAB) files

Metadata (MAGE-TAB) file names

<domain>_<disease_abbrev>.<platform>_RNASeq.<index>.<revision>.<series>.idf.txt
<domain>_<disease_abbrev>.<platform>_RNASeq.<index>.<revision>.<series>.sdrf.txt

Metadata (MAGE-TAB) file formats

All files are tab-delimited text.

Investigation Description File (IDF)

The IDF can be considered as a set of tables that can have variable numbers of rows and columns. The first field in an IDF line is a row header, while subsequent fields are data items corresponding to the row header. Adjacent row lines represent a table. Data items falling in the same column within a table are related (i.e., columns can be considered records).
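
The row-header layout described above can be read with a few lines of Python. This is a minimal sketch with hypothetical function names: `parse_idf` maps each row header to its data items, and `column_records` zips related rows so that each column becomes one record.

```python
import csv
import io


def parse_idf(text):
    """Map each IDF row header to the list of data items on that line."""
    idf = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or not row[0].strip():
            continue  # blank lines only separate tables
        idf[row[0].strip()] = row[1:]
    return idf


def column_records(idf, headers):
    """Treat the given rows as one table and return one dict per column."""
    rows = [idf.get(header, []) for header in headers]
    width = max((len(row) for row in rows), default=0)
    return [
        {header: (row[i] if i < len(row) else "") for header, row in zip(headers, rows)}
        for i in range(width)
    ]
```

For example, calling `column_records` with the Person Last Name, Person First Name, and Person Email headers yields one record per contact.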

Valid row headers in order are as follows (field values are explained in the next section):

Row Headers

Investigation Title

Experimental Design

Experimental Design Term Source REF

Experimental Factor Type

Experimental Factor Type Term Source REF

Person Last Name

Person First Name

Person Mid Initials

Person Email

Person Phone

Person Address

Person Affiliation

Person Roles

Date of Experiment

Public Release Date

Experiment Description

Protocol Name

Protocol Type

Protocol Term Source REF

Protocol Description

Protocol Parameters

SDRF Files

Term Source Name

Term Source File

Term Source Version

Sample-Data Relationship File (SDRF)

SDRFs have a single header line, followed by records, one record per line.

Valid headers in order are as follows (field values are explained in the next section):

Headers

Extract Name

Comment [TCGA Barcode]

Material Type

Protocol REF

Assay Name

Protocol REF

Assay Name

Protocol REF

Assay Name

Comment [NCBI SRA Experiment Accession]

Annotation REF

Protocol REF

Data Transformation Name

Derived Data File REF

Comment [Genome reference]

Comment [NCBI dbGAP Experiment Accession]

Comment [TCGA Include for Analysis]

Comment [TCGA Data Type]

Comment [TCGA Data Level]

Protocol REF

Data Transformation Name

Derived Data File

Comment [TCGA Include for Analysis]

Comment [TCGA Data Type]

Comment [TCGA Data Level]

Comment [TCGA Archive Name]

Admissible values for the corresponding fields (in the record lines) are given in the next section.

Metadata (MAGE-TAB) field formats

Investigation Description File (IDF)

IDF field description:

Field (row header)                        Description
Investigation Title                       free text title of study
Experimental Design                       standard design terms
Experimental Design Term Source REF       ontological source of terms on previous line
Experimental Factor Type                  standard experimental factor terms
Experimental Factor Type Term Source REF  ontological source of terms on previous line
Contact Info
Person Last Name                          list of surnames
Person First Name                         list of first names
Person Mid Initials                       list of MIs
Person Email                              list of emails
Person Phone                              list of phone numbers
Person Address                            list of mailing addresses
Person Affiliation                        list of affiliations
Person Roles                              list of roles
Experiment Description
Date of Experiment                        date in YYYYMMDD format
Public Release Date                       date in YYYYMMDD format
Experiment Description                    free text description (no tabs or linebreaks)
Protocol Name                             tokens serving as references in the SDRF
Protocol Type                             list of standard protocol types
Protocol Term Source REF                  list of ontological sources of protocol types
Protocol Description                      list of free text descriptions (no tabs or linebreaks)
Protocol Parameters                       list of <tag>=<value> pairs describing experimental parameters
SDRF Files                                The SDRF filename, present in the submitted mage-tab archive
Term Source Name                          list of ontological sources in above REF fields
Term Source File                          list of urls describing the location of ontological source files
Term Source Version                       list of version numbers for ontological sources

IDF example:

Investigation Title	TCGA Analysis of RNASeq data from Illumina GAII sequencers for colon cancer samples.
Experimental Design	transcript_identification_design	is_expressed_design
Experimental Design Term Source REF	MGED Ontology	MGED Ontology
Experimental Factor Type	disease_state_design
Experimental Factor Type Term Source REF	MGED Ontology

Person Last Name	Hayes	Perou	Chiang	O'Connor
Person First Name	Neil	Charles	Derek	Brian
Person Mid Initials	D.	M.	D.	D.
Person Email	hayes@med.unc.edu	cperou@med.unc.edu	Derek_Chiang@med.unc.edu	brianoc@email.unc.edu
Person Phone	919-966-3786	919-843-5740	919-843-7887	919-966-3786
Person Address	Lineberger Comprehensive Cancer Center, UNC, Chapel Hill, NC 27599-7264	Lineberger Comprehensive Cancer Center, UNC, Chapel Hill, NC 27599-7264	Lineberger Comprehensive Cancer Center, UNC, Chapel Hill, NC 27599-7264	Lineberger Comprehensive Cancer Center, UNC, Chapel Hill, NC 27599-7264
Person Affiliation	University of North Carolina, Chapel Hill	University of North Carolina, Chapel Hill	University of North Carolina, Chapel Hill	University of North Carolina, Chapel Hill
Person Roles	investigator	investigator	investigator	submitter

Date of Experiment	20101022

Public Release Date	20101022

Experiment Description	TCGA Analysis of RNASeq data from Illumina GAII sequencers for colon cancer samples.

Protocol Name	unc.edu:reverse_transcription:IlluminaGA_RNASeq:01	unc.edu:library_preparation:IlluminaGA_RNASeq:01	unc.edu:DNA_Sequencing:IlluminaGA_RNASeq:01	unc.edu:consensus_mRNA:IlluminaGA_RNASeq:01	unc.edu:gene_expression:IlluminaGA_RNASeq:01	unc.edu:exon_expression:IlluminaGA_RNASeq:01	unc.edu:splice_junction_expression:IlluminaGA_RNASeq:01
Protocol Type	reverse_transcription	library_preparation	DNA sequencing	consensus_mRNA	gene_expression	gene_expression	gene_expression
Protocol Term Source REF	MGED Ontology	NCI EVS	NCI EVS	MGED Ontology	MGED Ontology	MGED Ontology	MGED Ontology
Protocol Description	Ligation of linkers and reverse transcription of mRNAs	PCR with sequencing primers, size fractionation	Sequencing on Illumina GAII	Alignment of reads to reference transcriptome (UCSC genes Dec 2009 build) with subsequent mapping to the whole genome (UCSC hg19 based on NCBI36.1) calculated using the SeqWare framework via the RNASeqAlignmentBWA workflow (http://seqware.sourceforge.net)	Read counts and RPKM per composite gene (UCSC genes Dec 2009 build) calculated using the SeqWare framework via the RNASeqAlignmentBWA workflow (http://seqware.sourceforge.net)	Read counts and RPKM per composite exon (UCSC genes Dec 2009 build) calculated using the SeqWare framework via the RNASeqAlignmentBWA workflow (http://seqware.sourceforge.net)	Read counts per splice junction (UCSC genes Dec 2009 build) calculated using the SeqWare framework via the RNASeqAlignmentBWA workflow (http://seqware.sourceforge.net)
Protocol Parameters				SeqWareVersion=0.7.0;RNASeqAlignmentBWAWorfklowVersion=0.7.4	SeqWareVersion=0.7.0;RNASeqAlignmentBWAWorfklowVersion=0.7.4	SeqWareVersion=0.7.0;RNASeqAlignmentBWAWorfklowVersion=0.7.4	SeqWareVersion=0.7.0;RNASeqAlignmentBWAWorfklowVersion=0.7.4

SDRF Files	unc.edu_colon.IlluminaGA_rnaseq.sdrf.txt
Term Source Name	MGED Ontology	NCI EVS	NCBI Taxonomy
Term Source File	http://mged.sourceforge.net/ontologies/MGEDontology.php	http://evs.nci.nih.gov/	http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/
Term Source Version	1.3.1.1	2010-09	2010-09

Sample-Data Relationship File (SDRF)

Each line in an SDRF describes an aliquot, the processes performed on the aliquot to obtain data, and the files created which contain data related to the aliquot. Columns named Protocol REF contain values referring to content in the IDF. Columns named Derived Data File contain file name values; the files referenced are present in the accompanying data archive.
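
Because SDRF headers such as Protocol REF and Assay Name repeat, a reader keyed on column names alone would lose values. The sketch below keeps columns positional and cross-checks Protocol REF values against the Protocol Name entries of the IDF. All function names are hypothetical, and the check itself is an illustration of the relationship described above, not a documented DCC validation step.

```python
import csv
import io


def read_sdrf(text):
    """Return (header, rows) with columns kept in their original positions."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows = [row for row in reader if row]
    return header, rows


def values_for(header, row, column):
    """Return every value in the row whose column header equals `column`."""
    return [value for name, value in zip(header, row) if name == column]


def undefined_protocols(header, rows, idf_protocol_names):
    """List Protocol REF values that have no matching Protocol Name in the IDF."""
    known = set(idf_protocol_names)
    missing = set()
    for row in rows:
        for ref in values_for(header, row, "Protocol REF"):
            if ref not in known:
                missing.add(ref)
    return sorted(missing)
```

The same `values_for` helper works for the other repeated columns, such as Comment [TCGA Data Level].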

SDRF field description:

Field                                      Description
Extract Name                               TCGA UUID
Comment [TCGA Barcode]                     TCGA Barcode
Material Type                              always "Total RNA"
Protocol REF                               token referring to IDF description
Assay Name                                 unique identifier that includes the aliquot barcode or UUID
Comment [NCBI SRA Experiment Accession]    null ( -> ) or SRX accession number
Annotation REF                             URL pointing to an annotation table file (e.g., GAF)
Data Transformation Name                   token referring to an IDF description of a data transformation
Derived Data File REF                      file in the data archive that is the product of the adjacent data transformation
Comment [Genome reference]                 common name of the reference genome employed
Comment [NCBI dbGAP Experiment Accession]  null ( -> ) or dbGaP accession, e.g. phs000178.v3.p3
Comment [TCGA Include for Analysis]        yes|no QC indicator
Comment [TCGA Data Type]                   DCC data type identifier
Comment [TCGA Data Level]                  Level 1 or Level 3 as appropriate
Comment [TCGA Archive Name]                valid DCC archive name (without .tar.gz extension)

SDRF example:

Extract Name    Comment [TCGA Barcode]  Material Type   Protocol REF    Assay Name  Protocol REF    Assay Name  Protocol REF    Assay Name  Comment [NCBI SRA Experiment Accession] Annotation REF  Protocol REF    Data Transformation Name    Derived Data File REF   Comment [Genome reference]  Comment [NCBI dbGAP Experiment Accession]   Comment [TCGA Include for Analysis] Comment [TCGA Data Type]    Comment [TCGA Data Level]   Protocol REF    Data Transformation Name    Derived Data File   Comment [TCGA Include for Analysis] Comment [TCGA Data Type]    Comment [TCGA Data Level]   Comment [TCGA Archive Name]
2d0702a8-bf85-458e-879c-ebc03a2aa3d1    TCGA-09-0364-01A-02R-1564-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  2d0702a8-bf85-458e-879c-ebc03a2aa3d1_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    2d0702a8-bf85-458e-879c-ebc03a2aa3d1_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 2d0702a8-bf85-458e-879c-ebc03a2aa3d1_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-09-0364-01A-02R-1564-13_alignment  TCGA-09-0364-01A-02R-1564-13_rnaseq.bam NCBI36.1    ->   yes RNA Sequence-Alignment  Level 1 bcgsc.ca:indel:IlluminaHiSeq_RNASeq:01  TCGA-09-0364-01A-02R-1564-13_indel  bcgsc.ca.TCGA-09-0364-01A-02R-1564-13.bcgsc.ca.1.4.0.indel.vcf  yes RNA Sequence-Indel  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.4.0
c5a82b6e-048d-4377-9b6c-4417ba1876fe    TCGA-25-1630-01A-01R-1566-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  c5a82b6e-048d-4377-9b6c-4417ba1876fe_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    c5a82b6e-048d-4377-9b6c-4417ba1876fe_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 c5a82b6e-048d-4377-9b6c-4417ba1876fe_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-25-1630-01A-01R-1566-13_alignment  TCGA-25-1630-01A-01R-1566-13_rnaseq.bam NCBI36.1    ->   yes RNA Sequence-Alignment  Level 1 bcgsc.ca:indel:IlluminaHiSeq_RNASeq:01  TCGA-25-1630-01A-01R-1566-13_indel  bcgsc.ca.TCGA-25-1630-01A-01R-1566-13.bcgsc.ca.1.4.0.indel.vcf  yes RNA Sequence-Indel  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.4.0
2d0702a8-bf85-458e-879c-ebc03a2aa3d1    TCGA-09-0364-01A-02R-1564-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  2d0702a8-bf85-458e-879c-ebc03a2aa3d1_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    2d0702a8-bf85-458e-879c-ebc03a2aa3d1_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 2d0702a8-bf85-458e-879c-ebc03a2aa3d1_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-09-0364-01A-02R-1564-13_alignment  TCGA-09-0364-01A-02R-1564-13_rnaseq.bam NCBI36.1    ->   yes RNA Sequence-Alignment  Level 1 bcgsc.ca:snv:IlluminaHiSeq_RNASeq:01    TCGA-09-0364-01A-02R-1564-13_snv    bcgsc.ca.TCGA-09-0364-01A-02R-1564-13.bcgsc.ca.1.4.0.snv.vcf    yes RNA Sequence-Single nucleotide variant  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.4.0
c5a82b6e-048d-4377-9b6c-4417ba1876fe    TCGA-25-1630-01A-01R-1566-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  c5a82b6e-048d-4377-9b6c-4417ba1876fe_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    c5a82b6e-048d-4377-9b6c-4417ba1876fe_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 c5a82b6e-048d-4377-9b6c-4417ba1876fe_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-25-1630-01A-01R-1566-13_alignment  TCGA-25-1630-01A-01R-1566-13_rnaseq.bam NCBI36.1    ->   yes RNA Sequence-Alignment  Level 1 bcgsc.ca:snv:IlluminaHiSeq_RNASeq:01    TCGA-25-1630-01A-01R-1566-13_snv    bcgsc.ca.TCGA-25-1630-01A-01R-1566-13.bcgsc.ca.1.4.0.snv.vcf    yes RNA Sequence-Single nucleotide variant  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.4.0
efabaf9d-2e46-4d4e-9b1b-763ec4d8d537    TCGA-20-1686-01A-01R-1566-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  efabaf9d-2e46-4d4e-9b1b-763ec4d8d537_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    efabaf9d-2e46-4d4e-9b1b-763ec4d8d537_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 efabaf9d-2e46-4d4e-9b1b-763ec4d8d537_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-20-1686-01A-01R-1566-13_alignment  TCGA-20-1686-01A-01R-1566-13_rnaseq.bam NCBI36.1    ->   no  RNA Sequence-Alignment  Level 1 bcgsc.ca:indel:IlluminaHiSeq_RNASeq:01  TCGA-20-1686-01A-01R-1566-13_indel  bcgsc.ca.TCGA-20-1686-01A-01R-1566-13.bcgsc.ca.1.3.0.indel.vcf  no  RNA Sequence-Indel  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.3.0
e2c174f6-eac3-449e-879c-9335ccf4ccd6    TCGA-30-1866-01A-02R-1568-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  e2c174f6-eac3-449e-879c-9335ccf4ccd6_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    e2c174f6-eac3-449e-879c-9335ccf4ccd6_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 e2c174f6-eac3-449e-879c-9335ccf4ccd6_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-30-1866-01A-02R-1568-13_alignment  TCGA-30-1866-01A-02R-1568-13_rnaseq.bam NCBI36.1    ->   no  RNA Sequence-Alignment  Level 1 bcgsc.ca:indel:IlluminaHiSeq_RNASeq:01  TCGA-30-1866-01A-02R-1568-13_indel  bcgsc.ca.TCGA-30-1866-01A-02R-1568-13.bcgsc.ca.1.3.0.indel.vcf  no  RNA Sequence-Indel  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.3.0
67670628-ad07-41ef-8977-890080616900    TCGA-09-2056-01B-01R-1568-13    Total RNA   bcgsc.ca:reverse_transcription:IlluminaHiSeq_RNASeq:01  67670628-ad07-41ef-8977-890080616900_reverse_transcription  bcgsc.ca:library_preparation:IlluminaHiSeq_RNASeq:01    67670628-ad07-41ef-8977-890080616900_library_preparation    bcgsc.ca:RNA_Sequencing:IlluminaHiSeq_RNASeq:01 67670628-ad07-41ef-8977-890080616900_RNA_Sequencing ->   http://tcga-data.nci.nih.gov/docs/GAF/GAF_bundle_Feb2011/outputs/TCGA.hg18.Feb2011.gaf.gz   bcgsc.ca:consensus_mRNA:IlluminaHiSeq_RNASeq:01 TCGA-09-2056-01B-01R-1568-13_alignment  TCGA-09-2056-01B-01R-1568-13_rnaseq.bam NCBI36.1    ->   yes RNA Sequence-Alignment  Level 1 bcgsc.ca:indel:IlluminaHiSeq_RNASeq:01  TCGA-09-2056-01B-01R-1568-13_indel  bcgsc.ca.TCGA-09-2056-01B-01R-1568-13.bcgsc.ca.1.3.0.indel.vcf  yes RNA Sequence-Indel  Level 2 bcgsc.ca_OV.IlluminaHiSeq_RNASeq.Level_2.1.3.0
...