
Description

RNASeq Version 2 is similar to RNASeq in that it uses sequencing data to determine gene expression levels. It uses a different set of algorithms to determine the expression levels, and the results are presented in a slightly different set of files.

Data Overview

There are two analysis pipelines used to create Level 3 expression data from RNA Sequence data. The first approach used at TCGA relies on the RPKM method, while the second method uses MapSplice to do the alignment and RSEM to perform the quantitation.

References:

Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN. (2010)
RNA-Seq gene expression estimation with read mapping uncertainty.
Bioinformatics. Feb 15;26(4):493-500.


Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, MacLeod JN, Chiang DY, Prins JF, Liu J. (2010)
MapSplice: accurate mapping of RNA-seq reads for splice junction discovery.
Nucleic Acids Res. Oct;38(18):e178.


Data File Descriptions

Available Platforms

  • Platform Code - used in archive names.
  • Platform Alias - used to group similar platforms and used in some applications to save space when referring to platforms.
  • Platform Name - full name of platform

 

Platform Code | Platform Alias | Platform Name
IlluminaGA_RNASeqV2 | IlluminaGA_RNASeqV2 | Illumina Genome Analyzer RNA Sequencing Version 2 analysis
IlluminaHiSeq_RNASeqV2 | IlluminaHiSeq_RNASeqV2 | Illumina HiSeq 2000 RNA Sequencing Version 2 analysis

 

Available Data Files

Platform Code | Data Level | File Type | File Extension | Description
IlluminaGA_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .exon_quantification.txt | The calculated expression signal of a particular composite exon of a gene
IlluminaGA_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .junction_quantification.txt | The calculated expression signal of a particular composite splice junction of a gene
IlluminaGA_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.genes.results | The raw expression signal for expression of a gene
IlluminaGA_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.genes.normalized_results | The normalized results for expression of a gene
IlluminaGA_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.isoforms.results | The raw expression signal of individual isoforms (transcripts)
IlluminaGA_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.isoforms.normalized_results | The normalized expression signal of individual isoforms (transcripts)
IlluminaHiSeq_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .exon_quantification.txt | The calculated expression signal of a particular composite exon of a gene
IlluminaHiSeq_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .junction_quantification.txt | The calculated expression signal of a particular composite splice junction of a gene
IlluminaHiSeq_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.genes.results | The raw expression signal for expression of a gene
IlluminaHiSeq_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.genes.normalized_results | The normalized results for expression of a gene
IlluminaHiSeq_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.isoforms.results | The raw expression signal of individual isoforms (transcripts)
IlluminaHiSeq_RNASeqV2 | Level 3 | Tab-delimited ASCII text | .rsem.isoforms.normalized_results | The normalized expression signal of individual isoforms (transcripts)

Validations

Level 3

Where possible, the validation for RNASeqV2 follows the rules from the original RNASeq spec.

Platform name

This pipeline will use data from the existing IlluminaGA and IlluminaHiSeq platforms, but will need to be distinguished from the existing RNASeq data. Therefore the proposal is to use a "V2" designation on the platforms.

Existing Platform Names | V2 Platform Names
IlluminaGA_RNASeq | IlluminaGA_RNASeqV2
IlluminaHiSeq_RNASeq | IlluminaHiSeq_RNASeqV2

Archive names

Archive names will remain unchanged except for the platform names, and will use the current <domain>_<disease study>.<platform>.<archive type>.<serial index>.<revision>.<series>.tar.gz convention. Both a data archive and a mage-tab archive will be required.

Example existing archive names | Example V2 archive names
unc.edu_BRCA.IlluminaGA_RNASeq.Level_3.1.0.0.tar.gz | unc.edu_BRCA.IlluminaGA_RNASeqV2.Level_3.1.0.0.tar.gz
unc.edu_BRCA.IlluminaHiSeq_RNASeq.Level_3.1.0.0.tar.gz | unc.edu_BRCA.IlluminaHiSeq_RNASeqV2.Level_3.1.0.0.tar.gz
unc.edu_BRCA.IlluminaGA_RNASeq.mage-tab.1.1.0.tar.gz | unc.edu_BRCA.IlluminaGA_RNASeqV2.mage-tab.1.1.0.tar.gz
unc.edu_BRCA.IlluminaHiSeq_RNASeq.mage-tab.1.1.0.tar.gz | unc.edu_BRCA.IlluminaHiSeq_RNASeqV2.mage-tab.1.1.0.tar.gz
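
The archive naming convention above can be checked mechanically. The following is a minimal sketch, not the DCC's actual validator: the helper names (`parse_archive_name`, `is_v2_archive`) are hypothetical, and the pattern assumes that the domain contains no underscore and that the remaining name fields contain no dots, which holds for the example names on this page.

```python
import re

# <domain>_<disease study>.<platform>.<archive type>.<serial index>.<revision>.<series>.tar.gz
# Assumption: no underscore in the domain, no dots in the other name fields.
ARCHIVE_NAME_RE = re.compile(
    r"^(?P<domain>[^_]+)_(?P<disease_study>[^.]+)"
    r"\.(?P<platform>[^.]+)\.(?P<archive_type>[^.]+)"
    r"\.(?P<serial_index>\d+)\.(?P<revision>\d+)\.(?P<series>\d+)\.tar\.gz$"
)

# The two V2 platform codes listed on this page.
V2_PLATFORMS = {"IlluminaGA_RNASeqV2", "IlluminaHiSeq_RNASeqV2"}


def parse_archive_name(name):
    """Split an archive file name into its named fields (hypothetical helper)."""
    match = ARCHIVE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized archive name: {name!r}")
    fields = match.groupdict()
    # The three trailing counters are integers in the convention.
    for key in ("serial_index", "revision", "series"):
        fields[key] = int(fields[key])
    return fields


def is_v2_archive(name):
    """Return True when the archive's platform is one of the V2 platform codes."""
    return parse_archive_name(name)["platform"] in V2_PLATFORMS


example = parse_archive_name("unc.edu_BRCA.IlluminaGA_RNASeqV2.Level_3.1.0.0.tar.gz")
print(example["platform"], example["archive_type"])
```

Matching on the exact platform code, rather than on a "V2" substring, keeps existing RNASeq archives and V2 archives from being confused.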

RNASeqV2 Files

Data Archive files

  • Each sample will have 6 files in the data archive. All six must be present for each barcode or the validation should fail.
  • Each file will be tab-delimited text or the archive should fail.
  • Barcodes/UUIDs must be checked for validity, and any unknown barcodes should cause the archive to fail.
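
The six-files-per-barcode rule could be sketched as follows. This is an illustration under stated assumptions, not the DCC implementation: `missing_files_by_sample` is a hypothetical helper, the sample ID is assumed to be either a `TCGA-`-prefixed barcode or a standard 36-character UUID somewhere in the file name, and the check against known barcodes is not covered because it needs an external registry.

```python
import re
from collections import defaultdict

# The six file extensions required for every sample.
REQUIRED_SUFFIXES = (
    ".exon_quantification.txt",
    ".junction_quantification.txt",
    ".rsem.genes.results",
    ".rsem.genes.normalized_results",
    ".rsem.isoforms.results",
    ".rsem.isoforms.normalized_results",
)

# Assumed ID shapes: a TCGA-prefixed barcode or a 36-character UUID.
SAMPLE_ID_RE = re.compile(
    r"(TCGA(?:-[A-Za-z0-9]+)+"
    r"|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def missing_files_by_sample(filenames):
    """Map each sample ID to the required extensions it lacks.

    Samples with all six files are omitted, so an empty result means
    the archive passes this particular check.
    """
    seen = defaultdict(set)
    for name in filenames:
        id_match = SAMPLE_ID_RE.search(name)
        suffix = next((s for s in REQUIRED_SUFFIXES if name.endswith(s)), None)
        if id_match is None or suffix is None:
            continue  # not a recognizable per-sample data file
        seen[id_match.group(1)].add(suffix)
    missing = {}
    for sample_id, found in seen.items():
        absent = sorted(set(REQUIRED_SUFFIXES) - found)
        if absent:
            missing[sample_id] = absent
    return missing


# Hypothetical file names: one complete sample, one lacking its junction file.
complete = "TCGA-AA-0001-01A-01R-0001-07"
partial = "TCGA-AA-0002-01A-01R-0001-07"
names = [f"unc.edu.{complete}.token.1{s}" for s in REQUIRED_SUFFIXES]
names += [
    f"unc.edu.{partial}.token.1{s}"
    for s in REQUIRED_SUFFIXES
    if s != ".junction_quantification.txt"
]
print(missing_files_by_sample(names))
```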

File Data Type

Platform Data Type

File name

Column Headers

Example content

Data Level

FTP Display

Expression-Exon

RNASeqV2

<domain>.<TCGA aliquot barcode/UUID>.<center_token>.<index_integer>.exon_quantification.txt

exon,raw_counts,median_length_normalized,RPKM

chr1:11874-12227:+ 400 1.12994350282486 0.114434323723804

Level 3

rnaseqv2

Expression-Splice Junction

RNASeqV2

<domain>.<TCGA aliquot barcode/UUID>.<center_token>.<index_integer>.junction_quantification.txt

junction, raw_counts

chr1:12227:+,chr1:12595:+ 0

Level 3

rnaseqv2

Expression-Genes

RNASeqV2

<domain>.<TCGA aliquot barcode/UUID>.<center_token>.<index_integer>.rsem.genes.normalized_results

gene_id, normalized_count

A1CF|29974 0.0000

Level 3

rnaseqv2

Expression-Genes

RNASeqV2

<domain>.<TCGA aliquot barcode/UUID>.<center_token>.<index_integer>.rsem.genes.results

gene_id, raw_count, scaled_estimate, transcript_id

AADACL2|344752 8.00 1.84375506432328e-07 uc003ezc.2,uc010hvn.2

Level 3

rnaseqv2

Expression-Genes

RNASeqV2

<domain>.<TCGA aliquot barcode/UUID>.<center_token>.<index_integer>.rsem.isoforms.normalized_results

isoform_id, normalized_count

uc001jiu.2 712.7049

Level 3

rnaseqv2

Expression-Genes

RNASeqV2

<domain>.<TCGA aliquot barcode/UUID>.<center_token>.<index_integer>.rsem.isoforms.results

isoform_id, raw_count, scaled_estimate

uc002qsf.1 19.80 1.57927207139021e-07

Level 3

rnaseqv2
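
As an illustration of how one of these tab-delimited files might be read, here is a minimal sketch for the .rsem.genes.results layout. It assumes the first line of the file is a header row carrying exactly the column names listed above, separated by tabs; `read_genes_results` is a hypothetical helper, and the sample row is the example content from the table.

```python
import csv
import io

# Column headers for .rsem.genes.results as listed in the table above.
GENES_RESULTS_HEADER = ["gene_id", "raw_count", "scaled_estimate", "transcript_id"]


def read_genes_results(handle):
    """Parse a .rsem.genes.results file into a list of dicts (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Fail early if the header is missing or differs from the expected columns.
    if reader.fieldnames != GENES_RESULTS_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    for row in reader:
        # gene_id holds a name and an identifier joined by "|".
        symbol, _, identifier = row["gene_id"].partition("|")
        records.append({
            "symbol": symbol,
            "identifier": identifier,
            "raw_count": float(row["raw_count"]),
            "scaled_estimate": float(row["scaled_estimate"]),
            # transcript_id is a comma-separated list of transcript IDs.
            "transcript_ids": row["transcript_id"].split(","),
        })
    return records


sample = io.StringIO(
    "gene_id\traw_count\tscaled_estimate\ttranscript_id\n"
    "AADACL2|344752\t8.00\t1.84375506432328e-07\tuc003ezc.2,uc010hvn.2\n"
)
records = read_genes_results(sample)
print(records[0]["symbol"], records[0]["raw_count"])
```

The same pattern would apply to the other five file types by swapping the expected header list.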

File field descriptions

File Extension | Field/Column name | Description
.exon_quantification.txt | exon | Chromosomal location of the exon
.exon_quantification.txt | raw_counts | The number of reads mapping to this exon
.exon_quantification.txt | median_length_normalized | The total aligned bases to this exon divided by the exon length
.exon_quantification.txt | RPKM | Reads per kilobase of exon per million mapped reads
.junction_quantification.txt | junction | The genomic location of the splice junction
.junction_quantification.txt | raw_counts | The number of reads mapping to this junction
.rsem.genes.results, .rsem.genes.normalized_results | gene_id | The gene name and identifier
.rsem.genes.results | raw_count | The number of reads mapping to this gene
.rsem.genes.results | scaled_estimate | tau value
.rsem.genes.results | transcript_id | The IDs of all transcripts mapping to this gene
.rsem.genes.normalized_results | normalized_count | upper quartile normalized RSEM count estimates
.rsem.isoforms.results | isoform_id | ID of the gene isoform
.rsem.isoforms.results | raw_count | The number of reads mapping to this isoform
.rsem.isoforms.results | scaled_estimate | tau value
.rsem.isoforms.normalized_results | isoform_id | ID of the gene isoform
.rsem.isoforms.normalized_results | normalized_count | upper quartile normalized RSEM count estimates
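
The RPKM field described above follows the standard definition: reads divided by the exon length in kilobases and by the number of mapped reads in millions. The sketch below works through that arithmetic. The helper names are hypothetical, the read totals in the usage line are made-up numbers, and treating exon coordinates as 1-based and inclusive is an assumption. That assumption is at least consistent with the example row for chr1:11874-12227:+, where 400 / 354 reproduces the listed median_length_normalized value.

```python
def parse_exon(exon):
    """Split an exon field such as 'chr1:11874-12227:+' into its parts."""
    chrom, span, strand = exon.split(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end, strand


def exon_length(exon):
    """Length in bases, assuming 1-based inclusive coordinates."""
    _, start, end, _ = parse_exon(exon)
    return end - start + 1


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    Equivalent to read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    return read_count * 1e9 / (length_bp * total_mapped_reads)


length = exon_length("chr1:11874-12227:+")
print(length, 400 / length)

# Made-up numbers: 500 reads on a 2,000-base exon with 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```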

 

Standard Archive Validations

All RNASeq and IlluminaGA_mRNA_DGE data are processed using a standard set of validations. Data from RNA Sequencing follow the GCC route.

The validation sets run on all RNASeq data are listed below:

Standard MAGE-TAB File Validations

This data group includes MAGE-TAB archives and documents. All MAGE-TAB archive validations are covered under Standard Archive Validations. All MAGE-TAB documents submitted to the DCC are processed using a standard set of validations.
