The TCGA barcode was the primary identifier of biospecimen data from the start of the pilot project. However, because the barcode for a given sample can change when the metadata associated with it changes, the TCGA project transitioned to using UUIDs as the primary identifier.


Historically, the [BCR|] received [participant|] [samples|] and their associated metadata from [TSSs|]. The BCR then assigned human-readable IDs, referred to as TCGA barcodes, representing the metadata of the participants and their samples. TCGA barcodes were used to tie together data that spans the TCGA network, since the IDs uniquely identify a set of results for a particular sample produced by a particular data-generating center (i.e. [GCC|], [GSC|] or [GDAC|]). The constitutive parts of this barcode provided metadata values for a sample.

Currently, the BCR assigns both a TCGA barcode and a UUID to each sample. The UUID is the primary identifier.
{text-extractor}{multi-excerpt-include:TCGA:TCGA Barcode|name=definition|nopanel=true}{text-extractor}

For more information on the ID transition, see UUIDs.

h1. Creating Barcodes

All TCGA barcodes are created by the BCR. The following figure illustrates how a sample is processed and assigned a TCGA barcode at each step. Starting from the Tissue Source Site (TSS) and the participant (who donated a tissue sample to the TSS), the barcodes TCGA-02 and TCGA-02-0001 are assigned respectively. The sample itself is also assigned a barcode: TCGA-02-0001-01. The sample is split into vials (e.g. TCGA-02-0001-01B) which are divided into portions (e.g. TCGA-02-0001-01B-02). Analytes (e.g. TCGA-02-0001-01B-02D) are extracted from each portion and distributed across one or more plates (e.g. TCGA-02-0001-01B-02D-0182), where each well is identified as an aliquot (e.g. TCGA-02-0001-01B-02D-0182-06). These plates are sent to GCCs or GSCs for characterization and sequencing.


figure illustrating flow for creating barcodes. See text description
TCGA barcodes are created by the BCR. An identifier component is added to the barcode at each stage of tissue sample-processing, starting from the TSS identifier and ending at the aliquot identifier.
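
The stage-by-stage construction described above can be sketched in Python. This is only an illustration that reproduces the example values from the text. The function `build_barcodes` and its parameter names are hypothetical and are not part of any BCR software. The last field is named `final_id` here because the text describes it as identifying the aliquot's well, while the table under Reading Barcodes labels the same position as the center.

```python
# Hypothetical helper: reproduces the example in the text, not actual BCR tooling.
def build_barcodes(tss, participant, sample, vial, portion, analyte, plate, final_id):
    """Return the barcode assigned at each processing stage, in order."""
    stages = {}
    stages["tss"] = f"TCGA-{tss}"
    stages["participant"] = f"{stages['tss']}-{participant}"
    stages["sample"] = f"{stages['participant']}-{sample}"
    # The vial letter is appended directly to the sample code (no hyphen).
    stages["vial"] = stages["sample"] + vial
    stages["portion"] = f"{stages['vial']}-{portion}"
    # The analyte letter is appended directly to the portion number (no hyphen).
    stages["analyte"] = stages["portion"] + analyte
    stages["plate"] = f"{stages['analyte']}-{plate}"
    stages["aliquot"] = f"{stages['plate']}-{final_id}"
    return stages


barcodes = build_barcodes("02", "0001", "01", "B", "02", "D", "0182", "06")
for stage, code in barcodes.items():
    print(stage, code)
# barcodes["aliquot"] == "TCGA-02-0001-01B-02D-0182-06"
```

Each stage only appends to the barcode of the previous stage, which is why every barcode in the chain is a prefix of the ones that follow it.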

h1. Reading Barcodes

A TCGA barcode is composed of a collection of identifiers. Each identifier specifically identifies a TCGA [data element|]. Refer to the following figure for an illustration of how [metadata|] identifiers make up a barcode. An [aliquot|] barcode, such as the example shown in the illustration, contains the largest number of identifiers.
{float:left|margin=0px 20px 0px 0px|padding=10px|background=#f0f0f0|border=1px solid gray|width=454px}
!barcode.png|alt="figure showing an example TCGA barcode; see text", align=center!
This figure of an aliquot barcode shows how it can be broken down into its components and translated into its metadata. The barcode metadata are further described in the following table.

|| Label || Identifier for || Value || Value description || Possible values ||
| Project | Project name | TCGA | TCGA project | TCGA |
| [TSS|] | Tissue source site | 02 | GBM (brain tumor) sample from MD Anderson | See [Code Tables Report |] |
| [Participant|] | Study participant | 0001 | The first participant from MD Anderson for GBM study | Any alpha-numeric value |
| [Sample|] | Sample type | 01 | A solid tumor | Tumor types range from 01 - 09, normal types from 10 - 19 and control samples from 20 - 29. See [Code Tables Report |] for a complete list of sample codes |
| [Vial|] | Order of sample in a sequence of samples | C | The third vial | A to Z |
| [Portion|] | Order of portion in a sequence of 100 - 120 mg sample portions | 01 | The first portion of the sample | 01-99 |
| [Analyte|] | Molecular type of analyte for analysis | D | The analyte is a DNA sample | See [Code Tables Report |] |
| [Plate|] | Order of plate in a sequence of 96-well plates | 0182 | The 182nd plate | 4-digit alphanumeric value |
| [Center|] | [Sequencing|] or [characterization|] center that will receive the aliquot for analysis | 01 | The Broad Institute [GCC|] | See [Code Tables Report |] |
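
As a rough illustration of the table above, an aliquot barcode can be split into its labelled components with a regular expression. This is a minimal sketch, not an official parser: the pattern, the function names `parse_aliquot_barcode` and `sample_category`, and the field lengths (two-digit sample, portion and center codes, upper-case letters only) are assumptions inferred from the example values and the "Possible values" column. The Code Tables Report remains the authority on valid codes.

```python
import re

# Assumed pattern, inferred from the table; real barcodes may deviate from it.
ALIQUOT_RE = re.compile(
    r"(?P<project>TCGA)"
    r"-(?P<tss>[A-Z0-9]+)"
    r"-(?P<participant>[A-Z0-9]+)"
    r"-(?P<sample>\d{2})(?P<vial>[A-Z])"
    r"-(?P<portion>\d{2})(?P<analyte>[A-Z])"
    r"-(?P<plate>[A-Z0-9]{4})"
    r"-(?P<center>\d{2})"
)


def parse_aliquot_barcode(barcode):
    """Split an aliquot barcode into a dict keyed by the table's labels."""
    match = ALIQUOT_RE.fullmatch(barcode)
    if match is None:
        raise ValueError(f"not an aliquot barcode: {barcode!r}")
    return match.groupdict()


def sample_category(sample_code):
    """Map a two-digit sample code to the broad ranges given in the table."""
    value = int(sample_code)
    if 1 <= value <= 9:
        return "tumor"
    if 10 <= value <= 19:
        return "normal"
    if 20 <= value <= 29:
        return "control"
    return "unknown"


fields = parse_aliquot_barcode("TCGA-02-0001-01C-01D-0182-01")
print(fields)
print(sample_category(fields["sample"]))  # "tumor", since 01 falls in 01 - 09
```

The broad category is all that can be read from the ranges in the table. The exact sample type for each code has to be looked up in the Code Tables Report.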

h1. Barcode Types

{anchor:hierarchy}Barcodes can also be visualized hierarchically, with TSS barcodes at the top of the tree and aliquot barcodes at the bottom. A parent barcode prefixes each of its descendant barcodes, reflecting the derivation of one biospecimen type from another. For example, samples are collected from a participant, so each sample barcode contains the barcode of the participant from which the sample was derived.
{float:left|margin=0px 20px 0px 0px|padding=10px|background=#f0f0f0|border=1px solid gray|width=533px}
!hierarchy.png|align=center, alt="An illustration of the hierarchical representation of barcodes; see text."!
*Hierarchy of biospecimen elements.* Barcodes are used to represent all biospecimen elements in this diagram.

Using the aliquot barcode example from the figure in [#Reading Barcodes], the following table displays a possible set of related barcodes at each level of the hierarchy:

|| Level || Barcode || Comment ||
| TSS | TCGA-02 | |
| Participant | TCGA-02-0001 | |
| Drug | TCGA-02-0001-C1 | Drug ID is 'C','D','H','I' or 'T' followed by a number |
| Examination | TCGA-02-0001-E3124 | Examination ID is 'E' followed by a number |
| Surgery | TCGA-02-0001-S145 | Surgery ID is 'S' followed by a number |
| Radiation | TCGA-02-0001-R2 | Radiation ID is 'R' followed by a number |
| Sample | TCGA-02-0001-01 | |
| Portion | TCGA-02-0001-01C-01 | |
| Shipped Portion | TCGA-CM-5341-01A-21-1933-20 | Used only by the MDA_RPPA_CORE platform |
| Slide | TCGA-02-0001-01C-01-TS1 | [Tissue slide] ID can be 'TS' ('Top Slide'), 'BS' ('Bottom Slide') or 'MS' ('Middle Slide'), followed by a number or letter to indicate slide order |
| Analyte | TCGA-02-0001-01C-01D | Analyte codes W and X both refer to analytes derived from whole genome amplification |
| Aliquot | TCGA-02-0001-01C-01D-0182-01 | |
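
Because a parent barcode prefixes its descendants, the biospecimen rows of the table above can be derived from an aliquot barcode alone. The following sketch illustrates this under stated assumptions: `related_barcodes` is a hypothetical helper, it expects the seven hyphen-separated fields of the aliquot example, and it assumes that the sample and portion codes are each two characters long. Clinical levels (drug, examination, surgery, radiation), slides and shipped portions cannot be derived this way and are left out.

```python
# Hypothetical helper: assumes a 7-field aliquot barcode with 2-character
# sample and portion codes, as in the example barcode used in this section.
def related_barcodes(aliquot_barcode):
    """Return the ancestor barcodes implied by an aliquot barcode."""
    parts = aliquot_barcode.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {aliquot_barcode!r}")
    project, tss, participant, sample_vial, portion_analyte, _plate, _center = parts
    sample = sample_vial[:2]        # drop the vial letter
    portion = portion_analyte[:2]   # drop the analyte letter
    participant_barcode = f"{project}-{tss}-{participant}"
    return {
        "TSS": f"{project}-{tss}",
        "Participant": participant_barcode,
        "Sample": f"{participant_barcode}-{sample}",
        "Portion": f"{participant_barcode}-{sample_vial}-{portion}",
        "Analyte": f"{participant_barcode}-{sample_vial}-{portion_analyte}",
        "Aliquot": aliquot_barcode,
    }


aliquot = "TCGA-02-0001-01C-01D-0182-01"
levels = related_barcodes(aliquot)
for level, barcode in levels.items():
    print(level, barcode)

# Every ancestor barcode is a prefix of the aliquot barcode.
print(all(aliquot.startswith(barcode) for barcode in levels.values()))
```

The prefix check in the last line is the same test that can be used to decide whether one barcode is an ancestor of another.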