
A biotab file is a tab-delimited text file that contains TCGA clinical data. This file is auto-generated by the DCC through the Data Matrix application.

About Clinical and Biospecimen Data

Clinical and biospecimen data are represented in two file types: XML and a tab-delimited text format called biotab. The two present the same data structure in different ways, and both are open access. They enable the collection of a series of barcodes corresponding to participants whose clinical data match the criteria of interest.

Each XML file contains data for a single participant; each biotab file contains data for multiple participants.

Either type of file can be used to extract and aggregate aliquot barcodes associated with participants' clinical data. Once relevant sample or aliquot barcodes and data have been parsed from the available XML or biotab file, samples can be aggregated according to clinical data elements of interest. The aggregated barcodes can then be mapped to the relevant data (see TCGA barcode).
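The aggregation step can be sketched in Python. This is a minimal illustration rather than an official TCGA tool: the helper name group_barcodes is hypothetical, the column names bcr_patient_barcode and gender are assumptions about the biotab header, and the rows are made up.

```python
import csv
import io
from collections import defaultdict


def group_barcodes(biotab_text, barcode_col, element_col):
    """Group barcodes by the value of one clinical data element."""
    reader = csv.DictReader(io.StringIO(biotab_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[element_col]].append(row[barcode_col])
    return dict(groups)


# Made-up rows; the column names are assumptions, not copied from a real file.
SAMPLE = (
    "bcr_patient_barcode\tgender\n"
    "TCGA-AA-0001\tFEMALE\n"
    "TCGA-AA-0002\tMALE\n"
    "TCGA-AA-0003\tFEMALE\n"
)

groups = group_barcodes(SAMPLE, "bcr_patient_barcode", "gender")
```

For a file on disk, the text could be read with open(path, newline="").read() and passed to the same function.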

For more information about XML file types, see XML.


Working with Biotab Files

Biotab files contain clinical or biospecimen data for multiple participants. They are likely to be a substantially more convenient alternative to clinical XML. Each line contains tab-separated data for a single participant or other object (for example, a sample or an aliquot). These files can be opened in a spreadsheet program such as Excel, where data can easily be sorted and extracted. Biotab files can also be sorted and manipulated with command line programs (such as the UNIX utilities cut, sort, uniq, grep and join), or imported into a local relational database (for example, MySQL or PostgreSQL).
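As a rough sketch of the relational database route, the following Python loads a biotab table into SQLite. SQLite is used here only because it ships with Python; it stands in for MySQL or PostgreSQL. The helper name load_biotab, the table name, the column names and the rows are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote a table or column name for use in SQLite statements."""
    return '"' + name.replace('"', '""') + '"'


def load_biotab(conn, table, biotab_text):
    """Create `table` from the biotab header row and insert each data row as text."""
    reader = csv.reader(io.StringIO(biotab_text), delimiter="\t")
    header = next(reader)
    columns = ", ".join(quote_ident(name) + " TEXT" for name in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
    placeholders = ", ".join("?" for _ in header)
    # Assumes every data row has the same number of fields as the header.
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
        reader,
    )
    conn.commit()


# Made-up rows; the column names are assumptions, not copied from a real file.
SAMPLE = (
    "bcr_patient_barcode\tgender\n"
    "TCGA-AA-0001\tFEMALE\n"
    "TCGA-AA-0002\tMALE\n"
    "TCGA-AA-0003\tFEMALE\n"
)

conn = sqlite3.connect(":memory:")
load_biotab(conn, "patient", SAMPLE)
rows = conn.execute(
    "SELECT bcr_patient_barcode FROM patient WHERE gender = ? ORDER BY 1",
    ("FEMALE",),
).fetchall()
```

Every column is stored as text, which keeps the loader simple; numeric columns would need an explicit cast in queries.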

Biotab files of clinical data can be obtained for a given disease study or individual participants using the Data Matrix. See the TCGA Data Matrix Users Guide for more information. See Pre-built Biotab Files for an alternative to the Data Matrix.

The following block of text is an example listing of biotab files as returned in a Data Matrix archive:

 -rw-r--r-- jboss45/0           854 file_manifest.txt
 -rw-r--r-- jboss45/0        293020 Clinical/BCR/clinical_aliquot_all_COAD.txt
 -rw-r--r-- jboss45/0        236772 Clinical/BCR/clinical_analyte_all_COAD.txt
 -rw-r--r-- jboss45/0          3110 Clinical/BCR/clinical_drug_all_COAD.txt
 -rw-r--r-- jboss45/0            44 Clinical/BCR/clinical_examination_all_COAD.txt
 -rw-r--r-- jboss45/0         56189 Clinical/BCR/clinical_patient_all_COAD.txt
 -rw-r--r-- jboss45/0         26538 Clinical/BCR/clinical_portion_all_COAD.txt
 -rw-r--r-- jboss45/0        100282 Clinical/BCR/clinical_protocol_all_COAD.txt
 -rw-r--r-- jboss45/0            42 Clinical/BCR/clinical_radiation_all_COAD.txt
 -rw-r--r-- jboss45/0         43386 Clinical/BCR/clinical_sample_all_COAD.txt
 -rw-r--r-- jboss45/0         47472 Clinical/BCR/clinical_slide_all_COAD.txt
 -rw-r--r-- jboss45/0            40 Clinical/BCR/clinical_surgery_all_COAD.txt

The file Clinical/BCR/clinical_patient_all_COAD.txt as it appears in Excel:

[Image: example biotab file as it appears in Excel]

This example shows participant barcodes and the associated values of clinical parameters. Sorting and filtering on columns containing clinical values of interest allows the user to select a list of patient barcodes meeting desired conditions. The list of patient barcodes can then be used to identify the relevant aliquot barcodes, which in turn can be mapped to assay data.
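The same selection can be scripted. The sketch below filters a patient table on one clinical column and then keeps the aliquot barcodes that belong to the selected patients. It relies on the TCGA barcode layout, in which an aliquot barcode begins with the 12-character patient barcode (see TCGA barcode). The function names, the column names bcr_patient_barcode, bcr_aliquot_barcode and gender, and all rows are assumptions made for illustration.

```python
import csv
import io

# Length of a patient barcode such as "TCGA-AA-0001".
PATIENT_BARCODE_LENGTH = 12


def select_patients(patient_text, column, wanted):
    """Return the set of patient barcodes whose `column` equals `wanted`."""
    reader = csv.DictReader(io.StringIO(patient_text), delimiter="\t")
    return {row["bcr_patient_barcode"] for row in reader if row[column] == wanted}


def aliquots_for(aliquot_text, patients):
    """Return aliquot barcodes whose patient prefix is in `patients`, in file order."""
    reader = csv.DictReader(io.StringIO(aliquot_text), delimiter="\t")
    return [
        row["bcr_aliquot_barcode"]
        for row in reader
        if row["bcr_aliquot_barcode"][:PATIENT_BARCODE_LENGTH] in patients
    ]


# Made-up rows; the column names are assumptions, not copied from real files.
PATIENTS = (
    "bcr_patient_barcode\tgender\n"
    "TCGA-AA-0001\tFEMALE\n"
    "TCGA-AA-0002\tMALE\n"
)
ALIQUOTS = (
    "bcr_aliquot_barcode\n"
    "TCGA-AA-0001-01A-01D-0001-01\n"
    "TCGA-AA-0002-01A-01D-0001-01\n"
    "TCGA-AA-0001-10A-01D-0001-01\n"
)

selected = select_patients(PATIENTS, "gender", "FEMALE")
aliquots = aliquots_for(ALIQUOTS, selected)
```

Matching on the barcode prefix avoids a separate join through the sample, portion and analyte tables, at the cost of assuming that every aliquot barcode is well formed.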

Pre-built Biotab Files

Biotab files are open access, tab-delimited files submitted to the DCC as Level 2 archives. To browse biotab files for a given tumor, point a browser to a URL of this form: <disabbrev>/bcr/

where <disabbrev> (chevrons included) should be replaced with the abbreviation of the desired disease, in lower case, as listed in the Code Tables Report.
