About Clinical and Biospecimen Data
Clinical and biospecimen data are represented in two file types: XML, and a tab-delimited text format called biotab. The two formats present the same data structure in different ways, and both are open access. Either can be used to collect the barcodes of participants whose clinical data match the criteria of interest.
Each XML file contains data for a single participant; each biotab file contains data for multiple participants.
Either type of file can be used to extract and aggregate the aliquot barcodes associated with participants' clinical data. Once the relevant sample or aliquot barcodes and clinical values have been parsed from the XML or biotab files, samples can be grouped by the clinical data elements of interest. The aggregated barcodes can then be mapped to the corresponding assay data (see TCGA barcode).
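The grouping step can be sketched in Python as follows. This is a minimal illustration, not an official TCGA tool. It assumes the standard barcode layout in which the first three hyphen-separated fields (for example, TCGA-02-0001) identify the participant; the function names are hypothetical.

```python
from collections import defaultdict


def participant_of(barcode: str) -> str:
    """Return the participant portion of a TCGA barcode.

    Assumption: the first three hyphen-separated fields identify the
    participant, e.g. TCGA-02-0001-01C-01D-0182-01 -> TCGA-02-0001.
    """
    fields = barcode.split("-")
    if len(fields) < 3 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(fields[:3])


def aliquots_for_participants(aliquot_barcodes, participant_barcodes):
    """Group aliquot barcodes by participant, keeping only selected participants."""
    wanted = set(participant_barcodes)
    grouped = defaultdict(list)
    for aliquot in aliquot_barcodes:
        participant = participant_of(aliquot)
        if participant in wanted:
            grouped[participant].append(aliquot)
    return dict(grouped)
```

The participant list would typically come from a clinical query, and the aliquot list from an assay data listing. Everything else is plain string handling, so no TCGA-specific library is needed.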
For more information about XML file types, see XML.
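Because each XML file holds a single participant, collecting barcodes from XML means parsing one file per participant. The following Python sketch uses the standard library's ElementTree. It assumes the barcode is stored in an element whose local name is bcr_patient_barcode, with any namespace prefix ignored. That element name is an assumption and should be checked against the actual files.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_text(root: ET.Element, name: str = "bcr_patient_barcode") -> str:
    """Return the text of the first element whose local name is `name`.

    The default element name is an assumption about the XML schema.
    """
    for node in root.iter():
        if local_name(node.tag) == name and node.text and node.text.strip():
            return node.text.strip()
    raise KeyError(f"no <{name}> element found")


def barcodes_from_files(paths):
    """Parse one-participant-per-file XML documents and collect their barcodes."""
    return [find_text(ET.parse(path).getroot()) for path in paths]
```

Matching on the local name avoids hard-coding namespace URIs, which may differ between schema versions.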
Working with Biotab Files
Biotab files contain clinical or biospecimen data for multiple participants. For most purposes they are a more convenient alternative to clinical XML. Each line contains tab-separated data for a single participant or other object (for example, a sample or an aliquot). These files can be opened in a spreadsheet program such as Excel, where data can easily be sorted and extracted. Biotab files can also be sorted and manipulated with command-line programs (such as the UNIX utility join), or imported into a local relational database (for example, MySQL or PostgreSQL).
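The following minimal sketch shows how to load a biotab file into a relational database. It uses Python's built-in sqlite3 module in place of MySQL or PostgreSQL. It assumes the file has a single header row of column names, and it stores every value as TEXT. The table name and function name are hypothetical.

```python
import csv
import sqlite3


def load_biotab(conn: sqlite3.Connection, table: str, lines) -> int:
    """Load tab-delimited biotab lines into a new table; return the row count.

    Assumes a single header row; all columns are stored as TEXT.
    """
    # QUOTE_NONE keeps any quote characters in values as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

Typical usage would be `with open(path, newline="") as f: load_biotab(conn, "patient", f)`, followed by ordinary SQL queries on the resulting table.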
Biotab files of clinical data can be obtained for a given disease study or individual participants using the Data Matrix. See the TCGA Data Matrix Users Guide for more information. See Pre-built Biotab Files for an alternative to the Data Matrix.
The following is an example listing of biotab files as returned in a Data Matrix archive:
Clinical/BCR/clinical_patient_all_COAD.txt as it appears in Excel:
This example shows participant barcodes and the associated values of clinical parameters. Sorting and filtering on the columns that contain the clinical values of interest lets the user select the patient barcodes that meet the desired conditions. These barcodes can then be used to identify the relevant aliquot barcodes, which in turn can be mapped to assay data.
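The same filtering can be done outside a spreadsheet. The following Python sketch reads biotab lines with the standard csv module and returns the sorted barcodes of rows whose chosen column satisfies a condition. It assumes a single header row. The column names bcr_patient_barcode and gender used here are illustrative and should be checked against the actual file header.

```python
import csv


def select_patients(lines, column, predicate, id_column="bcr_patient_barcode"):
    """Return sorted barcodes of rows whose `column` value satisfies `predicate`.

    Assumes one header row; `id_column` and `column` are illustrative names.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return sorted(row[id_column] for row in reader if predicate(row[column]))
```

For example, `select_patients(f, "gender", lambda v: v == "FEMALE")` would return the female participants' barcodes. Passing the condition as a function keeps the selection logic separate from file parsing.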
Pre-built Biotab Files
Biotab files are open-access, tab-delimited files submitted to the DCC as Level 2 archives. To browse the biotab files for a given tumor type, point a browser to a URL of this form: