
Microarray Gene Expression - Tabular format (MAGE-TAB) is a MIAME-compliant, tab-delimited format used to annotate microarray data.

For more information on MAGE-TAB, visit the FGED Society website.

In order to provide a common platform for sharing characterization data within the research community, the Microarray Gene Expression Data (MGED) Society developed the Minimum Information About a Microarray Experiment (MIAME) standard. MIAME describes the data and accompanying metadata that investigators must provide so that the experiment can be reproduced and the results can be interpreted in light of the experimental conditions. MAGE-TAB (MicroArray Gene Expression Tabular) uses a simple spreadsheet-based format to represent primary data and associated metadata. The MAGE-TAB specification is based on the Microarray and Gene Expression Object Model (MAGE-OM). The MAGE-TAB specification and the related publication provide more details on the format.

MAGE Experiments in TCGA

MAGE-based documents usually represent an experiment consisting of many assays, and that experiment usually represents a complete study. In the case of TCGA, an experiment for a particular center is composed of all the assays of a particular platform for all the samples of a particular tumor type. Because TCGA mandates that data be made available as soon as possible, centers submit data promptly, and the set of MAGE-TAB documents for an experiment must therefore be updated often.

All TCGA characterization data will be modeled using the MAGE-OM and the MAGE-TAB specification will be used to represent the MAGE-OM. One of the goals of modeling and formatting the data is that MAGE-TAB documents can be submitted to external databases (e.g., caArray, ArrayExpress, GEO). Submission of the data is a requirement for its publication and allows querying of the data.


To capture experiment details and the relationships between related data files (i.e., data files from successive stages of processing as protocols are applied to the sample data), TCGA uses the MAGE-TAB standard. MAGE-TAB files are tab-delimited text files that model data as rows and columns and can capture complex experimental relationships, such as an entire study that uses multiple assays.

MAGE-TAB format uses three different types of files to capture information about an experiment. Click on each file type to learn more about it.

File Type

File Extension




Mandatory File?

Investigation Description Format (IDF)



Provides general information about the investigation, including its name, a brief description, the investigator's contact details, bibliographic references, and free text descriptions of the protocols used in the investigation.



Sample and Data Relationship Format (SDRF)



Describes the relationships between samples, arrays, data, and other objects used or produced in the investigation, and provides all MIAME information that is not provided elsewhere. In TCGA SDRF files, a row represents an analyzed element (often an aliquot) in its most basic electronic form (i.e., a raw data file) and traces the production of higher-level data files (Level 2 and 3) as protocols (e.g., normalization) are applied to that file and its derivatives. These protocols correspond to those listed in the IDF.



Array Design Format (ADF)



Defines each array type used. An ADF file describes the design of an array, e.g., which sequence is located at each position on the array and how that sequence is annotated. An ADF may be included in the MAGE-TAB archive or obtained through the Data Portal on the Platform Design page.

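Because the protocols referenced in an SDRF are expected to correspond to those listed in the IDF, a simple cross-check can catch mismatches before submission. The following is a minimal sketch, not the DCC's validation code. It assumes the "Protocol Name" row of the IDF and the "Protocol REF" columns of the SDRF as named in the MAGE-TAB specification; the file contents, sample names, and function names are hypothetical.

```python
import csv
import io

# Hypothetical, heavily trimmed IDF and SDRF contents (not real TCGA files).
IDF_TEXT = (
    "Investigation Title\tExample study\n"
    "Protocol Name\tlabeling\thybridization\tnormalization\n"
)
SDRF_TEXT = (
    "Extract Name\tProtocol REF\tArray Data File\tProtocol REF\tDerived Array Data File\n"
    "aliquot-1\thybridization\traw1.CEL\tnormalization\tlevel2_1.txt\n"
    "aliquot-2\thybridization\traw2.CEL\tsegmentation\tlevel3_2.txt\n"
)


def idf_protocol_names(idf_text):
    """Return the set of values on the IDF 'Protocol Name' row."""
    names = set()
    for row in csv.reader(io.StringIO(idf_text), delimiter="\t"):
        if row and row[0].strip() == "Protocol Name":
            names.update(v.strip() for v in row[1:] if v.strip())
    return names


def sdrf_protocol_refs(sdrf_text):
    """Return the set of values found in all SDRF 'Protocol REF' columns."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    # An SDRF may repeat the 'Protocol REF' header, once per processing step.
    ref_columns = [i for i, name in enumerate(header) if name.strip() == "Protocol REF"]
    refs = set()
    for row in reader:
        for i in ref_columns:
            if i < len(row) and row[i].strip():
                refs.add(row[i].strip())
    return refs


def undeclared_protocols(idf_text, sdrf_text):
    """Return SDRF protocol references that the IDF does not declare."""
    return sorted(sdrf_protocol_refs(sdrf_text) - idf_protocol_names(idf_text))


print(undeclared_protocols(IDF_TEXT, SDRF_TEXT))  # prints ['segmentation']
```

In this made-up example the second SDRF row refers to a "segmentation" protocol that the IDF never declares, so it is reported.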













The following figure depicts the association among different files in a MAGE-TAB archive. The "raw data files" exist in Data Archives.












[Figure not available: diagram "relationships", showing the association among files in a MAGE-TAB archive]












Data Files and Data Matrices

There may be many different types of data documents, including ASCII or binary files (e.g., CEL files), typically in their native formats. A full list of supported formats can be found in the Tab2MAGE data file documentation.

Preferably, data should be provided in a specially defined tab-delimited format termed a “data matrix”. The MAGE-TAB Data Matrix is a simplified format that allows data columns to be mapped to rows in the SDRF file. The first header line of a Data Matrix file describes this mapping, and the second lists the quantitation type for each column. The first column maps the data rows to identifiers from the array design used. The MAGE-TAB overview and specification provide more information and examples.
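The two-line header described above can be illustrated with a short parsing sketch. This is only an illustration under stated assumptions: the header labels ("Hybridization REF", "Reporter REF"), the sample names, the quantitation types, and the function name parse_data_matrix are all hypothetical, and real files should follow the MAGE-TAB specification.

```python
import csv
import io

# Hypothetical data matrix; names and values are made up for illustration.
MATRIX_TEXT = (
    "Hybridization REF\tsample-A\tsample-A\tsample-B\tsample-B\n"
    "Reporter REF\tSignal\tDetection P-value\tSignal\tDetection P-value\n"
    "probe_1\t10.5\t0.01\t11.2\t0.02\n"
    "probe_2\t7.3\t0.20\t6.9\t0.15\n"
)


def parse_data_matrix(text):
    """Map (SDRF reference, quantitation type) to {design element id: value}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    sdrf_refs, quant_types, data_rows = rows[0], rows[1], rows[2:]
    if len(sdrf_refs) != len(quant_types):
        raise ValueError("the two header lines have different column counts")
    columns = {}
    # Column 0 holds the design element identifiers, so data starts at column 1.
    for col in range(1, len(sdrf_refs)):
        key = (sdrf_refs[col], quant_types[col])
        columns[key] = {row[0]: float(row[col]) for row in data_rows if row}
    return columns


matrix = parse_data_matrix(MATRIX_TEXT)
print(matrix[("sample-B", "Signal")]["probe_2"])  # prints 6.9
```

Keying each column by the pair of header values keeps the link back to the SDRF row explicit, which is the purpose of the first header line.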

General Notes on Formatting MAGE-TAB Documents

  1. File names of center-specific documents should reflect the archive the file is contained in. Also refer to TCGA Archive Naming Convention.
  2. Dates should be formatted as year-month-day (e.g. 2007-01-18).
  3. Please be careful when editing tab-delimited documents in Microsoft Excel or other spreadsheet applications. Those applications tend to automatically reformat data.
  4. Note that the IDF, SDRF, ADF and "data matrix" files should be in plain, tab-delimited text format.
  5. The MAGE-TAB specification contains many non-required headers (e.g. "Factor Value", "Characteristics", "Protocol Parameters", "Protocol Hardware", "Protocol Software", "Comment", "Normalization Type", "Replicate Type", "Quality Control Types", "Experimental Factor Type", and "Experimental Design"). Please consider providing values for those headers if they pertain to your experiment.
  6. Please be verbose in your README.txt file and "Experiment Description" MAGE-TAB header.
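Some of the notes above, such as the date format and the risk of spreadsheet applications reformatting values, lend themselves to a quick automated check. The sketch below is one possible approach and is not part of the DCC tooling; the function names is_valid_date and ragged_lines are hypothetical. The column-count check is only meaningful for table-like files such as an SDRF or a data matrix, since IDF rows may legitimately differ in length.

```python
from datetime import datetime


def is_valid_date(value):
    """Return True only for zero-padded year-month-day dates such as 2007-01-18."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime tolerates missing zero padding, so compare against the canonical form.
    return parsed.strftime("%Y-%m-%d") == value


def ragged_lines(text):
    """Return 1-based numbers of non-blank lines whose tab-separated column
    count differs from that of the first line."""
    lines = text.splitlines()
    if not lines:
        return []
    expected = lines[0].count("\t") + 1
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.strip() and line.count("\t") + 1 != expected
    ]


print(is_valid_date("2007-01-18"))  # prints True
print(is_valid_date("01/18/2007"))  # prints False
print(ragged_lines("a\tb\tc\n1\t2\t3\n4\t5\n"))  # prints [3]
```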

MAGE-TAB Archive Validation

The DCC uses the ArrayExpress MAGE-TAB scripts in the Tab2MAGE software package to create MAGE-ML from MAGE-TAB formatted documents. Therefore, for a submitted MAGE-TAB archive to pass the DCC QC process, it must be processed successfully by the MAGE-TAB scripts. Please keep that in mind before transferring data to the DCC. You may want to run the Tab2MAGE software on your data before transferring it to make sure that MAGE-ML can be created.

Do not reuse example files by adding your IDs. For instance, do not download another center's archive and reuse their IDF or SDRF by adding your own values. Instead, prepare your files according to the design of your center's experiment and the MAGE-TAB specification. We advise that you run the MAGE-TAB experiment checker on your MAGE-TAB documents and also visualize the result to check that it makes sense.

Sample MAGE-TAB files

For a real example of IDF and SDRF files, download the MAGE-TAB documents prepared by Memorial Sloan Kettering.

Data Type Groups that currently use MAGE-TAB












Data Type Group (links to specific validations)

Modeled using MAGE-TAB?

aCGH Based Copy Number


Array Based Expression


DNA Methylation


Protein Arrays


RNASeq and miRNASeq Expression


SNP-based SNP, Copy Number, LOH


Low-pass sequencing Based Copy Number 

Yes, but in the process of being implemented


Yes, but not implemented yet

DNA Sequencing (GSC)

To be specified and implemented























Specific Validations

Validations that are MAGE-TAB specific include

ADF Validations

Note that no ADF validations exist

Standard Validations

MAGE-TAB archives undergo standard validation sets shown in the Standard Archive Validation chart for GCC archives:











