NIH | National Cancer Institute | NCI Wiki  


Question: What are MAGE-TAB Files?

Topic: caArray Usage

Release: caArray 2.X

Date entered: 02/12/2009


Answer

MAGE-TAB (MicroArray Gene Expression Tabular) format files, collectively considered the MAGE-TAB data set, are simple tab-delimited, spreadsheet-based files that can be used for annotating and communicating microarray data in a MIAME-compliant fashion. The MAGE-TAB specification is based on the Microarray and Gene Expression Object Model (MAGE-OM). The MAGE-TAB specification document and related publications provide more details on the format.

The MAGE-TAB format files (see Sourceforge overview) can be divided into two groups: array data files and descriptive files.

There are two types of array data: raw array data and derived array data. The caArray MAGE-TAB parser supports a variety of raw data files produced by several different scanner makes and models, although these raw data files may not be in MAGE-TAB format. Derived array data is either normalized array data or a data file that combines data from more than one hybridization or scan. For derived data, the caArray MAGE-TAB parser supports the Affymetrix .CHP format (which is not a MAGE-TAB format). Other derived data must be reformatted as a MAGE-TAB data matrix, as described in the table below.

The MAGE-TAB specification defines four different types of files to fully describe a microarray investigation. The following table summarizes the definition of each of these file types.

| File Type | File Extension | Description | Processed by caArray |
|---|---|---|---|
| Investigation Description Format (IDF) | .idf | A MAGE-TAB tab-delimited file providing general information about the investigation, including its name, a brief description, the investigator's contact details, bibliographic references, ontologies/databases referenced, and free-text descriptions of the protocols used in the investigation. Warning: In a tab-separated line in the IDF, there must be no empty columns, i.e., two tabs with nothing between them. Empty columns will result in import failure. | Yes. Parsed, validated before import |
| Sample and Data Relationship Format (SDRF) | .sdrf | A MAGE-TAB tab-delimited file (or files) describing the relationships between samples, arrays, data, and other objects used or produced in the investigation, and providing all MIAME information that is not provided elsewhere. An SDRF contains all of the information linking the samples to your data files (for example, Affymetrix CEL, Affymetrix CHP). Each row in the table represents a hybridization channel, and the columns represent the steps of the experiment, read from left to right. | Yes. Parsed, validated before import |
| Array Design Format (ADF) | .adf | A tab-delimited file with standardized column names describing the design of an array, e.g., what sequence is located at each position on an array and what the annotation of this sequence is. MAGE-TAB ADF files are not required in a caArray experiment. | Can be uploaded, but are not parsed. Vendor-specific Array Design Files are recommended, if available. |
| Raw and processed data files | .txt or other | Raw data files: ASCII or binary files, typically in their native formats. Processed (derived) data: tab-delimited text format; can be MAGE-TAB data matrix files. Processed data can be binary as well, such as Affymetrix MAS5-generated CHP files. | Of data matrix files, only copy number data matrix files are parsed. |

To load an experiment into caArray using the MAGE-TAB format, you must create the first two file types shown in the table above, IDF and SDRF. See Building MAGE-TAB Formatted Files. You must also make sure that the corresponding array design (the third file type) for your experiment has been loaded into caArray. Files of the fourth type, raw and/or derived (processed) data files, are created by the instruments that read the array results.
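The IDF rule stated in the table above (no empty columns in a tab-separated line) can be checked before upload. The following Python sketch is illustrative only and is not part of caArray: the function name `find_empty_idf_columns` is hypothetical, and it interprets "empty column" strictly as two tab characters with nothing between them.

```python
def find_empty_idf_columns(idf_text):
    """Return (line_number, column_number) pairs for empty interior columns.

    Assumption of this sketch: an "empty column" is an empty field that sits
    between two tab characters. Line and column numbers are 1-based.
    """
    problems = []
    for line_number, line in enumerate(idf_text.splitlines(), start=1):
        fields = line.split("\t")
        # Interior fields are the ones with a tab on both sides.
        for index in range(1, len(fields) - 1):
            if fields[index] == "":
                problems.append((line_number, index + 1))
    return problems


if __name__ == "__main__":
    # Hypothetical IDF content; the second line has two adjacent tabs.
    sample = "Investigation Title\tMy Study\nPerson Last Name\tSmith\t\tJones\n"
    for line_number, column_number in find_empty_idf_columns(sample):
        print(f"line {line_number}: empty column {column_number}")
```

An empty result means no adjacent tabs were found; it does not guarantee that the file passes caArray's own validation.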

The files described in the table make up the MAGE-TAB data set. They can be categorized into two groups: descriptive files and array data files. Some of the files are in MAGE-TAB format (see the Sourceforge overview) and others are supported as part of the MAGE-TAB data set. See the following figure and its description below.

Figure: Components of the MAGE-TAB data set in caArray (see text).

The descriptive files include the IDF, the SDRF, and array design files. For more information about the IDF and SDRF annotation files, refer to caArray 008 - Should I use the Annotations Tab or MAGE-TAB annotation Files in caArray?. A MAGE-TAB format Array Design File (designated as ADF) is not mandatory in caArray, since array design files for common arrays are usually available from their respective array providers. If an array design file (which may not be in MAGE-TAB format) is available from its array provider, it should be chosen over an ADF file. Furthermore, MAGE-TAB ADF files are not validated or parsed by caArray; they are imported directly. Array design files, whether native or ADF, are uploaded using the Manage Array Design interface under Curation on the caArray left sidebar. The other MAGE-TAB file types (IDF, SDRF, and data matrix files) are uploaded together on the Manage Data page. The table that follows summarizes each MAGE-TAB format file.

MAGE-TAB Formatted Files

| Abbreviation | File Type | Comments | caArray compatible? | Processed by caArray? |
|---|---|---|---|---|
| IDF | Investigation Description Format | Provides an overview of the experiment, including the experimental variables (factors) used, protocols, quality control strategy, publication information, and contact details. | Yes | Yes. Parsed, validated before import |
| SDRF | Sample and Data Relationship Format | Describes relationships between samples, arrays, data files, protocols, factor values, etc. It is a table in which each hybridization channel is represented by a row, and columns represent the steps of the experiment. The ordering of these columns is important, and reads left to right in chronological order. | Yes | Yes. Parsed, validated before import |
| ADF | Array Design Format | Provides the array-level annotation for the experiment. It relates the row-level identifiers in the data files to biological sequence annotation. | Yes | No. Imported directly |
| TXT or other | Data Matrix | Contains processed array data in tab-delimited text format. Rows may represent genes, exons, or genomic locations. Columns represent samples or experimental conditions. | Yes | No. Imported directly, except for copy number data matrix files, which are parsed |

What are MAGE-TAB Files?

The term "MAGE-TAB files" (see step 2 of caArray 002 - How do I upload MicroArray Gene Expression Data into caArray? for an example) has been used to refer not only to the MAGE-TAB formatted files summarized in the table, but also to files that are supported by the MAGE-TAB parser described in the previous section. More specifically, MAGE-TAB files also include third-party raw array data files, derived data files, and array design files from array providers, as shown in the illustration.

In summary, MAGE-TAB files refer to each other. Together they represent the complete experiment.

Figure: Diagram identifying the content of MAGE-TAB files.

According to the figure above, MAGE-TAB files can be divided into two categories: array data files, which are in ASCII or binary format, and descriptive files, which contain information about array design and investigation data. These two categories can be further subdivided as follows:

Array data files can either be raw (i.e., unprocessed) data files whose format is specified by a third-party vendor, or they can be derived (i.e., normalized) data files. The latter are either generic data in MAGE-TAB format, or Affymetrix .CHP files that are not in MAGE-TAB format but can still be parsed by caArray.

Descriptive files can contain information about investigation data or array design. Investigation data files conform to one of two specifications: Investigation Description File (IDF) or Sample and Data Relationship File (SDRF). Array design files conform to one of two specifications: MAGE-TAB formatted design files with a .ADF file extension, or non-MAGE-TAB formatted design files from third-party vendors that can still be parsed by caArray.
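The two categories described above can be expressed as a small lookup by file extension. This is a minimal sketch, not caArray's actual logic: the function name `classify` and the extension tables are assumptions made for illustration, covering only the extensions named on this page (.idf, .sdrf, .adf, and Affymetrix CEL and CHP). Names with compound extensions such as `study.idf.txt` are not handled.

```python
from pathlib import PurePath

# Assumed mappings for illustration; caArray supports more formats than these.
DESCRIPTIVE_EXTENSIONS = {".idf": "IDF", ".sdrf": "SDRF", ".adf": "ADF"}
ARRAY_DATA_EXTENSIONS = {
    ".cel": "raw array data (Affymetrix CEL)",
    ".chp": "derived array data (Affymetrix CHP)",
}


def classify(filename):
    """Return a (category, detail) pair based only on the file extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in DESCRIPTIVE_EXTENSIONS:
        return ("descriptive", DESCRIPTIVE_EXTENSIONS[suffix])
    if suffix in ARRAY_DATA_EXTENSIONS:
        return ("array data", ARRAY_DATA_EXTENSIONS[suffix])
    # A .txt file could be a data matrix or something else entirely.
    return ("unknown", "needs manual review")


if __name__ == "__main__":
    for name in ["exp1.idf", "exp1.sdrf", "sample1.CEL", "sample1.CHP", "matrix.txt"]:
        print(name, classify(name))
```

The extension alone does not prove that a file is valid; it only suggests which group the file is expected to belong to.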


Array data files consist of raw array data and/or derived array data. A variety of raw data files, produced by several different scanner makes and models, are supported by the caArray MAGE-TAB parser even though these raw data may not be in MAGE-TAB format. Derived array data refers to either normalized array data, or a data file with data combined from more than one hybridization or scan. The caArray MAGE-TAB parser supports some of these non-MAGE-TAB files, as well. Derived array data files in MAGE-TAB format are called data matrix files. The only data matrix files that caArray parses are copy number data matrix files. For more information, see File Types in caArray.

See also About File Types in caArray and MAGE-TAB in caArray--Overview in the caArray User's Guide.

MAGE-TAB and Data Files Relationships

MAGE-TAB files imported into caArray map to data files that are also imported into the same caArray experiment. The following figure illustrates the relationship.

Figure: Relationships of MAGE-TAB files to data files in caArray.
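Because the SDRF links samples to data files, one useful check before import is that every data file named in the SDRF is actually present in the set of files to be uploaded. The sketch below assumes the SDRF uses the MAGE-TAB column headers "Array Data File" and "Derived Array Data File"; the function name `missing_data_files` is hypothetical, and caArray's own validation may apply different or additional rules.

```python
import csv
import io

# Assumed SDRF column headers that hold data file names.
DATA_FILE_COLUMNS = ("Array Data File", "Derived Array Data File")


def missing_data_files(sdrf_text, available_files):
    """Return the sorted data file names that the SDRF references
    but that are not in available_files."""
    # csv.reader is used instead of DictReader because SDRF headers can repeat.
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    data_columns = [
        index for index, name in enumerate(header)
        if name.strip() in DATA_FILE_COLUMNS
    ]
    referenced = set()
    for row in reader:
        for index in data_columns:
            if index < len(row) and row[index].strip():
                referenced.add(row[index].strip())
    return sorted(referenced - set(available_files))


if __name__ == "__main__":
    # Hypothetical two-row SDRF; the second row has no derived data file.
    sdrf = (
        "Source Name\tHybridization Name\tArray Data File\tDerived Array Data File\n"
        "s1\th1\ts1.CEL\ts1.CHP\n"
        "s2\th2\ts2.CEL\t\n"
    )
    print(missing_data_files(sdrf, {"s1.CEL", "s1.CHP"}))
```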

Building MAGE-TAB Formatted files

Because of the varied nature of every caArray experiment and its corresponding data, it is impossible to outline the exact steps for creating a caArray-compatible MAGE-TAB file. There are strict guidelines, however, for characteristics of MAGE-TAB files that meet the criteria. Refer to Appendix A - MAGE-TAB in caArray in the caArray User's Guide for specific details regarding caArray-compatible MAGE-TAB files. To provide context, see the MAGE-TAB Specification document.

The MAGE-TAB specification can be found at the MGED homepage.
To get started, you can generate a MAGE-TAB template file from EMBL-EBI's MAGE TAB site, or create your own IDF and SDRF files based on the Sourceforge MAGE-TAB documentation.
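If you create your own IDF, a small script can write the tab-delimited rows for you and avoid stray empty columns. The following is a minimal sketch under stated assumptions: it writes only four rows using tags from the MAGE-TAB specification (Investigation Title, Person Last Name, Person Email, SDRF File), the function name `build_idf` is hypothetical, and a file that caArray accepts will likely need more rows than this, so consult Appendix A for the actual requirements.

```python
import csv
import io


def build_idf(title, last_name, email, sdrf_filename):
    """Return the text of a minimal IDF: one tag and one value per line."""
    rows = [
        ["Investigation Title", title],
        ["Person Last Name", last_name],
        ["Person Email", email],
        ["SDRF File", sdrf_filename],
    ]
    for tag, value in rows:
        # Refuse empty values so that no line ends up with an empty column.
        if not value or not value.strip():
            raise ValueError(f"missing value for IDF tag: {tag}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # All values here are placeholders.
    print(build_idf("My Study", "Smith", "smith@example.org", "my_study.sdrf"))
```

Writing the result to a file named with the .idf extension, alongside an SDRF with the name given in the SDRF File row, keeps the two files consistent with each other.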

It is essential that you review the details in Appendix A - MAGE-TAB in caArray to ensure that your MAGE-TAB files meet the specifications for compatibility with caArray. MAGE-TAB training and demos for caArray users are currently under development; links will be added here once they become available.

For more information about when to use MAGE-TAB annotation files in caArray, refer to caArray 008 - Should I use the Annotations Tab or MAGE-TAB annotation Files in caArray?. For more information on how to upload MAGE-TAB files, refer to caArray 002 - How do I upload MicroArray Gene Expression Data into caArray? and to Importing MAGE-TAB Files in the caArray User's Guide.

Have a comment?

Please leave your comment in the caArray End User Forum.
