IDF File Content
An IDF file contains top-level information about the investigation and experiment, the protocols used, and the ontologies and databases referenced. It is a simple tab-delimited text file in which each unique tag is paired with its corresponding values. Each row is described by the header in its first column, and the subsequent columns list the values associated with that attribute. Blank lines can be included for legibility, and lines beginning with "#" are treated as comments and ignored. More information, including a full list of all tags supported in IDF, can be found at the MAGE Tabulator site.
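As a rough illustration of the layout described above, the following Python sketch reads IDF text into a dictionary keyed by row header. The function name `parse_idf` and the sample rows are hypothetical, made up for illustration; this is not the DCC's own parser.

```python
def parse_idf(text):
    """Parse IDF text into a dict mapping each row header to its list of values."""
    rows = {}
    for line in text.splitlines():
        # Blank lines and lines beginning with "#" carry no data.
        if not line.strip() or line.startswith("#"):
            continue
        # The first tab-separated token is the row header; the rest are values.
        header, *values = line.split("\t")
        rows[header.strip()] = [value.strip() for value in values]
    return rows


# Hypothetical sample content, not taken from a real TCGA submission.
sample = (
    "# comment lines are ignored\n"
    "Investigation Title\tExample investigation\n"
    "\n"
    "Person Last Name\tDoe\tRoe\n"
    "Person First Name\tJane\tRichard\n"
    "Protocol Name\texample.org:labeling:ExamplePlatform:01\n"
)

idf = parse_idf(sample)
print(idf["Person Last Name"])  # ['Doe', 'Roe']
```

A dictionary of lists keeps the positional relationship between rows, so the first "Person Last Name" value lines up with the first "Person First Name" value.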
The following figure represents a sample IDF file and highlights the sections that describe different experiment attributes.
An IDF file includes its name, a brief description of the investigation and/or experiment, the investigator's contact details, bibliographic references, and text descriptions of the protocols used in the investigation. IDF files are row-based: each row holds the data named by its row header. The following figure provides an example of an IDF file.
Values of IDF data items ("attributes") remain constant throughout a TCGA experiment, with the exception of the following attributes:
- All attributes relating to Person can change depending on roles
- Date of Experiment should change with each related data archive submission
- Public Release Date should change as required
- Protocols may change (see below)
- SDRF Files may change
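The rule above can be sketched as a comparison between two submissions of the same IDF. The mapping from the listed attributes to row-header prefixes (`MUTABLE_PREFIXES`) is an assumption made for this example, as are the function name and the sample data.

```python
# Assumed mapping from the attributes listed above to IDF row-header prefixes.
MUTABLE_PREFIXES = (
    "Person ",
    "Date of Experiment",
    "Public Release Date",
    "Protocol ",
    "SDRF File",
)


def changed_constant_rows(old, new):
    """Return the sorted headers whose values differ but are expected to stay constant."""
    changed = []
    for header in sorted(set(old) | set(new)):
        if header.startswith(MUTABLE_PREFIXES):
            continue  # this attribute is allowed to change between submissions
        if old.get(header) != new.get(header):
            changed.append(header)
    return changed


# Hypothetical parsed IDF rows (header -> list of values) from two submissions.
previous = {
    "Investigation Title": ["Example investigation"],
    "Person Last Name": ["Doe"],
    "Public Release Date": ["2010-01-01"],
}
current = {
    "Investigation Title": ["Example investigation (revised)"],
    "Person Last Name": ["Roe"],
    "Public Release Date": ["2010-06-01"],
}

print(changed_constant_rows(previous, current))  # ['Investigation Title']
```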
Protocols in an IDF File
The IDF describes the experiment at a high level and also lists details of protocols used in the experiment. The protocol identifier is listed as "Protocol Name" in the IDF and is referenced in the SDRF in the "Protocol REF" column. All relevant protocols should be present in all submissions for a particular platform. The DCC will deposit protocols into online databases (e.g. caArray). A Protocol entry includes:
- Protocol Name: The "Protocol Name" is used as an ID for a protocol and is referenced in the SDRF file and online databases. Although protocol names are self-assigned, please follow the naming scheme below. A name should persist unless the protocol changes. The format of the name should be <domain>:<protocolType>:<platform>:<version>, where:
  - <domain>: matches the center's internet domain name
  - <protocolType>: originates from the MGED Ontology subclasses of ProtocolType, for example, Experimental, DataTransformation, HigherLevelAnalysis
  - <platform>: matches the Array Design platform
  - <version>: allows for changes or optimizations of protocol parameters. If a protocol is modified, the version is incremented.
- Protocol Type: comes from MGED Ontology or some other controlled vocabulary
- Protocol Term Source REF: should be MGED Ontology or another controlled vocabulary source
- Protocol Description: should include a brief description of the protocol, or its URL
- Protocol Parameters: required for informatics-based protocols; multiple parameters should be separated by semicolons. Parameters should reflect the "Parameter Value [*]" entries in the SDRF. See the MAGE-TAB specification for more details.
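A minimal sketch of how the protocol naming scheme and the semicolon-separated parameters might be handled in Python. It assumes the four name components are colon-separated, as in the <domain>:<protocolType>:<platform>:<version> format used by the validation rules, and that the version is a zero-padded integer; the function names and the example name are hypothetical.

```python
from typing import NamedTuple


class ProtocolName(NamedTuple):
    domain: str
    protocol_type: str
    platform: str
    version: str


def parse_protocol_name(name):
    """Split a protocol name into its four colon-separated components."""
    parts = name.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected <domain>:<protocolType>:<platform>:<version>, got {name!r}")
    return ProtocolName(*parts)


def bump_version(name):
    """Return the name with its version incremented (assumes a numeric version)."""
    parsed = parse_protocol_name(name)
    new_version = str(int(parsed.version) + 1).zfill(len(parsed.version))
    return ":".join(parsed._replace(version=new_version))


def split_parameters(value):
    """Split a 'Protocol Parameters' value on semicolons, dropping empty entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


# Hypothetical protocol name, used only for illustration.
name = "example.org:DataTransformation:ExamplePlatform:01"
print(parse_protocol_name(name).platform)            # ExamplePlatform
print(bump_version(name))                            # example.org:DataTransformation:ExamplePlatform:02
print(split_parameters("threshold; window size ;"))  # ['threshold', 'window size']
```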
Guidelines on Creating IDF Files
- IDF files use row headers.
- If data is not available, the field should be left blank.
- The sources of controlled vocabulary terms (e.g. MGED or NCI ontology) and external database references (e.g. caArray Array Design Names) are given using "Term Source REF", which is used as a suffix to a row header. The "Term Source REF" row should be given after the term's row or column. For example, "Person Roles" is the row header and "Person Roles Term Source REF" is the term source row header, and the former precedes the latter.
- All values of attributes in an IDF document should remain constant throughout a TCGA experiment, with the exception of the following fields:
  - All attributes relating to "Person" can change depending on roles
  - "Date of Experiment" should change
  - "Public Release Date" should change. It should reflect the approximate date that the archive containing the IDF file will be transferred.
  - "Protocols" may change
  - "SDRF File" should change
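The "Term Source REF" ordering guideline above can be checked mechanically. The sketch below takes row headers in file order and reports any "Term Source REF" row that appears before its term row. Because some term source headers drop part of the term's name (the Protocol entry pairs "Protocol Type" with "Protocol Term Source REF"), the match is by prefix, which is an assumption of this example rather than a documented rule. The function name is hypothetical.

```python
SUFFIX = " Term Source REF"


def misplaced_term_source_rows(headers):
    """Return 'Term Source REF' headers that are not preceded by a matching term row."""
    problems = []
    for index, header in enumerate(headers):
        if not header.endswith(SUFFIX):
            continue
        base = header[: -len(SUFFIX)]
        # Look for an earlier, non-REF row whose header starts with the base name.
        preceded = any(
            earlier.startswith(base) and not earlier.endswith(SUFFIX)
            for earlier in headers[:index]
        )
        if not preceded:
            problems.append(header)
    return problems


# Row headers in the order they appear in a hypothetical IDF file.
headers = [
    "Person Roles",
    "Person Roles Term Source REF",
    "Quality Control Types Term Source REF",  # appears before its term row
    "Quality Control Types",
    "Protocol Type",
    "Protocol Term Source REF",
]

print(misplaced_term_source_rows(headers))  # ['Quality Control Types Term Source REF']
```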
IDF File Validation
The validator checks the IDF file in the MAGE-TAB archive to ensure that required elements and values are present.
Validation applies to MAGE-TAB archive IDF files, which have the extension ".idf.txt".
IDF files are tab-delimited with row headers. The first token on each line of the file represents the row header, and the following tab-separated tokens are the values.
- If the IDF is blank then FAIL
- If any row headers are not in the Allowed Headers list, then FAIL
- If the 'Protocol Name' header is missing then FAIL
- If a 'Protocol Name' value does not have the valid format <domain>:<protocolType>:<platform>:<version> (checked using a regular expression), then FAIL
- If the <domain> value in the 'Protocol Name' does not match the archive's center domain, then FAIL
- If the <platform> value in the 'Protocol Name' does not match the archive's platform, then FAIL
- If the 'Protocol Description' header is missing then FAIL
- If the first value for any header is "->" (indicating a blank) then FAIL
- If the 'Term Source Name' header is present, then the number of values for 'Term Source Name', 'Term Source File', and 'Term Source Version' must be the same, otherwise FAIL
- If any SDRF column header contains a 'Term Source REF' value that is not represented under the IDF "Term Source Name" header then FAIL
- If there are no FAIL conditions, then PASS
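The checks above can be sketched as a single function over parsed IDF rows (a dict of header to list of values). This is an illustrative approximation, not the DCC validator: the regular expression, the function signature, and the idea of passing in the SDRF's "Term Source REF" values as a list are all assumptions.

```python
import re

# Assumed pattern for <domain>:<protocolType>:<platform>:<version>;
# the validator's actual regular expression is not given in this document.
PROTOCOL_NAME_RE = re.compile(r"^([^:]+):([^:]+):([^:]+):([^:]+)$")
TERM_SOURCE_HEADERS = ("Term Source Name", "Term Source File", "Term Source Version")


def validate_idf(rows, allowed_headers, center_domain, platform, sdrf_term_sources=()):
    """Return a list of failure messages; an empty list means PASS."""
    if not rows:
        return ["IDF is blank"]
    failures = []
    for header in rows:
        if header not in allowed_headers:
            failures.append(f"header not in Allowed Headers list: {header}")
    for required in ("Protocol Name", "Protocol Description"):
        if required not in rows:
            failures.append(f"missing header: {required}")
    for name in rows.get("Protocol Name", []):
        match = PROTOCOL_NAME_RE.match(name)
        if match is None:
            failures.append(f"invalid Protocol Name format: {name}")
            continue
        domain, _protocol_type, name_platform, _version = match.groups()
        if domain != center_domain:
            failures.append(f"domain does not match center: {name}")
        if name_platform != platform:
            failures.append(f"platform does not match archive: {name}")
    for header, values in rows.items():
        if values and values[0] == "->":
            failures.append(f"first value is blank for header: {header}")
    if "Term Source Name" in rows:
        counts = {len(rows.get(header, [])) for header in TERM_SOURCE_HEADERS}
        if len(counts) != 1:
            failures.append("Term Source Name, File, and Version counts differ")
    known_sources = set(rows.get("Term Source Name", []))
    for ref in sdrf_term_sources:
        if ref not in known_sources:
            failures.append(f"SDRF Term Source REF not declared in IDF: {ref}")
    return failures


# Hypothetical rows; the file name and version values are placeholders.
good = {
    "Protocol Name": ["example.org:labeling:ExamplePlatform:01"],
    "Protocol Description": ["Hypothetical labeling protocol"],
    "Term Source Name": ["MGED Ontology"],
    "Term Source File": ["placeholder-file"],
    "Term Source Version": ["placeholder-version"],
}
allowed = set(good)
bad = dict(good, **{"Protocol Name": ["other.org:labeling:ExamplePlatform:01"]})

print(validate_idf(good, allowed, "example.org", "ExamplePlatform", ["MGED Ontology"]))  # []
print(validate_idf(bad, allowed, "example.org", "ExamplePlatform"))
```

Collecting every failure instead of stopping at the first one lets a submitter fix all reported problems in a single pass.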
Allowed Headers
Experimental Design Term Source REF
Experimental Factor Name
Experimental Factor Type
Experimental Factor Type Term Source REF
Person Last Name
Person First Name
Person Mid Initials
Person Roles Term Source REF
Quality Control Types
Quality Control Types Term Source REF
Replicate Type Term Source REF
Normalization Term Source REF
Date of Experiment
Public Release Date
Publication Author List
Publication Status Term Source REF
Protocol Term Source REF
Term Source Name
Term Source File
Term Source Version