NIH | National Cancer Institute | NCI Wiki

An Investigation Description Format (IDF) file is a tab-delimited file that provides general information about the investigation and experiment.

IDF File Content

An IDF file contains top-level information about the investigation and experiment, the protocols used, and the ontologies and databases referenced. The file consists of a set of unique tags, each followed by its corresponding values, in a simple tab-delimited text format. The first column of each row holds the row header (the tag), and the subsequent columns list the values associated with that attribute. Blank lines can be included for legibility, and lines beginning with "#" are treated as comments and ignored. More information, including a full list of all tags supported in IDF, can be found at the MAGE Tabulator site.
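
As a rough illustration of the row layout described above, the following Python sketch reads IDF text into a mapping from row header to values. It assumes that fields never contain embedded tabs or quoting and keeps only the last row when a header repeats; the function name `parse_idf` is hypothetical and not part of any TCGA or MAGE-TAB tool.

```python
def parse_idf(text: str) -> dict[str, list[str]]:
    """Map each IDF row header to the list of values that follow it.

    Hypothetical helper: assumes no quoted fields or embedded tabs.
    """
    rows: dict[str, list[str]] = {}
    for line in text.splitlines():
        # Blank lines (legibility) and '#' lines (comments) are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # The first tab-separated token is the row header; the rest are values.
        header, *values = line.split("\t")
        # Drop trailing empty cells that spreadsheet exports often add.
        while values and values[-1] == "":
            values.pop()
        rows[header.strip()] = values  # a repeated header overwrites the earlier row
    return rows


sample = "Investigation Title\tExample study\n\n# a comment\nProtocol Name\tp1\tp2\n"
print(parse_idf(sample))
# {'Investigation Title': ['Example study'], 'Protocol Name': ['p1', 'p2']}
```

Returning a plain dictionary keeps the later checks (allowed headers, protocol names, term sources) independent of how the file was read.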

The following figure represents a sample IDF file and highlights the sections that describe different experiment attributes.

[Figure: Example IDF file]

An IDF file includes the investigation title, a brief description of the investigation and/or experiment, the investigator's contact details, bibliographic references, and text descriptions of the protocols used in the investigation. IDF files are row-based: each row holds the data named by its row header.

Values of IDF data items ("attributes") remain constant throughout a TCGA experiment, with the exception of the following attributes:

  • All attributes relating to Person can change depending on roles
  • Date of Experiment should change with each related data archive submission
  • Public Release Date should change as required
  • Protocols may change (see below)
  • SDRF Files may change
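
The exceptions listed above lend themselves to a simple consistency check between two submissions. The sketch below assumes each IDF has already been read into a header-to-values mapping, and it interprets "attributes relating to Person" and "Protocols" as rows whose headers start with "Person " and "Protocol "; that interpretation and the function names are assumptions, not part of the TCGA specification.

```python
# Headers (or header prefixes) whose values may change between submissions.
# Treating "Person ..." and "Protocol ..." rows as the Person and Protocol
# attributes is an interpretation, not an official rule.
MUTABLE_PREFIXES = ("Person ", "Protocol ")
MUTABLE_HEADERS = {"Date of Experiment", "Public Release Date", "SDRF Files"}


def is_mutable(header: str) -> bool:
    """True if the attribute is allowed to change between submissions."""
    return header.startswith(MUTABLE_PREFIXES) or header in MUTABLE_HEADERS


def changed_constant_attributes(old: dict[str, list[str]], new: dict[str, list[str]]) -> list[str]:
    """Return headers whose values differ between two IDFs but should stay constant."""
    headers = sorted(set(old) | set(new))
    return [h for h in headers if not is_mutable(h) and old.get(h) != new.get(h)]


old = {"Investigation Title": ["Study A"], "Date of Experiment": ["2010-01-15"]}
new = {"Investigation Title": ["Study B"], "Date of Experiment": ["2010-03-02"]}
print(changed_constant_attributes(old, new))  # ['Investigation Title']
```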

Protocols in an IDF File

The IDF describes the experiment at a high level and also lists details of the protocols used in the experiment. The protocol identifier is listed as "Protocol Name" in the IDF and is referenced in the SDRF in the "Protocol REF" column. All relevant protocols should be present in all submissions for a particular platform. The DCC will deposit protocols into online databases (e.g., caArray). A Protocol entry includes:

  1. Protocol Name: The "Protocol Name" is used as an ID for a protocol and is referenced in the SDRF file and online databases. Although protocol names are self-assigned, please follow the prescribed naming scheme below. The names you use should persist unless the protocol changes. The format of the name should be:

    Domain:ProtocolType:Platform:Version
    
    Example:
    broad.mit.edu:hybridization:HG-U133_Plus_2:01
    

    where:

    Domain: matches the center's internet domain name
    ProtocolType: originates from the MGED Ontology subclasses of ProtocolType, for example, Experimental, DataTransformation, HigherLevelAnalysis
    Platform: matches the Array Design platform
    Version: allows for changes or optimizations of protocol parameters. If a protocol is modified, its version is incremented.

  2. Protocol Type: comes from the MGED Ontology or another controlled vocabulary
  3. Protocol Term Source REF: should be the MGED Ontology or another controlled vocabulary source
  4. Protocol Description: should include a brief description of the protocol, or its URL
  5. Protocol Parameters: required for informatics-based protocols; multiple parameters should be separated by semicolons. Parameters should reflect the "Parameter Value [*]" entries in the SDRF. See the MAGE-TAB specification for more details.
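
The naming scheme and version rule above can be sketched in Python as follows. The strict single-colon pattern, the two-digit zero padding of the version (taken from the "01" in the example), the default Term Source REF value "MGED Ontology", and all function names are assumptions for illustration only.

```python
import re

# Strict form of Domain:ProtocolType:Platform:Version (one colon between parts).
NAME_RE = re.compile(r"([A-Za-z0-9._-]+):([A-Za-z0-9_-]+):([A-Za-z0-9_-]+):([0-9]+)")


def make_protocol_name(domain: str, protocol_type: str, platform: str, version: int) -> str:
    # Two-digit zero padding mirrors the "01" in the example; the width is assumed.
    return f"{domain}:{protocol_type}:{platform}:{version:02d}"


def bump_version(name: str) -> str:
    """Return the name with its Version incremented, as when a protocol is modified."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Domain:ProtocolType:Platform:Version name: {name!r}")
    domain, protocol_type, platform, version = m.groups()
    return make_protocol_name(domain, protocol_type, platform, int(version) + 1)


def protocol_rows(names, types, descriptions, term_source="MGED Ontology"):
    """Render tab-delimited Protocol rows, one column per protocol."""
    if not (len(names) == len(types) == len(descriptions)):
        raise ValueError("each protocol needs a name, a type, and a description")
    rows = [
        ["Protocol Name", *names],
        ["Protocol Type", *types],
        ["Protocol Term Source REF", *([term_source] * len(names))],
        ["Protocol Description", *descriptions],
    ]
    return "\n".join("\t".join(row) for row in rows)


name = make_protocol_name("broad.mit.edu", "hybridization", "HG-U133_Plus_2", 1)
print(name)                # broad.mit.edu:hybridization:HG-U133_Plus_2:01
print(bump_version(name))  # broad.mit.edu:hybridization:HG-U133_Plus_2:02
```

Because the name persists until the protocol changes, incrementing only the Version field keeps the Domain, ProtocolType, and Platform parts stable across submissions.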

Guidelines on Creating IDF Files

  1. IDF files use row headers.
  2. If data is not available, the field should be left blank.
  3. The sources of controlled vocabulary terms (e.g., MGED or NCI ontology) and external database references (for example, caArray Array Design names) are given using "Term Source REF". The "Term Source REF" row should be given after the term's row or column. "Term Source REF" is used as a suffix to a row header: for example, "Person Roles" is the row header and "Person Roles Term Source REF" is the term source row header, and the former precedes the latter.
  4. All values of attributes in an IDF document should remain constant throughout a TCGA experiment, with the exception of the following fields:
    • All attributes relating to "Person" can change depending on roles
    • "Date of Experiment" should change
    • "Public Release Date" should change. It should reflect the approximate date that the archive containing the IDF file will be transferred.
    • "Protocols" may change
    • "SDRF File" should change
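
The ordering rule in guideline 3 can be checked mechanically. In this sketch, which is an assumption about how the rule might be enforced rather than the DCC's implementation, a "... Term Source REF" row is accepted if an earlier row either equals its prefix or starts with it, so "Protocol Term Source REF" may follow "Protocol Type". The function name is hypothetical.

```python
SUFFIX = " Term Source REF"


def misplaced_term_source_refs(headers: list[str]) -> list[str]:
    """Return '<term> Term Source REF' rows not preceded by a row for <term>.

    An earlier row counts if it equals <term> or starts with "<term> "
    (e.g. 'Protocol Type' before 'Protocol Term Source REF'); this matching
    rule is an assumption.
    """
    misplaced = []
    for i, header in enumerate(headers):
        if not header.endswith(SUFFIX):
            continue
        term = header[: -len(SUFFIX)]
        earlier = [h for h in headers[:i] if not h.endswith(SUFFIX)]
        if not any(h == term or h.startswith(term + " ") for h in earlier):
            misplaced.append(header)
    return misplaced


good = ["Person Roles", "Person Roles Term Source REF", "Protocol Name", "Protocol Term Source REF"]
print(misplaced_term_source_refs(good))  # []
print(misplaced_term_source_refs(["Person Roles Term Source REF", "Person Roles"]))
# ['Person Roles Term Source REF']
```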

IDF File Validation

Purpose

The validator checks the IDF file in a MAGE-TAB archive to ensure that required elements and values are present.

Runs On

MAGE-TAB archive IDF files with extension ".idf.txt".

File Format

IDF files are tab-delimited with row headers. The first token on each line of the file represents the row header, and the following tab-separated tokens are the values.

Actions

  • If the IDF is blank then FAIL
  • If any row headers are not in the Allowed Headers list, then FAIL
  • If the 'Protocol Name' header is missing then FAIL
  • If any 'Protocol Name' value does not have the valid format <domain>:<protocolType>:<platform>:<version>, as checked with the regular expression
    ([a-zA-Z0-9\-_.]+)[:]+([a-zA-Z0-9\-_]+)[:]+([a-zA-Z0-9\-_]+)[:]+([0-9]+)
    then FAIL
    • If the <domain> value in the 'Protocol Name' does not match the archive's center domain, then FAIL
    • If the <platform> value in the 'Protocol Name' does not match the archive's platform, then FAIL
  • If the 'Protocol Description' header is missing then FAIL
  • If the first value for any header is "->" (indicating a blank) then FAIL
  • If the 'Term Source Name' header is present, then the number of values for 'Term Source Name', 'Term Source File', and 'Term Source Version' must be the same, otherwise FAIL
  • If any SDRF column header contains a 'Term Source REF' value that is not represented under the IDF "Term Source Name" header then FAIL
  • If there are no FAIL conditions, then PASS
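
The actions above can be combined into a single function. This is a sketch under stated assumptions, not the DCC validator: the IDF is assumed to be parsed already into a header-to-values mapping, the regular expression is applied as a full match to each 'Protocol Name' value, the center domain, platform, allowed headers, and SDRF 'Term Source REF' values are supplied by the caller, and the name `validate_idf` is hypothetical. An empty result corresponds to PASS.

```python
import re

# Regular expression from the 'Protocol Name' action; anchoring it with a
# full match is an assumption made here.
PROTOCOL_NAME_RE = re.compile(
    r"([a-zA-Z0-9\-_.]+)[:]+([a-zA-Z0-9\-_]+)[:]+([a-zA-Z0-9\-_]+)[:]+([0-9]+)"
)
TERM_SOURCE_ROWS = ("Term Source Name", "Term Source File", "Term Source Version")


def validate_idf(rows, allowed_headers, center_domain, platform, sdrf_term_source_refs=()):
    """Apply the IDF validation actions; an empty list of failures means PASS.

    rows: mapping of IDF row header -> list of values (already parsed).
    sdrf_term_source_refs: 'Term Source REF' values collected from the SDRF.
    """
    if not rows:
        return ["IDF is blank"]
    failures = [f"header not allowed: {h}" for h in rows if h not in allowed_headers]

    if "Protocol Name" not in rows:
        failures.append("missing 'Protocol Name' header")
    for name in rows.get("Protocol Name", []):
        m = PROTOCOL_NAME_RE.fullmatch(name)
        if m is None:
            failures.append(f"bad Protocol Name format: {name}")
            continue
        domain, _protocol_type, name_platform, _version = m.groups()
        if domain != center_domain:
            failures.append(f"domain mismatch in {name}")
        if name_platform != platform:
            failures.append(f"platform mismatch in {name}")

    if "Protocol Description" not in rows:
        failures.append("missing 'Protocol Description' header")

    # "->" as the first value marks a blank and is not accepted.
    failures += [f"first value of {h} is '->'" for h, v in rows.items() if v and v[0] == "->"]

    if "Term Source Name" in rows:
        counts = {len(rows.get(h, [])) for h in TERM_SOURCE_ROWS}
        if len(counts) != 1:
            failures.append("Term Source Name/File/Version value counts differ")

    declared = set(rows.get("Term Source Name", []))
    failures += [f"SDRF Term Source REF not declared in IDF: {r}"
                 for r in sdrf_term_source_refs if r not in declared]
    return failures


rows = {
    "Protocol Name": ["broad.mit.edu:hybridization:HG-U133_Plus_2:01"],
    "Protocol Description": ["Hybridization protocol"],
}
print(validate_idf(rows, set(rows), "broad.mit.edu", "HG-U133_Plus_2"))  # []
```

Collecting every failure instead of stopping at the first one reports all problems in a single pass, which is usually more convenient when fixing an archive.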

Allowed Headers

Investigation Title
Experimental Design
Experimental Design Term Source REF
Experimental Factor Name
Experimental Factor Type
Experimental Factor Type Term Source REF
Person Last Name
Person First Name
Person Mid Initials
Person Email
Person Phone
Person Fax
Person Address
Person Affiliation
Person Roles
Person Roles Term Source REF
Quality Control Types
Quality Control Types Term Source REF
Replicate Type
Replicate Type Term Source REF
Normalization Type
Normalization Term Source REF
Date of Experiment
Public Release Date
Comment[ArrayExpressSubmissionDate]
PubMed ID
Publication DOI
Publication Author List
Publication Title
Publication Status
Publication Status Term Source REF
Experiment Description
Protocol Name
Protocol Type
Protocol Description
Protocol Parameters
Protocol Hardware
Protocol Software
Protocol Contact
Protocol Term Source REF
SDRF Files
Term Source Name
Term Source File
Term Source Version
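
For use with a validator that checks row headers against this list, the allowed headers can be held in a set. The sketch below assumes exact, case-sensitive matching; the `difflib` suggestion step is an added convenience for spotting likely typos and is not part of the documented validation. The function names are hypothetical.

```python
import difflib

# The Allowed Headers list, one header per line.
ALLOWED_HEADERS = frozenset(
    line.strip()
    for line in """
Investigation Title
Experimental Design
Experimental Design Term Source REF
Experimental Factor Name
Experimental Factor Type
Experimental Factor Type Term Source REF
Person Last Name
Person First Name
Person Mid Initials
Person Email
Person Phone
Person Fax
Person Address
Person Affiliation
Person Roles
Person Roles Term Source REF
Quality Control Types
Quality Control Types Term Source REF
Replicate Type
Replicate Type Term Source REF
Normalization Type
Normalization Term Source REF
Date of Experiment
Public Release Date
Comment[ArrayExpressSubmissionDate]
PubMed ID
Publication DOI
Publication Author List
Publication Title
Publication Status
Publication Status Term Source REF
Experiment Description
Protocol Name
Protocol Type
Protocol Description
Protocol Parameters
Protocol Hardware
Protocol Software
Protocol Contact
Protocol Term Source REF
SDRF Files
Term Source Name
Term Source File
Term Source Version
""".splitlines()
    if line.strip()
)


def disallowed_headers(headers):
    """Return headers (in input order) that are not in the Allowed Headers list."""
    return [h for h in headers if h not in ALLOWED_HEADERS]


def suggest(header, n=3):
    """Suggest allowed headers that closely resemble a disallowed one."""
    return difflib.get_close_matches(header, sorted(ALLOWED_HEADERS), n=n)


for bad in disallowed_headers(["Investigation Title", "Person Middle Initials"]):
    print(bad, "->", suggest(bad))
```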
