NIH | National Cancer Institute | NCI Wiki  

Question: Which files can be parsed into caArray? What is the benefit of file parsing?

Topic: caArray Usage

Release: caArray 2.2.0 and above

Date entered: 03/30/2009


Files Recognized by caArray

caArray can upload array design files and experiment data from many array providers, even when no parser is available for them yet. Such files are imported into caArray without being validated or parsed. Even if a file is not parsed, the user can still download it (through the user interface as well as through the programmatic API) and associate it with samples, extracts, and hybridizations. This feature allows data to be shared and helps the system identify which new parsers need to be developed in the future. For more information on how files are processed by caArray, review caArray 017 - Meaning of the caArray Status of Importing - Imported versus Imported Not Parsed.

The following table, File types from caArray, (from Chapter 7 in the caArray User Guide) summarizes the file types that caArray currently supports with full validation and parsing as well as those that can be imported without validation and parsing. The user's guide also summarizes the array design file types.

Files That Can Be Imported into caArray

Raw/processed data files (provide numerical values of array data)

Imported after validation and parsing:

  • Affymetrix CEL, CHP, CNCHP
  • GenePix GPR*
  • Illumina CSV, Sample Probe Profile TXT, Genotyping processed data matrix TXT, Raw TXT
  • Agilent Raw TXT
  • Nimblegen Normalized Pair Report TXT

    *For more information about GenePix GPR and Illumina CSV files, see MAGE-TAB SDRF Validation Rules, items 1 and 2.

Imported without validation and parsing:

  • Affymetrix DAT, RPT, TXT, and EXP
  • Agilent TSV, derived TXT
  • Illumina IDAT, TXT
  • ImaGene TIF, TXT
  • Nimblegen GFF, Raw or Derived TXT
  • ScanArray CSV

    Note: caArray may have new parsers available for data files in the system that are already imported but not parsed. To learn about retrofitting those files, see Retrofitting Data Files.

Array design files (provide the design of an array)

For information about array design file types, see the table in Managing Array Designs.

MAGE-TAB files (used to annotate experiments automatically)

Imported after validation and parsing:

  • MAGE-TAB SDRF (Sample and Data Relationship Format)
  • MAGE-TAB IDF (Investigation Description Format) only, no referenced SDRFs
  • MAGE-TAB Copy Number Data Matrix

    Note: Only one IDF is allowed per import, since the import is in the context of a single experiment.

Imported without validation and parsing:

  • MAGE-TAB Data Matrix (not copy number)

Supplemental files

These cannot be validated or imported. Files of unknown file type or simply reference files fall under this category. For more information, see Supplemental Files.



Image files cannot be validated or imported successfully into caArray 2.4. For more information, see Image File Importing Issues in caArray as well as Appendix B - Importing Data Files.
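As a rough illustration of the two import categories for raw/processed data files, the rows listed above can be encoded as a simple lookup. This is only a sketch: the names PARSED, NOT_PARSED, and import_category are hypothetical and are not part of caArray or its API, and only a subset of the listed file types is included.

```python
# Hypothetical lookup built from a subset of the raw/processed data file
# types listed above. None of these names belong to the caArray API.
PARSED = "imported after validation and parsing"
NOT_PARSED = "imported without validation and parsing"

_PARSED_TYPES = {
    "Affymetrix": ["CEL", "CHP", "CNCHP"],
    "GenePix": ["GPR"],
}
_NOT_PARSED_TYPES = {
    "Affymetrix": ["DAT", "RPT", "TXT", "EXP"],
    "Agilent": ["TSV"],
    "Illumina": ["IDAT"],
    "ScanArray": ["CSV"],
}


def build_lookup():
    """Flatten the two provider tables into {(provider, file_type): category}."""
    lookup = {}
    for category, table in ((PARSED, _PARSED_TYPES), (NOT_PARSED, _NOT_PARSED_TYPES)):
        for provider, file_types in table.items():
            for file_type in file_types:
                lookup[(provider.lower(), file_type.upper())] = category
    return lookup


IMPORT_CATEGORY = build_lookup()


def import_category(provider, file_type):
    """Return the import category for a pair, or None if it is not in the lookup."""
    return IMPORT_CATEGORY.get((provider.lower(), file_type.upper()))


if __name__ == "__main__":
    for pair in [("Affymetrix", "CEL"), ("Affymetrix", "DAT"), ("ScanArray", "CSV")]:
        print(pair[0], pair[1], "->", import_category(*pair))
```

A pair that is missing from the lookup returns None, which here only means the sketch does not cover it; it says nothing about how caArray itself treats that file type.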

Benefit of File Parsing

For data that are parsed into caArray, an analytical service (such as geWorkbench) can pull the data out using the programmatic API and then analyze or plot them. Another example is webGenome, a caArray client, which pulls parsed data from caArray experiments and plots log ratio values against chromosome location. With parsed data, a client can request only the quantitation types (columns) of interest instead of retrieving the entire contents of the data file.
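The column-selection benefit can be sketched in a few lines. The example below does not use the caArray API; it is a self-contained illustration in which a small made-up tab-delimited file is parsed once into named columns, after which a client can ask for just the columns it needs. The column names (ProbeID, Signal, Detection, PValue), the values, and the functions parse_columns and select_columns are all assumptions made for this example.

```python
import csv
import io

# Made-up tab-delimited content standing in for a data file.
# Column names and values are illustrative only.
RAW_FILE = (
    "ProbeID\tSignal\tDetection\tPValue\n"
    "probe_1\t120.5\tP\t0.001\n"
    "probe_2\t15.2\tA\t0.4\n"
    "probe_3\t980.0\tP\t0.0002\n"
)


def parse_columns(text):
    """Parse the file once into a dict mapping column name -> list of values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def select_columns(parsed, wanted):
    """Return only the requested columns, in the order requested."""
    missing = [name for name in wanted if name not in parsed]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    return {name: parsed[name] for name in wanted}


# Parsing happens once, up front; later requests only pick columns.
PARSED_STORE = parse_columns(RAW_FILE)

if __name__ == "__main__":
    print(select_columns(PARSED_STORE, ["ProbeID", "Signal"]))
```

For a file that is imported but not parsed, no such column-level view exists on the server side, so a client would have to download the whole file and do the equivalent of parse_columns itself.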

Have a comment?

Please leave your comment in the caArray End User Forum.