NIH | National Cancer Institute | NCI Wiki  

...

Your experiment dataset consists of an IDF metadata file and its corresponding SDRF metadata file, which, in turn, is associated with one or more raw and derived array data files. (In this tutorial, the array files we will use are in the Agilent TXT (raw) and TSV (derived) formats; the file formats for your data may differ.) Depending on the size of your array, the combined size of these files may exceed several gigabytes, even after they are compressed into the ZIP archive format, which is required for uploading to caArray. Since the maximum size of a ZIP file that can be uploaded is 2 GB, any dataset that exceeds this size limit must be broken down into smaller chunks, each of which contains a subset of the original data.

...

  1. Divide the array data files into smaller batches, each of which is no larger than 2 GB in combined size.
  2. Split the original SDRF file into multiple SDRF files, each of which corresponds to a single batch and references only the array data files from that batch.
  3. Create multiple IDF files derived from the original IDF, with each one referencing one of the SDRF files created in the previous step.
  4. Create multiple ZIP archives, each containing a single IDF and its associated SDRF and raw and derived array data files.
  5. Upload each ZIP archive individually, then validate and import the files from each.
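
The first step above, dividing the data files into batches, can be sketched in Python. This is only an illustrative outline, not part of caArray: the function names `file_sizes` and `make_batches` are hypothetical, and reading "2 GB" as 2 * 1024**3 bytes is an assumption. Each key in `sizes` may stand for a single file or for a group of files (for example, a raw file and its derived file) that must stay in the same batch.

```python
from pathlib import Path

# Assumption: "2 GB" is read here as 2 * 1024**3 bytes; lower this value if in doubt.
MAX_BATCH_BYTES = 2 * 1024**3


def file_sizes(directory):
    """Map each regular file name in `directory` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}


def make_batches(sizes, limit=MAX_BATCH_BYTES):
    """Group names into batches whose combined size never exceeds `limit`.

    Uses a greedy first-fit pass over the items sorted from largest to smallest.
    """
    batches = []  # each entry is [combined_size, [names]]
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if size > limit:
            raise ValueError(f"{name} alone exceeds the batch limit")
        for batch in batches:
            if batch[0] + size <= limit:
                # The item fits into an existing batch.
                batch[0] += size
                batch[1].append(name)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size, [name]])
    return [names for _, names in batches]
```

For example, `make_batches(file_sizes("my_experiment"))` would return one list of file names per batch, where `my_experiment` is a placeholder directory name. The sketch measures uncompressed sizes, which matches step 1 but may produce more batches than strictly necessary once the files are compressed.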

...

In preparing your data for upload, the first step is to find all the files associated with a given IDF file. To do so, open any of the IDF files from your experiment in Microsoft Excel or another application suited for viewing tab-delimited data. The partial screenshot below shows the first of twelve IDF files from our example experiment as viewed in Excel.


[Screenshot: the first IDF file from the example experiment, viewed in Excel]

The field 'SDRF files' towards the bottom of your IDF file displays the name of the SDRF file that is associated with the IDF.
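
If you prefer not to open each IDF by hand, the same lookup can be sketched in Python. This is a minimal example under stated assumptions: the function name `sdrf_names_from_idf` is hypothetical, the IDF is assumed to be UTF-8 text with one tab-delimited field per row, and the field label is matched loosely (any label beginning with "SDRF file", ignoring case) because the exact capitalization in your files may differ.

```python
import csv


def sdrf_names_from_idf(idf_path):
    """Return the SDRF file name(s) listed in a tab-delimited IDF file.

    Assumption: the first cell of each row is the field label and the
    remaining cells are its values.
    """
    with open(idf_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Match "SDRF File", "SDRF Files", "SDRF files", and similar labels.
            if row and row[0].strip().lower().startswith("sdrf file"):
                # Drop empty trailing cells left by spreadsheet exports.
                return [cell.strip() for cell in row[1:] if cell.strip()]
    return []
```

Calling `sdrf_names_from_idf("example.idf")`, where `example.idf` is a placeholder name, returns a list of the SDRF file names found, or an empty list if the field is absent.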

...