NIH | National Cancer Institute | NCI Wiki  


...

This article presents a simple workaround which allows you to break down your data into smaller, more manageable chunks that can be individually uploaded without violating the 2 GB limit.

Overview

Your experiment dataset consists of an IDF metadata file and its corresponding SDRF metadata file, which, in turn, is associated with one or more raw and derived array data files. (In this tutorial, the array files we use are in the Agilent TXT (raw) and TSV (derived) formats; the file formats for your data may differ.) Depending on the size of your array, the combined size of these files may exceed several gigabytes, even after they are compressed into the ZIP archive format that is required for uploading to caArray. Since the maximum size of a ZIP file that can be uploaded is 2 GB, any dataset that exceeds this limit must be broken down into smaller chunks, each of which contains a subset of the original data.

The general procedure for breaking down the dataset is as follows:

Each chunk consists of an individual IDF file and all its associated SDRF, TXT, and TSV files. Each chunk can then be packaged into a separate ZIP archive and uploaded, validated, and imported individually.
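The packaging step can be sketched in Python as follows. This is only an illustration, not part of caArray: the function name package_chunk is hypothetical, the 2 GB limit is assumed to mean 2 * 1024**3 bytes, and the files are assumed to belong at the top level of the archive. Adjust these assumptions to match your own setup.

```python
import zipfile
from pathlib import Path

# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
MAX_ZIP_BYTES = 2 * 1024**3


def package_chunk(zip_path, files, max_bytes=MAX_ZIP_BYTES):
    """Write one chunk's files into a ZIP archive.

    Returns (archive size in bytes, True if the size is within max_bytes).
    """
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for f in files:
            f = Path(f)
            # Assumption: files are stored under their bare names (flat archive).
            archive.write(f, arcname=f.name)
    size = zip_path.stat().st_size
    return size, size <= max_bytes
```

Calling package_chunk("chunk1.zip", [idf, sdrf, txt, tsv]) with the paths of one chunk's files returns the archive size and a flag indicating whether the archive fits the assumed limit. If the flag is False, the chunk needs to be split further before uploading.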

Prerequisites

This tutorial assumes basic familiarity with uploading data into caArray. Specifically, it assumes that you have already created an experiment for your data, uploaded the corresponding array design, and associated the experiment with that design. If you are new to uploading data into caArray, please refer to the official caArray User's Guide on the NCI wiki at https://wiki.nci.nih.gov/x/LBo9Ag.

...

In preparing your data for upload, the first step is to find all the files associated with a given IDF file. To do so, open any of the IDF files from your experiment in Microsoft Excel or another application suited for viewing tab-delimited data. The partial screenshot below shows the first of twelve IDF files from our example experiment as viewed in Excel.


[Screenshot: the first IDF file of the example experiment, as viewed in Excel]

The field 'SDRF files' towards the bottom of your IDF file displays the name of the SDRF file that is associated with the IDF.
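If you prefer not to open each IDF file by hand, the same lookup can be scripted. The sketch below is a hypothetical helper, not a caArray tool. It assumes the IDF is a UTF-8, tab-delimited text file in which each row begins with a field label followed by its values, and that the label reads 'SDRF files' as described above. If your files use a different label, pass it through the field parameter.

```python
import csv


def find_sdrf_names(idf_path, field="SDRF files"):
    """Return the values in the IDF row whose label matches `field`."""
    wanted = field.strip().lower()
    # Assumption: the IDF is UTF-8 text with tab-separated cells.
    with open(idf_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row and row[0].strip().lower() == wanted:
                # The remaining non-empty cells in the row are the file names.
                return [cell.strip() for cell in row[1:] if cell.strip()]
    # No matching row was found.
    return []
```

The function returns a list because a row may hold more than one value. An empty list means that no row with the given label was found, in which case it is worth checking the label against the IDF file itself.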

...