A manifest file is a text file that lists all the files in an archive along with their MD5 hash values. This information is presented in two tab-delimited columns, with MD5 checksum values in the first column and their corresponding filenames in the second column.
A manifest file is created by a data submission center for each archive submitted to the DCC and is packaged within the archive itself.
Checking data integrity using the manifest file
After downloading a TCGA archive, you can verify that its contents were not corrupted during file transfer. To do this, unpack the .tar.gz archive and use the application md5sum or md5 (depending on your operating system). Unix/Linux and Mac OS have this application built-in; Windows users must install md5sum prior to running the following commands. The application must be run on the expanded archive, in the directory where MANIFEST.txt exists; the following example calls the expanded archive 'archive_directory'.
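The check can be sketched as follows. This is a minimal sketch that assumes GNU md5sum is on your PATH; the function name `check_manifest` is hypothetical and used only for illustration.

```shell
# check_manifest DIR (hypothetical helper): verify the files listed in
# DIR/MANIFEST.txt. Assumes GNU md5sum. The check runs from inside DIR so
# that the relative filenames in the manifest resolve correctly.
check_manifest() {
    ( cd "$1" && md5sum -c MANIFEST.txt )
}

# Usage: check_manifest archive_directory
```

The function returns a non-zero exit status if any listed file fails the check, which makes it usable in scripts.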
Running 'md5sum -c' recomputes the MD5 hash value of each file listed in the manifest file and compares it with the value recorded there. If the archive is intact, every file is listed in the output with 'OK'.
As the md5 application for Mac OS does not come with a checker, you will have to do the MD5 hash value check yourself. Start by creating a manifest file for your downloaded archive files and then compare it to MANIFEST.txt using the application diff.
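One way to do this is sketched below. It assumes MANIFEST.txt is tab-delimited with the hash in the first column, that `md5 -q` (Mac OS/BSD) or md5sum is available, and that mktemp and diff are installed. The helper names `md5_of` and `diff_manifest` are hypothetical.

```shell
# md5_of FILE (hypothetical helper): print only the MD5 digest of FILE.
md5_of() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"                      # Mac OS/BSD md5: -q prints the digest alone
    else
        md5sum < "$1" | cut -d' ' -f1    # fallback where only md5sum exists
    fi
}

# diff_manifest DIR (hypothetical helper): rebuild the manifest from the files
# on disk, in the same order and tab-delimited layout as DIR/MANIFEST.txt,
# then compare the two with diff.
diff_manifest() {
    (
        cd "$1" || exit 1
        tab=$(printf '\t')
        check=$(mktemp) || exit 1
        while IFS="$tab" read -r _listed name; do
            [ -n "$name" ] || continue   # skip blank or malformed lines
            printf '%s\t%s\n' "$(md5_of "$name")" "$name"
        done < MANIFEST.txt > "$check"
        diff MANIFEST.txt "$check"       # no output, exit status 0: files are intact
        status=$?
        rm -f "$check"
        exit "$status"
    )
}

# Usage: diff_manifest archive_directory
```

Rebuilding the second manifest in the order of the first keeps the diff output limited to lines whose hash values actually changed.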
If data integrity has been compromised, hash values listed in the two manifest files will differ and the files that are faulty will be listed in the output.
Creating the manifest file
This section is relevant to data submission centers only.
A manifest file is created within the same directory that contains data files before they are packaged into a TCGA archive. This directory is identified as 'archive_directory' for the instructions below.
There are two components to creating a manifest file:
- The manifest file (MANIFEST.txt) must be inside the same directory ('archive_directory') as the data files
- The manifest file is created using the application md5sum or md5 (depending on your operating system). Unix/Linux and Mac OS have this application built-in; Windows users must install md5sum prior to running the following commands.
To execute these components, run the following commands in your terminal shell/command prompt:
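A minimal sketch of these steps follows. It assumes md5sum is installed and writes the two tab-delimited columns described at the top of this page; the function name `make_manifest` is hypothetical.

```shell
# make_manifest DIR (hypothetical helper): write DIR/MANIFEST.txt with one
# "<md5 hash><TAB><filename>" line per regular file in DIR.
# Assumes md5sum is installed; on Mac OS, `md5 -q FILE` prints the same digest.
make_manifest() {
    (
        cd "$1" || exit 1
        for f in *; do
            # skip subdirectories and the manifest itself
            if [ -f "$f" ] && [ "$f" != MANIFEST.txt ]; then
                printf '%s\t%s\n' "$(md5sum < "$f" | cut -d' ' -f1)" "$f"
            fi
        done > MANIFEST.txt
    )
}

# Usage: make_manifest archive_directory
```

Excluding MANIFEST.txt from the loop means the function can be re-run on the same directory without the manifest listing itself.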
Here is an example of a MANIFEST.txt file
After submitting your archive to the DCC, the archive description file README_DCC.txt will be added to your archive and an entry for it will be appended to your manifest file. So the example above would become something like this
If you are preparing a revision archive, retrieve MANIFEST.txt from the latest revision of the same archive (available through the DCC) and place it in your directory of revision data files.
If the intent of your revision archive is to remove data files from the latest available archive, simply remove the lines that list the corresponding filename(s) and MD5 hash value(s) from the manifest file.
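Removing an entry can be done in any text editor. A scripted version might look like the sketch below, which assumes a tab-delimited manifest and that awk and mktemp are available; `drop_from_manifest` is a hypothetical name.

```shell
# drop_from_manifest MANIFEST FILENAME (hypothetical helper): delete the line
# whose second tab-delimited column is exactly FILENAME.
drop_from_manifest() {
    tmp_manifest=$(mktemp) || return 1
    # keep every line except the one that lists FILENAME
    awk -F '\t' -v name="$2" '$2 != name' "$1" > "$tmp_manifest" &&
        cat "$tmp_manifest" > "$1"
    status=$?
    rm -f "$tmp_manifest"
    return "$status"
}

# Usage: drop_from_manifest archive_directory/MANIFEST.txt removed_file.txt
```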
If a data file has been updated from its latest available version, recalculate and replace the MD5 hash value for the file in the manifest file. For example:
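A minimal sketch of such an update, assuming a tab-delimited manifest and that md5sum, awk and mktemp are available; `update_manifest_hash` is a hypothetical name.

```shell
# update_manifest_hash DIR FILENAME (hypothetical helper): recompute the MD5 of
# DIR/FILENAME and rewrite its line in DIR/MANIFEST.txt. Other lines are kept.
update_manifest_hash() {
    [ -f "$1/$2" ] || return 1
    new_hash=$(md5sum < "$1/$2" | cut -d' ' -f1)
    tmp_manifest=$(mktemp) || return 1
    # replace column 1 only on the line whose column 2 matches FILENAME
    awk -F '\t' -v OFS='\t' -v name="$2" -v hash="$new_hash" \
        '$2 == name { $1 = hash } { print }' "$1/MANIFEST.txt" > "$tmp_manifest" &&
        cat "$tmp_manifest" > "$1/MANIFEST.txt"
    status=$?
    rm -f "$tmp_manifest"
    return "$status"
}

# Usage: update_manifest_hash archive_directory updated_file.txt
```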
For additions, calculate the MD5 hash values for the new data file(s) using the method described above and append the tab-delimited hash value(s) and filename(s) to the manifest file, in that column order.
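Appending an entry for a new file can be sketched as follows, assuming md5sum is installed and that the manifest lists the hash value first and the filename second; `add_to_manifest` is a hypothetical name.

```shell
# add_to_manifest DIR FILENAME (hypothetical helper): append a
# "<md5 hash><TAB><filename>" line for DIR/FILENAME to DIR/MANIFEST.txt.
add_to_manifest() {
    [ -f "$1/$2" ] || return 1    # refuse to list a file that is not there
    printf '%s\t%s\n' "$(md5sum < "$1/$2" | cut -d' ' -f1)" "$2" >> "$1/MANIFEST.txt"
}

# Usage: add_to_manifest archive_directory new_file.txt
```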
You can submit data file updates, additions and removals all within one revision archive.