NIH | National Cancer Institute | NCI Wiki

A manifest file (MANIFEST.txt) is an archive description text file that lists each file within the archive and its checksum value, so that data integrity can be verified.

Introduction

A manifest file is a text file that lists all the files in an archive along with their MD5 hash values. This information is presented in two columns separated by whitespace, with the MD5 checksum value in the first column and the corresponding filename in the second column.
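The Python sketch below makes the two-column layout concrete by reading manifest text into a filename-to-hash dictionary. The function name `parse_manifest` is hypothetical and is not part of any TCGA or DCC tool. The sketch assumes one entry per line, with the hash first, and treats any run of spaces or tabs as the column separator.

```python
import re

# A hex MD5 digest is exactly 32 hexadecimal characters.
MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def parse_manifest(text: str) -> dict:
    """Hypothetical helper: map each filename in manifest text to its MD5 hash.

    Assumes one '<md5 hash><whitespace><filename>' entry per line and splits
    only at the first whitespace run, so the filename is kept whole.
    """
    entries = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. after a trailing newline
        parts = line.split(None, 1)
        if len(parts) != 2 or not MD5_HEX.fullmatch(parts[0]):
            raise ValueError(f"line {line_no} is not '<md5> <filename>': {raw!r}")
        digest, name = parts
        entries[name] = digest.lower()
    return entries


sample = (
    "b6b9b1d7f2f4c5e06bdcd9795a6f2ec0  DESCRIPTION.txt\n"
    "c4434118789cc0b98a5c3ab2723d6879  README_DCC.txt\n"
)
print(parse_manifest(sample)["README_DCC.txt"])  # c4434118789cc0b98a5c3ab2723d6879
```

Splitting only once is a deliberate choice: the hash never contains whitespace, so everything after the first whitespace run is treated as the filename.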

A data submission center creates a manifest file for each archive it submits to the DCC (Data Coordinating Center) and packages that file within the archive itself.

Checking data integrity using the manifest file

After downloading a TCGA archive, you can check whether the archive's integrity was preserved during file transfer. To do this, expand the .tar.gz archive and run the application md5sum or md5 (depending on your operating system). Unix/Linux and Mac OS have this application built in; Windows users must install md5sum before running the following commands. Run the application from inside the expanded archive, in the directory where MANIFEST.txt is located; the following example calls this directory 'archive_directory'.
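You can also script the expansion step. The sketch below uses Python's standard `tarfile` module. The helper name `expand_archive` is hypothetical. The sketch assumes the archive may wrap its files in a top-level folder, so it searches for MANIFEST.txt instead of assuming where it lands.

```python
import tarfile
from pathlib import Path


def expand_archive(archive_path: str, dest_dir: str) -> Path:
    """Hypothetical helper: expand a .tar.gz archive and return the
    directory that contains MANIFEST.txt (the 'archive_directory')."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: refuse absolute paths, '..', etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    manifests = list(dest.rglob("MANIFEST.txt"))
    if not manifests:
        raise FileNotFoundError(f"no MANIFEST.txt found under {dest}")
    # Prefer the shallowest MANIFEST.txt if more than one is present.
    return min(manifests, key=lambda p: len(p.parts)).parent
```

For example, `expand_archive("downloaded_archive.tar.gz", "expanded")` returns the directory in which to run the checks below. Both path names here are placeholders.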

Unix/Linux/Windows
> cd archive_directory
> md5sum -c MANIFEST.txt

Running 'md5sum -c' computes the MD5 hash value of each file listed in the manifest file and compares it with the value recorded there. If the archive's data integrity has been preserved, every file is listed in the output followed by 'OK'.
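Where md5sum is not available, Python's `hashlib` can perform roughly the same check. This is a minimal sketch, not a replacement for md5sum. The names `md5_of_file` and `check_manifest` are hypothetical. The sketch assumes each manifest line holds a hash, whitespace, then a filename relative to the directory containing MANIFEST.txt.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_manifest(archive_dir: str) -> dict:
    """Hypothetical helper, roughly like `md5sum -c MANIFEST.txt`.

    Returns {filename: "OK" | "FAILED" | "MISSING"} for every listed file.
    """
    base = Path(archive_dir)
    results = {}
    for line in (base / "MANIFEST.txt").read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.strip().split(None, 1)
        target = base / name
        if not target.is_file():
            results[name] = "MISSING"
        elif md5_of_file(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

For example, `[n for n, s in check_manifest("archive_directory").items() if s != "OK"]` lists the files that did not pass. md5sum itself words its report for unreadable files differently.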

Because the md5 application on Mac OS does not come with a checker like 'md5sum -c', you have to compare the MD5 hash values yourself: create a manifest file for your downloaded archive files, then compare it with MANIFEST.txt using the application diff.

Mac OS
> cd archive_directory
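> md5 -r * > MANIFEST_check.txt
[remove the line listing "MANIFEST.txt" from MANIFEST_check.txt]
> diff -b MANIFEST.txt MANIFEST_check.txt

md5 -r separates the hash value and filename with a single space, whereas md5sum (which produced the example manifests on this page) separates them with two. The -b option tells diff to ignore such differences in the amount of whitespace. If data integrity has been compromised, the hash values for the affected files will differ between the two manifest files, and diff will print those lines. If diff prints nothing, the two files match.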
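The same Mac OS approach can be sketched in Python. It hashes the files on disk, leaves out MANIFEST.txt, and compares the result with MANIFEST.txt. Parsing both sides into dictionaries avoids the whitespace and line-order differences that trip up a plain diff, and it separates three cases that diff would print together. The function names are hypothetical. As with the `*` in `md5 -r *`, only files directly inside the archive directory are considered.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def read_manifest(path: Path) -> dict:
    """Map filename -> lowercase hash; any whitespace may separate the columns."""
    entries = {}
    for line in path.read_text().splitlines():
        if line.strip():
            digest, name = line.strip().split(None, 1)
            entries[name] = digest.lower()
    return entries


def compare_with_manifest(archive_dir: str) -> dict:
    """Hypothetical helper: compare the files on disk with MANIFEST.txt."""
    base = Path(archive_dir)
    expected = read_manifest(base / "MANIFEST.txt")
    # In-memory counterpart of MANIFEST_check.txt: every regular file
    # directly in the directory except MANIFEST.txt itself.
    actual = {
        p.name: md5_of_file(p)
        for p in base.iterdir()
        if p.is_file() and p.name != "MANIFEST.txt"
    }
    return {
        # listed and present, but the hash differs
        "changed": sorted(n for n in expected.keys() & actual.keys() if expected[n] != actual[n]),
        # listed in MANIFEST.txt but not found on disk
        "missing": sorted(expected.keys() - actual.keys()),
        # present on disk but not listed in MANIFEST.txt
        "unlisted": sorted(actual.keys() - expected.keys()),
    }
```

An archive that arrived intact should give three empty lists.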

Creating the manifest file

This section is relevant to data submission centers only

The manifest file is created in the directory that contains the data files, before they are packaged into a TCGA archive. This directory is called 'archive_directory' in the instructions below.

Creating a manifest file involves two requirements:

  1. The manifest file (MANIFEST.txt) must be in the same directory ('archive_directory') as the data files.
  2. The manifest file is created with the application md5sum or md5 (depending on your operating system). Unix/Linux and Mac OS have this application built in; Windows users must install md5sum before running the following commands.

To do this, run the following commands in your terminal shell or command prompt:

Unix/Linux/Windows
> cd archive_directory
> md5sum * > MANIFEST.txt
Mac OS
> cd archive_directory
> md5 -r * > MANIFEST.txt
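Centers that script their packaging could use the following Python sketch. It writes MANIFEST.txt in the layout md5sum produces: hash, two spaces, filename. `write_manifest` is a hypothetical helper. It skips MANIFEST.txt itself and any subdirectories. It sorts entries by filename in code-point order, which places uppercase names such as DESCRIPTION.txt before lowercase ones. Shell glob order can depend on locale, so line order may not match `md5sum *` exactly.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(archive_dir: str) -> Path:
    """Hypothetical helper: (re)create MANIFEST.txt for the files in archive_dir."""
    base = Path(archive_dir)
    # Collect the data files first, so MANIFEST.txt never lists itself.
    files = sorted(
        (p for p in base.iterdir() if p.is_file() and p.name != "MANIFEST.txt"),
        key=lambda p: p.name,
    )
    manifest = base / "MANIFEST.txt"
    # newline="\n" keeps Unix line endings even when run on Windows.
    with open(manifest, "w", newline="\n") as f:
        for p in files:
            f.write(f"{md5_of_file(p)}  {p.name}\n")
    return manifest
```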

Here is an example of a MANIFEST.txt file:

Level 1 MANIFEST.txt:
b6b9b1d7f2f4c5e06bdcd9795a6f2ec0  DESCRIPTION.txt
2357201915bdfa6f23f69877ff811f43  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04P-01A-31D-A032-05.txt
c860c2db7101571fc14abe963a20f3d5  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04Q-01A-21D-A032-05.txt
267d435cc885cc8272bb322b1bfa8389  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04T-01A-21D-A032-05.txt
539e96349a64787907cbbcb40dec7408  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04V-01A-21D-A032-05.txt

After you submit your archive to the DCC, the archive description file README_DCC.txt is added to your archive and listed in your manifest file. The example above would then look something like this:

Level 1 MANIFEST.txt:
b6b9b1d7f2f4c5e06bdcd9795a6f2ec0  DESCRIPTION.txt
c4434118789cc0b98a5c3ab2723d6879  README_DCC.txt
2357201915bdfa6f23f69877ff811f43  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04P-01A-31D-A032-05.txt
c860c2db7101571fc14abe963a20f3d5  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04Q-01A-21D-A032-05.txt
267d435cc885cc8272bb322b1bfa8389  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04T-01A-21D-A032-05.txt
539e96349a64787907cbbcb40dec7408  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04V-01A-21D-A032-05.txt

Archive revisions

If you are preparing a revision archive, retrieve the MANIFEST.txt file from the latest revision of the same archive available through the DCC, and place it in the directory that contains your revision data files.

If your revision archive removes data files from the latest available archive, delete the lines that list those filenames and their MD5 hash values from the manifest file.
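Here is a minimal Python sketch of this edit, assuming the manifest layout shown in the examples above. `remove_entries` is a hypothetical helper. It rewrites MANIFEST.txt without the lines for the given filenames and reports which names it actually found, which makes a misspelled filename easy to spot.

```python
from pathlib import Path


def remove_entries(manifest_path: str, removed: set) -> list:
    """Hypothetical helper: drop manifest lines whose filename is in `removed`.

    Returns the filenames that were found and dropped; all other lines are
    written back unchanged and in their original order.
    """
    path = Path(manifest_path)
    kept, dropped = [], []
    for line in path.read_text().splitlines(keepends=True):
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[1] in removed:
            dropped.append(parts[1])
        else:
            kept.append(line)
    # newline="" writes the kept lines back exactly as read.
    with open(path, "w", newline="") as f:
        f.writelines(kept)
    return dropped
```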

If a data file has been updated since its latest available version, recalculate its MD5 hash value and replace the old value in the manifest file. For example:

Unix/Linux/Windows
> md5sum jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04Q-01A-21D-A032-05.txt

Output:
2357201915bdfa6f23f69877ff811f43  jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04Q-01A-21D-A032-05.txt

Mac OS
> md5 -q jhu-usc.edu_BRCA.HumanMethylation27.2.lvl-1.TCGA-A2-A04Q-01A-21D-A032-05.txt

Output:
2357201915bdfa6f23f69877ff811f43

For additions, calculate the MD5 hash value of each new data file as described above, and append a line for it to the manifest file. Each line gives the hash value followed by the filename, in the same format as the existing entries.
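Updates and additions both come down to recording a freshly computed hash for a file. The Python sketch below handles both in one pass. `refresh_entries` is a hypothetical helper. It rewrites the line for any file that is already listed and appends lines for files that are not, using md5sum's two-space layout. Every other line is left untouched.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh_entries(archive_dir: str, filenames: list) -> None:
    """Hypothetical helper: record new MD5 values for updated or added files.

    Listed files get their line rewritten in place (updates); unlisted files
    are appended at the end (additions), in the order given.
    """
    base = Path(archive_dir)
    manifest = base / "MANIFEST.txt"
    lines = manifest.read_text().splitlines()
    pending = {name: md5_of_file(base / name) for name in filenames}
    for i, line in enumerate(lines):
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[1] in pending:
            name = parts[1]
            lines[i] = f"{pending.pop(name)}  {name}"
    # Whatever is still pending was not in the manifest yet: append it.
    lines.extend(f"{digest}  {name}" for name, digest in pending.items())
    with open(manifest, "w", newline="\n") as f:
        f.write("".join(line + "\n" for line in lines))
```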

You can submit data file updates, additions and removals all within one revision archive.

