Message-Digest algorithm 5 (MD5) is a hash function that computes a 128-bit hash value, written as 32 hexadecimal digits, from the contents of a file. MD5 files are useful for ensuring data integrity. Archives and files transferred within the TCGA Network are sent with corresponding MD5 files. If no data corruption was introduced during transfer, the hash stored in the MD5 file matches the hash calculated from the received file.
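The example below assumes a bash shell with GNU md5sum. On Mac OS, md5 -q prints the bare hash instead. It hashes the three-byte string "abc", which yields the 32-digit value published in the MD5 specification (RFC 1321). It then changes a single byte to show that the result is a different hash:

```bash
#!/usr/bin/env bash
# Hash a short string with GNU md5sum and inspect the result.
# md5sum prints "<hash>  -" for standard input; cut keeps only the hash.
digest=$(printf 'abc' | md5sum | cut -d' ' -f1)
echo "$digest"       # prints 900150983cd24fb0d6963f7d28e17f72
echo "${#digest}"    # prints 32 (hexadecimal digits)

# Changing a single byte of input produces a completely different hash,
# which is why a mismatch after transfer reveals corruption.
corrupted=$(printf 'abd' | md5sum | cut -d' ' -f1)
if [ "$digest" != "$corrupted" ]; then
  echo "one-byte change detected"
fi
```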

Identifying MD5 Hash Values

As a best practice, users should verify that each file downloaded from TCGA was not corrupted during transfer. This is especially important for very large archives. MD5 hash files make it easy to confirm the integrity of archived files. Every TCGA archive is accompanied by a corresponding MD5 hash file, and the two file names differ only in their extensions, as in the following example.

Archive:  broad.mit.edu_GBM.HT_HG-U133A.1.0.0.tar.gz
MD5 file: broad.mit.edu_GBM.HT_HG-U133A.1.0.0.tar.gz.md5
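The two names differ only by the .md5 suffix, so a short script can flag downloads whose MD5 file is missing. The sketch below assumes bash and archives ending in .tar.gz. The name missing_md5_files is a hypothetical helper, not a TCGA tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list archives in a directory that lack the matching
# MD5 file, which by convention is the archive name plus ".md5".
missing_md5_files() {
  local dir=$1 archive
  for archive in "$dir"/*.tar.gz; do
    [ -e "$archive" ] || continue        # no archives: the glob stays literal
    [ -e "$archive.md5" ] || printf '%s\n' "${archive##*/}"
  done
}

# Usage: missing_md5_files ~/tcga_downloads
```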

MD5 hash values are also available for each file contained in an archive. These values are stored in a manifest file, MANIFEST.txt, which is included in the archive.
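Manifest lines are assumed here to follow the md5sum output format, with one hash and one file name per line. Under that assumption, the sketch below looks up the recorded hash for a single file. The name manifest_hash is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the MD5 hash recorded in a manifest for one file.
# Assumes md5sum-style lines: "<hash>  <name>" (or "<hash> *<name>" for
# binary mode, or "<hash> <name>" as written by Mac OS `md5 -r`).
manifest_hash() {
  local manifest=$1 name=$2
  awk -v f="$name" '{
    h = $1
    sub(/^[^ ]+ [ *]?/, "")   # strip the hash and its separator from the line
    if ($0 == f) print h
  }' "$manifest"
}

# Usage: manifest_hash MANIFEST.txt some_file.txt
```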

Using MD5 Hash Values

To confirm archive integrity, follow these steps (the exact syntax varies depending on which MD5 software is installed):

  1. Each time you download a TCGA file, download its corresponding MD5 hash file from the same directory.
  2. Verify that the MD5 hashes match. On Unix and Mac OSX, run md5sum with the -c option on the MD5 hash file. On Windows, md5sums cannot check against a file of existing MD5 values, so run it on the archive itself and compare the printed hash with the one stored in the MD5 hash file:
Unix or Mac OSX
$ md5sum -c broad.mit.edu_GBM.Genome_Wide_SNP_6.1.5.0.tar.gz.md5
broad.mit.edu_GBM.Genome_Wide_SNP_6.1.5.0.tar.gz: OK
Windows
C:\> md5sums -u broad.mit.edu_GBM.Genome_Wide_SNP_6.1.5.0.tar.gz
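Where md5sum -c is not available, the comparison can be scripted directly. This is a sketch, not a TCGA tool, and it makes several assumptions. The name verify_archive is a hypothetical helper. It expects bash plus either GNU md5sum or the Mac OS md5 command. It treats the first 32 characters of the .md5 file as the expected hash:

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify ARCHIVE against ARCHIVE.md5 by comparing the
# two hashes directly, for systems where `md5sum -c` is not available.
# Assumes the first 32 characters of the .md5 file are the expected hash.
verify_archive() {
  local archive=$1 expected actual
  expected=$(head -n 1 "$archive.md5" | cut -c1-32 | tr 'A-F' 'a-f')
  if command -v md5sum >/dev/null 2>&1; then
    # Reading from stdin keeps md5sum from escaping unusual file names.
    actual=$(md5sum < "$archive" | cut -c1-32)
  else
    actual=$(md5 -q "$archive")    # Mac OS: -q prints only the hash
  fi
  if [ "$expected" = "$actual" ]; then
    echo "$archive: OK"
  else
    echo "$archive: FAILED"
    return 1
  fi
}

# Usage: verify_archive broad.mit.edu_GBM.Genome_Wide_SNP_6.1.5.0.tar.gz
```

The helper prints "<archive>: OK" and returns 0 on a match. Otherwise it prints FAILED and returns 1, so it can drive an if statement or a download loop.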

To check the integrity of the individual files in an extracted archive, verify them against the MANIFEST.txt file:

Unix / Mac OSX / Windows
$ cd archive_directory
$ md5sum -c MANIFEST.txt

Note that on Windows, the above command assumes that md5sum.exe has been installed; the md5sums program cannot check files against a list of existing MD5 values. Also note that MANIFEST.txt itself will always fail this check.
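Because the manifest's own entry always fails, one option is to filter that line out before checking. The sketch below assumes bash and GNU md5sum, which accepts - to read checksum lines from standard input. The name check_manifest is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the files in an extracted archive against its
# MANIFEST.txt, skipping the manifest's own entry (which always fails).
# Assumes GNU md5sum, whose -c option reads checksum lines from "-" (stdin).
check_manifest() {
  local dir=$1
  (
    cd "$dir" || exit 1
    grep -v '[ *]MANIFEST\.txt$' MANIFEST.txt | md5sum -c -
  )
}

# Usage: check_manifest archive_directory
```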


The command-line programs md5sum and md5sums are implementations of the MD5 algorithm for creating MD5 hashes. Mac and Unix-based distributions often come with md5sum. To download md5sums for Windows, see pc-tools.net.

Creating MD5 Hash Values

To create MD5 hash values, either for the files inside an archive (a manifest file) or for an archive as a whole, follow the directions in the sections below.

Manifest File

This is an excerpt from the Manifest File page.

To create the manifest file, run the following commands in your terminal shell or command prompt:

Unix/Linux/Windows
> cd archive_directory
> md5sum * > MANIFEST.txt
Mac OS
> cd archive_directory
> md5 -r * > MANIFEST.txt
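The following sketch builds the manifest with a loop that skips subdirectories and the manifest itself. It assumes bash and GNU md5sum, and make_manifest is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write MANIFEST.txt for a directory while leaving the
# manifest itself out, so a later `md5sum -c MANIFEST.txt` can pass cleanly.
# Assumes GNU md5sum (on Mac OS, `md5 -r` produces a similar line format).
make_manifest() {
  local dir=$1 f
  (
    cd "$dir" || exit 1
    # The redirection creates MANIFEST.txt before the loop expands *,
    # so the loop has to skip it explicitly.
    for f in *; do
      [ "$f" = MANIFEST.txt ] && continue
      [ -f "$f" ] || continue          # skip subdirectories and other non-files
      md5sum "$f"
    done > MANIFEST.txt
  )
}

# Usage: make_manifest archive_directory
```

MANIFEST.txt is left out of its own listing. As a result, a later md5sum -c MANIFEST.txt run in that directory has no self-referential entry to fail.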

Archive

To create the MD5 hash file for an archive, run the following commands in your terminal shell or command prompt:

Unix/Linux/Windows
> cd archive_directory
> md5sum archive_name.tar.gz > archive_name.tar.gz.md5
Mac OS
> cd archive_directory
> md5 -r archive_name.tar.gz > archive_name.tar.gz.md5
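When several archives need MD5 files, the command above can be repeated in a loop. The sketch below assumes bash and GNU md5sum, and make_md5_files is a hypothetical helper name. Like the commands above, it changes into the directory first, so each .md5 file records the bare archive name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: create an MD5 file next to every .tar.gz archive in a
# directory, following the "<archive>.md5" naming convention.
# Assumes GNU md5sum; on Mac OS, `md5 -r "$archive"` writes a similar line.
make_md5_files() {
  local dir=$1 archive
  (
    cd "$dir" || exit 1
    for archive in *.tar.gz; do
      [ -e "$archive" ] || continue    # no archives: the glob stays literal
      md5sum "$archive" > "$archive.md5"
    done
  )
}

# Usage: make_md5_files archive_directory
```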