
A Mutation Annotation Format (MAF) file is a tab-delimited file containing somatic and/or germline mutation annotations. MAF files containing any germline mutation annotations are kept in the controlled access portion of the Data Portal; MAF files containing only somatic mutations are kept in the open access portion. MAF files are considered Level 2 files.
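As a rough illustration of the access rule above, the following sketch parses tab-delimited MAF text and reports which portion of the Data Portal the file would belong in. The column names `Hugo_Symbol` and `Mutation_Status`, the status values, and the two sample records are assumptions made for this example. The MAF Specification is the authority on the actual columns.

```python
import csv
import io

# Hypothetical two-record excerpt; real MAF files carry many more columns.
# The column names and status values below are assumptions for illustration.
SAMPLE_MAF = (
    "Hugo_Symbol\tMutation_Status\n"
    "GENE_A\tSomatic\n"
    "GENE_B\tGermline\n"
)


def read_maf(text):
    """Parse tab-delimited MAF text into a list of dicts, one per mutation."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def access_tier(records):
    """Return 'controlled' if any record is germline, otherwise 'open'."""
    for record in records:
        if record["Mutation_Status"].strip().lower() == "germline":
            return "controlled"
    return "open"


records = read_maf(SAMPLE_MAF)
print(access_tier(records))  # prints "controlled"
```

A single germline record is enough to place the whole file in the controlled access portion, which is why the check returns as soon as it finds one.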

MAF Background

Mutations are discovered by aligning DNA sequences derived from tumor samples to sequences derived from normal samples and a reference sequence. A MAF file identifies, for each sample, the discovered putative or validated mutations, categorizes those mutations (SNP, deletion, or insertion) as somatic (originating in the tissue) or germline (originating from the germline), and provides the annotation for those mutations.

This format is not to be confused with the UCSC Multiple Alignment Format (also abbreviated MAF).

MAF File Content and Use

As with trace ID-to-sample relationship files, mutation annotation format (MAF) files contain aliquot UUIDs and associated metadata. Those UUIDs enable researchers to associate sample IDs with assay results.

To create a MAF file, Genome Sequencing Centers (GSCs) compare a participant's normal chromosomal sequence with the tumor chromosomal sequence and a template reference sequence. Any abnormal differences among the three sequences are captured in the mutation file.

GSCs transfer mutation annotation data to the Data Coordinating Center (DCC) in two types of files: those that contain only somatic mutations (frequently having the extension somatic.maf) and those that contain both somatic and germline mutations (frequently having the extension protected.maf). A "protected.maf" file is a superset of all mutations detected for a given disease by a given GSC and is available in the controlled access part of the Data Portal. Frequently the GSC also submits an accompanying "somatic.maf" file for the same disease; it contains the somatic mutation subset of the partner "protected.maf" file and is available in the open access part of the Data Portal.
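The superset/subset relationship between the two file types can be sketched as a simple filter. This is only an illustration: the `Mutation_Status` column name, its values, and the sample records are assumptions, and the function does not claim to reproduce how GSCs actually derive their "somatic.maf" files.

```python
def somatic_subset(protected_records):
    """Keep only records marked somatic (column name and values are assumed)."""
    return [
        record
        for record in protected_records
        if record.get("Mutation_Status", "").strip().lower() == "somatic"
    ]


# Hypothetical contents of a "protected.maf" file, already parsed into dicts.
protected = [
    {"Hugo_Symbol": "GENE_A", "Mutation_Status": "Somatic"},
    {"Hugo_Symbol": "GENE_B", "Mutation_Status": "Germline"},
    {"Hugo_Symbol": "GENE_C", "Mutation_Status": "Somatic"},
]

somatic = somatic_subset(protected)
print([record["Hugo_Symbol"] for record in somatic])  # ['GENE_A', 'GENE_C']
```

Every record in the result also appears in the input, which mirrors the statement that the "somatic.maf" file is a subset of its partner "protected.maf" file.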

The mutations reported in a MAF file, whether somatic (originating in the tissue) or germline (originating from the germline), can be subcategorized as follows:

Somatic mutations:

  • Missense and nonsense
  • Splice site, defined as a SNP within 2 bp of the splice junction
  • Silent mutations
  • Indels that overlap the coding region or splice site of a gene or the targeted region of a genetic element of interest
  • Frameshift mutations
  • Mutations in regulatory regions
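
The splice-site definition in the list above (a SNP within 2 bp of the splice junction) can be sketched as a distance check. The function name, the integer coordinates, and the reading of "within 2 bp" as a distance of at most 2 are assumptions for illustration. The sketch also assumes that the SNP and the junctions share one chromosome and one coordinate system.

```python
def is_splice_site_snp(snp_position, junction_positions, window=2):
    """Return True if the SNP lies within `window` bp of any splice junction.

    Positions are assumed to be integer coordinates on the same chromosome.
    """
    return any(
        abs(snp_position - junction) <= window
        for junction in junction_positions
    )


# Hypothetical junction coordinates for a single gene.
junctions = [1000, 2500]

print(is_splice_site_snp(1002, junctions))  # True: 2 bp from the junction at 1000
print(is_splice_site_snp(1003, junctions))  # False: 3 bp from the nearest junction
```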

SNPs:

  • Any germline SNP with validation status "unknown" is included.
  • SNPs already validated in dbSNP are not included since they are unlikely to be involved in cancer.
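
The two SNP rules above can be written as a small decision function. This is a sketch under stated assumptions: the parameter names are hypothetical, and letting the dbSNP exclusion take precedence when both rules could apply is a choice made for this example. The text above does not say what happens to a germline SNP with any other validation status, so the function returns `None` in that case rather than guessing.

```python
def include_germline_snp(validation_status, validated_in_dbsnp):
    """Apply the two stated SNP rules; return None when neither rule applies.

    Assumption: the dbSNP exclusion is checked first.
    """
    if validated_in_dbsnp:
        return False  # already validated in dbSNP: not included
    if validation_status.strip().lower() == "unknown":
        return True  # germline SNP with validation status "unknown": included
    return None  # not covered by the rules stated above


print(include_germline_snp("unknown", validated_in_dbsnp=False))  # True
print(include_germline_snp("unknown", validated_in_dbsnp=True))   # False
```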

The Mutation Annotation Format (MAF) Specification provides a current and in-depth description of MAF File Validation and Format.
