Document Information

Specification for Wiggle Format
Version 1.0
February 21, 2011

Wiggle (WIG) file specification

The specification presented here identifies the DCC-specific requirements for WIG file uploads. See the UCSC page for more explanatory detail.

Note: Much of the UCSC specification deals with presenting wiggle data in a genome browser. TCGA currently uses the wiggle format to transfer data that is not necessarily destined for a browser. Browser-related tags are passed through; the DCC does not perform parameter checks on the values of those tags.

Specification

General structure

Overall, a wiggle file has the following structure, with the components described below:

<track line>
<declaration line>
<data line>
<data line>
... (further data lines)
<declaration line>
<data line>
<data line>
... (further data lines)
  • A minimally valid wiggle file has at least one complete track specified:
<track line>
<declaration line>
<data line>
  • The DCC spec allows comment lines; these are ignored by DCC applications:
# this is a comment
<track line>
<declaration line>
...
  • Each declaration line should be followed by at least one data line.
  • Subsequent declaration lines can refer to the latest track line.
  • The first non-comment line should be a track line.
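
The structural rules above can be sketched as a small checker. This is only an illustration, not the DCC validator: the function name `check_structure` is hypothetical, blank lines are skipped as an assumption, and any line that does not start with a known keyword is assumed to be a data line.

```python
def check_structure(lines):
    """Return a list of structural problems found in the lines of a wiggle file."""
    errors = []
    seen_first = False        # has the first non-comment line been examined?
    seen_declaration = False
    pending = None            # line number of a declaration still waiting for data
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue          # blank (assumption) and comment lines are skipped
        keyword = line.split()[0]
        if not seen_first:
            seen_first = True
            if keyword != "track":
                errors.append(f"line {number}: first non-comment line is not a track line")
        if keyword == "browser":
            continue          # browser lines are passed through unchecked
        if keyword in ("track", "variableStep", "fixedStep"):
            if pending is not None:
                errors.append(f"line {pending}: declaration line has no data lines")
            if keyword == "track":
                pending = None
            else:
                pending = number
                seen_declaration = True
        else:
            pending = None    # anything else is assumed to be a data line
    if pending is not None:
        errors.append(f"line {pending}: declaration line has no data lines")
    if not seen_declaration:
        errors.append("no declaration line found")
    return errors
```

An empty list means no structural problem was detected; the sketch does not check the contents of any individual line.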

Track line

A track line begins with the identifier track and is followed by attribute/value pairs separated by spaces, specified as:

track <attribute_1>=<value_1> <attribute_2>=<value_2> ... <attribute_n>=<value_n>

There is no space between the attribute, the = sign, and the value. White space is allowed in a value only if the value is quoted (with either single or double quotes). Other whitespace is ignored.

DCC-specified attributes:

| attribute | value desc | admissible values or regexp match | default |
| --- | --- | --- | --- |
| type | wiggle_type | "wiggle_0" | "wiggle_0" (must be present) |
| name | track_label | [a-zA-Z0 ]{,15} spaces allowed with quotes | "User Track" |
| description | center_label | .{,60} spaces allowed with quotes | This value will contain the wig file name as it appears in the archive |

Other attribute/value pairs may be present.
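
As a rough sketch of how a track line could be read, the example below uses Python's standard `shlex` module to honor single or double quotes around values. The function name `parse_track_line` is hypothetical, and only the length limits are checked; the character-class part of the name pattern is deliberately left out.

```python
import shlex


def parse_track_line(line):
    """Parse a track line into a dict of attribute/value pairs."""
    tokens = shlex.split(line)  # removes quotes, keeps quoted spaces together
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    attrs = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed attribute/value pair: {token!r}")
        attrs[key] = value
    # type must be present and equal to wiggle_0
    if attrs.get("type") != "wiggle_0":
        raise ValueError("type=wiggle_0 must be present")
    # name falls back to the documented default
    attrs.setdefault("name", "User Track")
    if len(attrs["name"]) > 15:
        raise ValueError("name is longer than 15 characters")
    if len(attrs.get("description", "")) > 60:
        raise ValueError("description is longer than 60 characters")
    return attrs  # unrecognized pairs are kept, since other pairs may be present
```

For example, `parse_track_line('track type=wiggle_0 name="My Track"')` yields a dict whose `name` entry is `My Track`, with the quotes removed.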

Declaration lines

A single line, beginning with one of the identifiers variableStep or fixedStep, followed by attribute/value pairs separated by spaces:

variableStep <attribute_1>=<value_1> ... <attribute_n>=<value_n>

or

fixedStep <attribute_1>=<value_1> ... <attribute_n>=<value_n>

Valid attributes:

| variableStep attribute | admissible values or regexp match | use |
| --- | --- | --- |
| chrom | UCSC chromosome sequence name | required |
| span | an integer (zero is not allowed) | optional, default is 1 |

| fixedStep attribute | admissible values or regexp match | use |
| --- | --- | --- |
| chrom | UCSC chromosome sequence name | required |
| start | an integer (zero is not allowed) | required |
| step | an integer (zero is not allowed) | required |
| span | an integer (zero is not allowed) | optional, default is 1 |
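
The attribute rules for declaration lines can be illustrated with the following sketch. The names `REQUIRED`, `INTEGER_ATTRIBUTES`, and `parse_declaration` are hypothetical. Rejecting attributes that are not listed for the given declaration type is an assumption of this example. Only zero is rejected for the integer attributes, following the wording of the tables, and chrom is not checked against UCSC sequence names.

```python
# Required attributes per declaration type; span is optional for both.
REQUIRED = {"variableStep": {"chrom"}, "fixedStep": {"chrom", "start", "step"}}
INTEGER_ATTRIBUTES = {"start", "step", "span"}


def parse_declaration(line):
    """Parse a declaration line into (kind, attributes)."""
    tokens = line.split()
    if not tokens or tokens[0] not in REQUIRED:
        raise ValueError("not a declaration line")
    kind = tokens[0]
    allowed = REQUIRED[kind] | {"span"}
    attrs = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep or key not in allowed:
            raise ValueError(f"unexpected attribute for {kind}: {token!r}")
        if key in INTEGER_ATTRIBUTES:
            value = int(value)  # raises ValueError for non-integers
            if value == 0:
                raise ValueError(f"{key} must not be zero")
        attrs[key] = value
    missing = REQUIRED[kind] - set(attrs)
    if missing:
        raise ValueError(f"{kind} is missing: {sorted(missing)}")
    attrs.setdefault("span", 1)  # documented default
    return kind, attrs
```

Calling `parse_declaration("fixedStep chrom=chr19 start=100 step=10")` returns the kind `fixedStep` together with the integer-converted attributes and the default span of 1.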

Data lines

variableStep data: Each line following a variableStep declaration line consists of an integer chromosomal coordinate followed by an integer or floating point data value.

fixedStep data: Each line following a fixedStep declaration line consists of a single integer or floating point data value.

Leading whitespace is not permitted.
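
A minimal sketch of reading the data lines of one block is shown here. The function name `parse_data_lines` and the `(position, value)` output shape are assumptions of this example. For fixedStep data, positions are derived the way the UCSC format describes them: the first value sits at start and each later value is step bases further along.

```python
def parse_data_lines(kind, attrs, lines):
    """Turn the data lines of one block into a list of (position, value) pairs.

    kind  -- "variableStep" or "fixedStep"
    attrs -- declaration attributes; fixedStep needs integer "start" and "step"
    """
    records = []
    for index, line in enumerate(lines):
        if line[:1].isspace():
            raise ValueError("leading whitespace is not permitted")
        fields = line.split()
        if kind == "variableStep":
            if len(fields) != 2:
                raise ValueError(f"expected coordinate and value: {line!r}")
            position = int(fields[0])
            value = float(fields[1])
        else:
            if len(fields) != 1:
                raise ValueError(f"expected a single value: {line!r}")
            # The n-th value of a fixedStep block sits at start + n * step.
            position = attrs["start"] + index * attrs["step"]
            value = float(fields[0])
        records.append((position, value))
    return records
```

All values are converted to floats for simplicity, so an integer such as 1 comes back as 1.0.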

Comment lines

Lines beginning with # are ignored.

Browser lines

Lines beginning with the token browser may be present. The DCC does not validate these lines against the standard WIG spec.

Other lines

Non-data, non-blank/non-whitespace lines that begin with anything other than

browser
track
variableStep
fixedStep
#

are not valid.
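
One way to picture the line categories is a small classifier like the one below. It is a sketch only: the name `classify_line` is hypothetical, and a data line is recognized here as either a single number or an integer followed by a number, without regard to which declaration precedes it.

```python
KEYWORDS = ("browser", "track", "variableStep", "fixedStep")


def classify_line(line):
    """Return 'blank', 'comment', one of KEYWORDS, 'data', or 'invalid'."""
    if not line.strip():
        return "blank"
    if line.startswith("#"):
        return "comment"
    fields = line.split()
    if fields[0] in KEYWORDS:
        return fields[0]
    if line[0].isspace():
        return "invalid"  # data lines may not have leading whitespace
    try:
        if len(fields) == 1:
            float(fields[0])  # fixedStep style: value only
            return "data"
        if len(fields) == 2:
            int(fields[0])    # variableStep style: coordinate, then value
            float(fields[1])
            return "data"
    except ValueError:
        pass
    return "invalid"
```

A line such as `chr1 300 1.5` is reported as invalid because it neither starts with a recognized token nor looks like a data line.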

Examples

Examples of TCGA submitted wig files:

Currently, GSC WIG files are used to indicate whether a given chromosomal coordinate has achieved sufficient coverage in both the tumor and normal sample from the case such that observed variants can be considered valid. "Sufficient coverage" is determined by a consensus of the GSCs; it represents approximately 30X coverage in both tumor and normal. A '1' in a WIG file at a site indicates sufficient coverage; '0' or null (coordinate not present in file) indicates currently insufficient coverage.
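
The coverage convention described above can be sketched as a lookup. The function name `has_sufficient_coverage` and the `(position, value)` record shape are assumptions of this example. Each record is taken to cover span bases starting at its position, as in the UCSC format.

```python
def has_sufficient_coverage(records, coordinate, span=1):
    """Report whether a coordinate is marked as sufficiently covered.

    records -- (position, value) pairs for a single chromosome
    span    -- number of bases each record covers, starting at its position
    """
    for position, value in records:
        if position <= coordinate < position + span:
            return value == 1  # '1' means sufficient, '0' means insufficient
    return False               # coordinate absent from the file: insufficient
```

With records `[(100, 1.0), (101, 0.0)]`, coordinate 100 is reported as covered, while 101 and any coordinate not present in the records are not.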

  • GCCs (RNAseq): examples pending

GCC (RNAseq) WIG files will typically indicate the reported fold-coverage of a given chromosomal location.