Older archives submitted to the DCC are periodically transferred to backup tapes and removed from the file system. A data freeze references specific archive revisions, which may later be superseded by newer versions and, once removed from the file system, would no longer be available for download. These archives must therefore be retained on the file system so that the analyses in a publication can be reproduced and match the results reported at the time of publication.
Capturing a data freeze involves two steps: marking which archives are frozen under a data freeze set, and linking that freeze set to a publication.
The relationships between publication, data freeze set and archive are described as follows:
- A publication can have one or more data freeze sets (for example, an analysis group uses sample subsets that reference different archives).
- A data freeze set can belong to one or more publications (for example, one data freeze set is used to write two papers).
- A data freeze set can span one or more disease studies (for example, lung study = LUAD + LUSC).
- A data freeze set can contain one or more archives.
- An archive can belong to one or more data freeze sets (for example, the same COAD archive is frozen for both a colorectal data freeze and a cross-cancer comparison data freeze).
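
The relationships above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the DCC's actual schema: the class names, field names, archive names, revision numbers and the `can_remove` helper are all hypothetical. It shows the many-to-many links between publications, data freeze sets and archives, and how a cleanup job might check whether an archive is protected by a freeze before removing it from the file system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Archive:
    """A specific archive revision (names and revisions here are hypothetical)."""
    name: str
    revision: int


@dataclass
class DataFreezeSet:
    """A named set of frozen archives that may span several disease studies."""
    name: str
    diseases: set[str] = field(default_factory=set)
    archives: set[Archive] = field(default_factory=set)


@dataclass
class Publication:
    """A publication linked to one or more data freeze sets."""
    title: str
    freeze_sets: list[DataFreezeSet] = field(default_factory=list)


def frozen_archives(publications: list[Publication]) -> set[Archive]:
    """Collect every archive referenced by any freeze set of any publication."""
    keep: set[Archive] = set()
    for pub in publications:
        for freeze_set in pub.freeze_sets:
            keep |= freeze_set.archives
    return keep


def can_remove(archive: Archive, publications: list[Publication]) -> bool:
    """An archive may leave the file system only if no freeze set references it."""
    return archive not in frozen_archives(publications)


# Hypothetical example: the same COAD archive revision is frozen in two sets.
coad_frozen = Archive("coad_example_archive", revision=3)
coad_older = Archive("coad_example_archive", revision=2)

colorectal = DataFreezeSet("colorectal", {"COAD"}, {coad_frozen})
cross_cancer = DataFreezeSet("cross_cancer", {"COAD", "LUAD", "LUSC"}, {coad_frozen})

paper_a = Publication("Colorectal paper", [colorectal])
paper_b = Publication("Cross-cancer paper", [cross_cancer])
publications = [paper_a, paper_b]
```

In this sketch, `can_remove(coad_frozen, publications)` is false because two freeze sets reference that revision, while `can_remove(coad_older, publications)` is true because no freeze set references it. Archives are modeled as immutable, hashable values so the same revision can appear in several freeze sets without duplication.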