NIH | National Cancer Institute | NCI Wiki  



Goals

  • CPTAC Data Portal Overview – Rajesh
  • Discuss desired functionality to migrate – All
  • Discuss engineering, UI, and financial impact on PDC – All
  • Next Steps – All

Discussion items

Item: Overview

Who: Rajesh/Karen

Notes:

There are two types of data in the CPTAC Data Portal: pre-publication (early) and published.  There were five dataset releases in 2019, including both early and published data.  Data Use Agreements exist on the system, and embargoes are used to hold data until release.

File types:

  1. Metadata: clinical protocols and sample-to-label mapping files
  2. Mass spectrometry raw files

PDC is a workspace.  The CPTAC Data Portal only serves files.

CPTAC is sponsored by the Office of Cancer Clinical Proteomics Research (OCCPR).  Portal URL: Proteomics.cancer.gov/data-portal

Portal pages are hosted in Cabazon.  Data is hosted in ESAC.

A study is a group of data collected together on a page; it does not necessarily correspond to a publication.

Sample type = tumor.  The numbers shown count individuals with cancer, not tumor/normal samples.

They are not tracking or enforcing the data use agreements.

Genomic data goes into the GDC.  Imaging data in TCIA.

Monthly usage is about 69K files, totaling 17 TB of data.
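For sizing discussions, the monthly figures above can be turned into an average file size and an annualized volume. This is a rough sketch only: it assumes decimal units (1 TB = 1,000,000 MB) and a steady monthly rate, neither of which was stated in the meeting, and the function name is made up for illustration.

```python
# Figures from the notes: about 69K files and 17 TB per month.
FILES_PER_MONTH = 69_000
TB_PER_MONTH = 17


def monthly_usage_summary(files=FILES_PER_MONTH, terabytes=TB_PER_MONTH):
    """Return the average file size in MB and the annualized volume in TB.

    Assumes decimal units and that the monthly rate holds for a full year.
    """
    avg_file_mb = terabytes * 1_000_000 / files  # 1 TB = 1,000,000 MB
    annual_tb = terabytes * 12
    return {"avg_file_mb": round(avg_file_mb, 1), "annual_tb": annual_tb}


if __name__ == "__main__":
    print(monthly_usage_summary())
```

Under these assumptions the average file is roughly 246 MB, and a full year at this rate would be about 204 TB.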

URLs from the CPTAC Data Portal are cited in publications, so they need to be maintained or redirected.

Data Submission: CPTACDCC.Georgetown.edu.  Data is uploaded and some QC is done.  Data is then prepared for the Common Data Analysis Pipeline, and this capability is needed for processing.  On PDC, the user does the processing, but on CPTAC, Karen's group does it.

There is no suggestion that the DCC will go away.  Maybe the DCC can submit the data to PDC data portal instead of CPTAC data portal.

No functionality appears to exist in the CPTAC Data Portal that is not already in PDC; the CPTAC Data Portal is only a file server.

Workflow: Data at ESAC is separate from the PDC data.  Keeping data at ESAC carries a significant cost, especially if downloads are allowed.  The data at ESAC is currently stored in Google Cloud.  Free egress may be available through the Academic Center at Georgetown.

No publication specific pages exist in PDC - they will need to be built.

Redirects need to be built from each study page on the CPTAC Data Portal to a specific query on PDC.
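One possible shape for the study-page redirects is a lookup table from legacy study paths to PDC query parameters. The sketch below is illustrative only: the paths, the `study_name` parameter, and both host names are placeholders, not actual CPTAC Data Portal or PDC URLs, and the real mapping would come from the workflow Karen and Rajesh provide.

```python
from urllib.parse import urlencode, urlsplit

# Placeholder PDC query endpoint (hypothetical, not the real PDC URL).
PDC_BROWSE_URL = "https://pdc.example.org/browse"

# Hypothetical mapping: legacy study page path -> PDC query parameters.
LEGACY_STUDY_TO_PDC_QUERY = {
    "/study/S001": {"study_name": "Example Study A"},
    "/study/S002": {"study_name": "Example Study B"},
}


def redirect_target(legacy_url):
    """Return the PDC query URL for a legacy study page, or None if unmapped."""
    path = urlsplit(legacy_url).path.rstrip("/")
    query = LEGACY_STUDY_TO_PDC_QUERY.get(path)
    if query is None:
        return None  # caller decides how to handle unmapped pages
    return f"{PDC_BROWSE_URL}?{urlencode(query)}"


if __name__ == "__main__":
    print(redirect_target("https://legacy.example.org/study/S001/"))
```

Keeping the mapping as data rather than code would let new studies be added without changing the redirect logic, and unmapped URLs are reported explicitly so that links cited in publications are not silently dropped.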

There is a need to define the workflow of data from CPTAC DCC to PDC.




Next Steps

Confirm Henry agrees with the vision for migration of the data from CPTAC Data Portal to PDC.

Karen/Rajesh to provide workflow for existing/new data from CPTAC to PDC.

Action items
