NIH | National Cancer Institute | NCI Wiki  

Date

Attendees



Goals

  • CPTAC Data Portal Overview – Rajesh
  • Discuss desired functionality to migrate – All
  • Discuss engineering, UI, and financial impact to PDC – All
  • Next Steps – All

Discussion items

Item: Overview
Who: Rajesh/Karen
Notes:

There are two types of data in the CPTAC Data Portal: pre-publication (early) and published. There were five dataset releases in 2019, covering both early and published data. Data Use Agreements exist on the system, and embargoes are used to hold data until release.

File types:

  1. Metadata: clinical protocols and sample-to-label mapping files
  2. Mass spectrometry raw files

PDC is a workspace. The CPTAC Data Portal just provides files.

CPTAC is sponsored by the Office of Cancer Clinical Proteomics Research (OCCPR). Portal address: Proteomics.cancer.gov/data-portal

Portal pages are hosted in Cabazon.  Data is hosted in ESAC.

A study is a group of data collected together on a page - not necessarily a publication.

Sample type = tumor. The numbers shown count individuals with cancer, not tumor/normal samples.

They are not tracking or enforcing the data use agreements.

Genomic data goes into the GDC. Imaging data goes into TCIA.

Monthly usage is about 69K files, or 17 TB of data.
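For capacity and cost planning, the monthly figures above can be turned into rough per-file and per-day numbers. The sketch below is only back-of-the-envelope arithmetic and rests on assumptions not stated in the meeting: "69K" is read as 69,000 files, "17TB" as decimal terabytes (10^12 bytes), and a month as 30 days. The function names are illustrative only.

```python
# Rough usage arithmetic based on the monthly figures in the notes.
# Assumptions (not from the meeting): decimal units and a 30-day month.

MONTHLY_FILES = 69_000        # "69K files" per month
MONTHLY_BYTES = 17 * 10**12   # "17TB" per month, assuming 1 TB = 10^12 bytes


def average_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Return the mean size per file in megabytes (10^6 bytes)."""
    return total_bytes / file_count / 10**6


def per_day(monthly_value: float, days_in_month: int = 30) -> float:
    """Spread a monthly total evenly over the days of the month."""
    return monthly_value / days_in_month


if __name__ == "__main__":
    print(f"Average file size: {average_file_size_mb(MONTHLY_BYTES, MONTHLY_FILES):.0f} MB")
    print(f"Files per day:     {per_day(MONTHLY_FILES):.0f}")
    print(f"TB per day:        {per_day(MONTHLY_BYTES) / 10**12:.2f}")
```

Under these assumptions the average downloaded file is on the order of a few hundred megabytes, which is consistent with most of the traffic being raw mass spectrometry files rather than metadata.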

URLs in the CPTAC Data Portal are used in publications, so they need to be maintained or redirected.

Data Submission: CPTACDCC.Georgetown.edu. Data is uploaded and some QC is done. Data is prepared for the Common Data Analysis Pipeline, and they need this capability for processing. On PDC, the user does the processing, but on CPTAC, Karen's group does it.

There is no suggestion that the DCC will go away. Maybe the DCC can submit the data to the PDC data portal instead of the CPTAC Data Portal.

No functionality appears to exist in CPTAC Data Portal that is not currently in PDC - the CPTAC data portal is only a file server.

Workflow: Data at ESAC is separate from the PDC data. There is a significant cost to data in ESAC, especially if downloads are allowed. The data at ESAC is currently stored in Google Cloud. There may be access to free egress from the Academic Center at Georgetown.

No publication-specific pages exist in PDC - they will need to be built.

Redirects need to be built from the study pages on the CPTAC Data Portal to specific queries on PDC.
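One possible shape for the redirects is a lookup table from each legacy study page path to the PDC query that should replace it. The sketch below is an illustration only: the legacy path, the PDC base URL, and the query parameter name are all hypothetical placeholders, not the real CPTAC or PDC URL schemes, which were not discussed in the meeting.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical placeholder: the real PDC query URL was not specified.
PDC_BASE_URL = "https://pdc.example.org/browse"

# Hypothetical mapping from legacy CPTAC study page paths to PDC query
# parameters. Real entries would come from the list of published URLs.
LEGACY_STUDY_TO_PDC_QUERY: Dict[str, Dict[str, str]] = {
    "/study-summary/S000": {"study_id": "EXAMPLE-STUDY-1"},
}


def resolve_redirect(legacy_path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target URL) for a legacy study page path.

    Known paths get a permanent redirect (301) so URLs cited in
    publications keep working; unknown paths return 404 with no target.
    """
    normalized = legacy_path.rstrip("/") or "/"
    query = LEGACY_STUDY_TO_PDC_QUERY.get(normalized)
    if query is None:
        return 404, None
    return 301, f"{PDC_BASE_URL}?{urlencode(query)}"


if __name__ == "__main__":
    print(resolve_redirect("/study-summary/S000/"))
    print(resolve_redirect("/study-summary/unknown"))
```

Keeping the mapping as data rather than as hand-written rules would make it easier to audit against the URLs that already appear in publications.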

There is a need to define the workflow of data from CPTAC DCC to PDC.




Next Steps

Confirm Henry agrees with the vision for migration of the data from CPTAC Data Portal to PDC.

Karen/Rajesh to provide workflow for existing/new data from CPTAC to PDC.

Action items

  •