NIH | National Cancer Institute | NCI Wiki  

Date

Attendees



Goals

  • Review, discuss and prioritize ICDC study submission proposals

Outstanding Action items

Discussion items

Item | Who | Notes
Candidate Publications/Data for ICDC
  • Selected list of papers relevant to ICDC's mission from Erika Berger & Amy LeBlanc
New Data Submission | Kuffel, Gina (NIH/NCI) [C]
  • 15 TB Pancreatic Cancer 
  • Still needs further discussion since this is a re-analysis of publicly available data
  • The breed prediction validation pipeline is available on GitHub.
Study Updates
Feature Updates
  • Successful production release for ICDC version 3.2.0 on Sept. 16th
  • Official release notes are available here
Material Transfer Agreement (MTA) | Kuffel, Gina (NIH/NCI) [C]
  • Intended to serve as a mechanism for reimbursement for the cost of preparing and submitting data
RNA-Seq broad normalization
  • The appropriate approach depends on the purpose (expression vs. fusion, etc.)

Minutes (Not Verbatim)

GT- From a high level, my recommendation would be studies 1 and 7 from the candidate publication list. The other studies have limited data available.

WH- Thinking back a year or more, there was a prior, similar effort in which we reached out to authors about submitting data. Where is the documentation for that effort? David Adams from Wellcome is one example, among others.

TP- That was prior to my stepping in.

WK- Regarding the first paper, which is a comparative genomics paper: they have human methylation data. Should we represent that data?

WH- These data would broaden the types of data in ICDC, but I wonder whether the focus should be more on omics-type data.

GT- So the Osteo paper would be the right type of study to bring in then?

WH- There has been so much change in ICDC that we need more continuity before proceeding.

WK- Slight amendment: this may not be redundant. ICDC is in a much different place now, so we could circle back with the authors who were previously contacted.

WK- The harmonization pipeline is valuable. We should send an acknowledgement to the authors who derived the data before proceeding. We should import only the harmonized data; we don't need to bring in the original data.

GT- The harmonized data is good to have, but the original BAM files may be good to consume as well, for reproducibility purposes.

RV- A slightly different perspective: if we look at Pan-Cancer studies in the human space, they provide aggregate data in portals and use cBioPortal and others to distribute other files, as opposed to having BAM files in multiple places.

GT- You could pull the raw data from NCBI, but we need the ability to perform queries across multiple datasets. If data is stored outside of ICDC, will we still be able to run queries effectively? I'm interested in cohort-building tooling and queries.

WK- The important part would be the cohort-building query. 

WK- With the careful approach of not bringing in the original data, we should just make sure to reach out to original authors.

WH- I agree that there is great value in being able to point to existing datasets, but let's not add noise to the system. Shaying's pipeline has not necessarily been validated by the community; it was developed by a single group.

GT- It would be great to launch a harmonization pipeline that could run in an automated fashion without having to put forth much effort. 



Action items