NIH | National Cancer Institute | NCI Wiki  

Date

Attendees



Goals

  • Get a demonstration of the existing PDC website.

Discussion items

Item: Demo
Who: Rajesh Thangudu
Notes:
  • Home page: all counts shown are case counts.
    • Links to Browse, Analysis, Submit Data, About, Help
  • Browse page:
    • Chart gives overview and can be collapsed
    • Study-centric portal
    • Project/programs are de-emphasized and study/cases is emphasized
    • Filters are grouped: General, Biospecimen, Clinical, Files, Genes
      • General: Primary site, Program, Disease Type, Analytical Fraction, Experiment Type, Acquisition Type
      • Biospecimen: Study, Sample Type, Biospecimen Status
      • Clinical: Ethnicity, Race, Gender, Tumor Grade, Case Status
      • Files: Study, Data Category, File Type, Access (open/closed), Downloadable
      • Genes: enter a gene list to filter to specific genes, or use a prebuilt list.
    • Biospecimen Status allows filtering for studies that are not qualified (for whatever reason). Question: of what use are unqualified samples to a researcher? The Qualified and Disqualified statuses are confusing (John). Individual disqualified biospecimens are highlighted.
    • Files: both raw and processed data are available to the user.
    • Filter results are grouped into Studies, Biospecimens, Clinical, Files and Genes (similar to how ICDC uses a stats bar at the top).
    • Show Query shows the REST API endpoint for the results shown.
    • A study summary page is shown when a particular study is selected. It shows the data use agreement; PDC will use a CC BY license and will not have program-specific data agreements in the future. The page covers Description, Protocol, Experimental Design, Clinical, Biospecimens, Workflow and Data Use Agreement (DUA). Heat maps for a study are available if they were submitted; the user can cluster and filter the heat map, which uses the Broad Institute's open-source Morpheus heat map software.
    • Selecting a particular study requires using the study filter.
    • Data files can be downloaded directly using signed URLs, which can be generated through the API or the UI.
    • Queries can be run at pdc.cancer.gov/graphiql
    • All metadata is open; nothing is controlled access. File data is also open access.
    • Login exists but is not required, since there is no controlled-access data at this time. No user-level activity is tracked; all activity is anonymous.
    • Links to GDC data where relevant. Users need to know where the GDC data is; linked data cannot be pulled directly through filters. Links to other resources will be added in the future. Links are at the sample level. There is no direct mapping between PDC and GDC studies; if GDC develops a study page, PDC could link there. CDA may help surface these connections in the future. Chris said that NCI is very interested in enhancing connections. Filter suggestion: add a CRDC interoperability filter group (alongside General, Biospecimen, Clinical, Files and Genes on the left side).
    • The left-hand filter group is labeled "General", but the corresponding grouped results are labeled "Study". This is inconsistent with the other groups, where the filter and the results use the same label (e.g., "Biospecimen" for both).
  • Workspace:
    • User accounts needed.
    • User can link instruments directly into workspace (can create own instruments).
    • Files can be loaded directly from an instrument or through S3 buckets (this does not require much metadata) and kept here until the user is ready to create a study.
    • A study requires additional user-entered metadata, entered through the UI; this makes a group of files analyzable by a bioinformatician. Users can link as many or as few files from the instrument as desired, but metadata must be entered for each file.
    • Protocol created separately and linked to study.
    • Clinical and biospecimen data are submitted in TSV format, loaded to the workspace, and then linked to the study. The data are validated against the experimental design, then loaded and released to the data portal.
    • Validation: errors are reported in the UI. It is not yet known whether all errors are reported at once or validation stops at the first error, or whether data that fails validation is discarded or kept (Rajesh will check).
  • No analysis has been done yet of how the system is being used.
  • The search bar supports more than the Gene/Case ID search it advertises. Genes and case IDs were expected to be the most common searches, so they are highlighted, but search covers more than those.
  • Documentation/User Guides: FAQs; the Help section has video tutorials and a descriptive page with screenshots. API documentation is extensive.




Recording:

Matthew Beyers's Personal Room-20200506 1502-1.mp4

Action items

  • Set up a call for Amit to ask system architecture questions.
  • Rajesh will check on validation: are errors reported all at once, or does validation stop at the first error? Is unvalidated data kept, or discarded so the user must try again?