NIH | National Cancer Institute | NCI Wiki  

...

  • Justin Kirby: Report back on what TCIA encounters as part of its human review processes

  • David Clunie: Organize report topics in an outline

  • Judy: Write up some content (not the overview) on defacing

  • TJ and Ying: Help with defacing

November 9, 2021 Meeting

WebEx recording of the 11/09/2021 meeting

Discussion of threat models:

  • A good example of a threat model is at https://arxiv.org/pdf/2103.08562.pdf.
  • There is a large body of literature on threat models.
  • Characterize what kind of attacks you want to consider. 
  • A common attacker motivation is the ability to demonstrate a flaw in the system: embarrassing the data custodian, demonstrating a new attack method, or gaining notoriety.
  • A key characteristic of an attack is that it is publicized.
  • It is not always prudent to assume that the attacker's resources are limited; free cloud credits, for example, put substantial computing power within anyone's reach. This is part of the price of being totally public.
  • We don't need to question whether there will be an attack on a public dataset. There will be.
  • Risk of reidentification based on the data.
  • What are the risk thresholds? We need a reasonable standard. The HIPAA Privacy Rule says "no reasonable basis to reidentify the individual." Are we looking at maximum or average risk? Think about it from the perspective of the easiest-to-identify record.
  • It's reasonable to assume a 0.05 probability threshold.
  • Prosecutor risk: the adversary targets a known individual who is known to be in the dataset. Journalist risk: the adversary does not or cannot know whether the target is in the dataset.
  • Do you assume everything is publicly available? No. You need to decide what you want to protect: the patient.
  • Risk from an insider is high.
  • Question: How many groups are leaving in information that could be used to reidentify? Answer: Manufacturer and model; leaving them in demonstrates that different equipment is used.
  • Do we see leaving this data in as increasing our risk? No, leaving it in reduces bias. Correlations between quasi-identifiers may not be obvious (a batch effect).
  • What is the true utility of the data vs. the risk?
  • Risk models: what to include and what not to include. If we limit it to what we need right now, like scanner model and vendor, we may make the data less useful for research later.
  • What is the statistical use of the data you've retained?
  • We're not normalizing our processes of removing and curating data. Would normalization mitigate risk?
  • Is there a reasonable expectation that no one will figure out the geographic location of a scan? The attacker's task isn't necessarily to determine the geographic location but to reidentify at the individual level.
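The maximum-vs.-average-risk distinction above can be made concrete with a minimal sketch. This is illustrative only, not part of the meeting discussion: it assumes a standard k-anonymity-style measure where each record's reidentification risk is the reciprocal of its equivalence-class size (the number of records sharing its quasi-identifier values). The function name and toy data are hypothetical.

```python
# Illustrative sketch: maximum vs. average reidentification risk,
# assuming per-record risk = 1 / size of the record's equivalence class
# (records sharing the same quasi-identifier values). Hypothetical example.
from collections import Counter

def reidentification_risks(quasi_identifiers):
    """Return (average risk, maximum risk) over a list of quasi-identifier tuples."""
    class_sizes = Counter(quasi_identifiers)
    per_record = [1 / class_sizes[q] for q in quasi_identifiers]
    return sum(per_record) / len(per_record), max(per_record)

# Toy dataset: (age band, scanner model) treated as the quasi-identifiers.
records = [
    ("40-49", "ScannerA"), ("40-49", "ScannerA"),
    ("40-49", "ScannerA"), ("50-59", "ScannerA"),
    ("50-59", "ScannerB"),  # unique combination, so its risk is 1.0
]
avg_risk, max_risk = reidentification_risks(records)
print(avg_risk, max_risk)  # maximum risk is driven by the easiest-to-identify record
```

Under a prosecutor-style threat model, the maximum (worst-record) risk is the quantity to compare against a threshold such as 0.05; the average can look acceptable even when one unique record is trivially identifiable.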

Action for all:

  • Think about what we should say about this topic in the report.