
July 20, 2021 Meeting

WebEx recording of 7/20/2021 meeting

  • Introduction: Medical Image De-Identification Initiative (MIDI)
  • Task Group goals
  • Steering Committee
  • Timeline
  • Discussion

August 10, 2021 Meeting

WebEx recording of 8/10/2021 meeting

  • Instructions to access the MIDI Task Group wiki page
  • Accept Mendeley invitation to access private group for literature review/annotated bibliography
  • Outline of approach
    • metadata vs. pixel data
    • structured (strongly typed) vs. text
    • burned-in text ("printed" and hand-written)
    • identifiable features (e.g., faces, iris, retina)
    • with or without "public" data to compare with
  • Challenging topics
    • evaluation of success of de-identification
    • quantitative comparison of performance
    • quantifying re-identification risk
    • creating test data sets
    • faces (etc.) reconstructed from cross-sections
    • burned-in text - detection, removal, cleaning
    • cleaning text descriptors (metadata or burned in)
    • buried metadata (e.g., EXIF, geotags in JPEG inside DICOM)
    • dates (incl. preserving temporal relationships; see the date-shifting sketch after this list)
    • pseudonym consistency across separate submissions
    • risks of hashing to create pseudonymous identifiers (see the brute-force sketch after this list)
    • uniqueness of images limits statistical approaches
    • loss allowable during de-identification (e.g., age fuzzing, pixels)
    • private data element preservation to retain utility
    • ultrasound - still frames and cine loops, lossy compressed
    • photographs and video
    • gross pathology and whole slide images (incl. labels)
    • IRB/ethics committee messaging with respect to de-identification decisions
    • IT security approval/audits of de-identification
    • regulatory requirements: HIPAA Privacy Rule, GDPR, CCPA, others?
    • sufficiency of standards, e.g., DICOM PS3.15 Annex E
    • risk of not following a standard (home-grown decisions)
    • threat of image "signatures", private set intersection methods
    • policy versus the technical details of recompression/decompression artifacts for JPEG
    • data minimization
  • Inventory of tools
    • user interface vs. scripted (bulk, service)
    • configurable - user vs. installer vs. hard-coded
    • platform, language
    • open source, free, commercial, service
    • on-site vs. outside (e.g., PII needs to leave the walls for AI on the cloud)
  • Roadmap and deliverables
    • interim report
    • full report
    • "primer" on medical image de-identification for newbies/execs
    • confirm what is out of scope (non-goals) - consent, data use agreements, ...
  • Tasking: Members to think about which task they would like to contribute to.
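
Two of the challenging topics above invite small illustrations. First, dates with preserved temporal relationships: a minimal sketch, assuming a per-patient random offset is acceptable, in which every date for a given patient is shifted by the same amount so that intervals between studies survive de-identification. The function names, offset range, and DA-format handling are illustrative assumptions, not a method the group endorsed.

```python
# Minimal sketch: per-patient random date shifting that preserves
# temporal relationships. Names and the offset range are assumptions.
import datetime
import random

_offsets: dict[str, int] = {}  # per-patient offsets; keep this map secret

def offset_for(patient_id: str, max_days: int = 365) -> int:
    """Pick (once) and then reuse a random offset for this patient."""
    if patient_id not in _offsets:
        _offsets[patient_id] = random.randint(1, max_days)
    return _offsets[patient_id]

def shift_date(patient_id: str, da: str) -> str:
    """Shift a DICOM DA value (YYYYMMDD) back by the patient's offset."""
    d = datetime.datetime.strptime(da, "%Y%m%d").date()
    return (d - datetime.timedelta(days=offset_for(patient_id))).strftime("%Y%m%d")

# Two studies 30 days apart remain exactly 30 days apart after shifting.
print(shift_date("PAT-001", "20210720"), shift_date("PAT-001", "20210819"))
```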
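
Second, the risks of hashing to create pseudonymous identifiers: a sketch of the core problem, assuming a hypothetical 9-digit MRN space. An unkeyed hash over an enumerable identifier space can be inverted by brute force, so a keyed hash (e.g., HMAC with a secret key) or a true random lookup table is needed.

```python
# Sketch of why unkeyed hashes make weak pseudonyms: if the identifier
# space is enumerable, the "pseudonym" can be reversed by brute force.
# The 9-digit MRN format here is a hypothetical example.
import hashlib

def pseudonym(mrn: str) -> str:
    return hashlib.sha256(mrn.encode()).hexdigest()

released = pseudonym("000123456")  # a pseudonym seen in a shared dataset

# Enumerate the MRN space until the hash matches.
for i in range(10**9):
    if pseudonym(f"{i:09d}") == released:
        print("re-identified MRN:", f"{i:09d}")
        break
```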

September 14, 2021 Meeting

WebEx recording of the 9/14/2021 meeting

  • Role of AI in de-identification - demand for data, opportunities, threats
    • Google has a de-id tool
    • Amazon Comprehend Medical
    • Identifying images at risk: which images are more likely than others to contain burned-in information? (See the text-flagging sketch after this list.)
    • Scalability is a problem when building the ruleset; better to identify images selectively.
    • Barcodes, pacemaker serial numbers, implanted devices
    • There is the potential to identify such objects, but not from the raw data.
    • Action: Describe the steps involved in imaging and the evolution of data through different levels of processing
    • Case-based data
    • Is raw data in our purview?
    • Raw data is often in proprietary format and can lack a header.
    • Post-processed data like 3D reconstructions
    • What is the harm of re-identification? E.g., a high-resolution 3D image of the face.
    • Penetration testing as it applies to de-identification
    • How to evaluate the success of de-facing?
    • Newman, L. H. (2016). AI Can Recognize Your Face Even If You’re Pixelated. Wired. https://www.wired.com/2016/09/machine-learning-can-identify-pixelated-faces-researchers-show/
    • When is it okay to release information that you know is identifiable? Example of boy in NYT.
    • Sometimes re-identification does not provide any new data.
    • What do you now know that you didn't know before?
    • Expectations of doing better de-identification versus the threat of better re-identification. What can we do now, and what in the future, with AI?
    • Do you expect that one day a machine will replace your manual de-identification process? Can a robot replace human review?
    • Can you accept the risk of AI/machines/code? Get to the level of risk that is tolerable.
    • Main topic for the next call: the need for human QC.
    • When will you stop using humans, or move to reviewing only a targeted subset?
    • What would increase your comfort level enough to stop using human QC?
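
One way to make the "identify images at risk selectively" idea above concrete: run cheap OCR over each image and route only those with detected text to human review. This is a sketch only; pytesseract, the character threshold, and the file names are assumptions, not tools the group discussed.

```python
# Sketch: flag images that likely contain burned-in text so only they
# get human review. Library choice and threshold are assumptions.
import pytesseract
from PIL import Image

def likely_has_burned_in_text(path: str, min_chars: int = 4) -> bool:
    """True if OCR finds a non-trivial amount of text in the image."""
    text = pytesseract.image_to_string(Image.open(path))
    return len("".join(text.split())) >= min_chars  # ignore whitespace/noise

# Hypothetical file names; ultrasound stills often carry annotations.
for path in ["us_frame_001.png", "ct_slice_042.png"]:
    print(path, "-> human review" if likely_has_burned_in_text(path) else "-> pass")
```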

October 12, 2021 Meeting

WebEx recording of the 10/12/2021 meeting

Discussion of this document:

  • Not practical for a human to review all of the images.
  • TCIA built a tool called Kaleidoscope that flattens images and saves review time (see the contact-sheet sketch after this list).
  • Radiology techs can also do this work, but sensitivity goes down as you view more images.
  • What is the cost of a data breach in terms of manpower? 
  • As screening goes up, breaches go up.
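
A sketch of the "flattening" idea, not of Kaleidoscope itself (its internals aren't described here): tile a whole series into one contact sheet so a reviewer scans one image instead of hundreds of slices. Grid and thumbnail sizes, and the file paths, are illustrative assumptions.

```python
# Sketch of flattening a series into a contact sheet for fast review.
# This is NOT Kaleidoscope; sizes and paths are illustrative only.
from PIL import Image

def contact_sheet(paths: list[str], cols: int = 8, thumb: int = 128) -> Image.Image:
    rows = -(-len(paths) // cols)  # ceiling division
    sheet = Image.new("L", (cols * thumb, rows * thumb))
    for i, path in enumerate(paths):
        img = Image.open(path).convert("L").resize((thumb, thumb))
        sheet.paste(img, ((i % cols) * thumb, (i // cols) * thumb))
    return sheet

# One sheet per series; the reviewer scans sheets rather than slices.
contact_sheet([f"series1/slice_{i:03d}.png" for i in range(64)]).save("sheet.png")
```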

Discussion of the de-identification process:

  • Did you have a formal QC process that involved you verifying the quality of the de-identification process after it was done?
  • John Perry: developed a process and a test to make sure it worked, but didn't look at all of the images to confirm it was done without breaches.
  • Monitor logs to make sure nothing slips through without automation applied to it. Grab a random 1% and look through the headers (see the sampling sketch after this list).
  • Need a more medical model that understands the variability in what we're trying to do
  • Partial vs. complete success, measured per field or per header
  • Catch-22: you can't crowd-source review because the images could contain PHI
  • Build synthetic datasets that have real street addresses in real places that don't match the actual data
  • Train a model and release that but not the dataset
  • Would need a statistician
  • Judy: We are encountering issues with black-box models that we do not understand. Running experiments on adversarial networks, with surprising findings.
  • Amalgamate clinical and imaging data. 
  • Models have already learned enough to infer age, sex, and race. We don't understand how this happens, and maybe they could pick up other identifying data.
  • We are not trying to hide age, sex, and race. We're trying to prevent the re-identification of a person.
  • Increasing the uniqueness of the image data is a threat for re-identification. But if you don't have a database of everyone's fingerprints, for example, it's useless.
  • At some point we have to be clear about what we are trying to re-identify and what the practical limits are.
  • Clearview.ai
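
A sketch of the "grab a random 1% and look through the headers" audit mentioned above. pydicom, the sample rate, and the residual-PHI patterns shown are assumptions for illustration, not a vetted ruleset.

```python
# Sketch: audit a random ~1% of de-identified files for residual PHI
# in headers. Patterns and paths are illustrative assumptions only.
import random
import re
from pathlib import Path

import pydicom

SUSPICIOUS = re.compile(r"\d{4}-\d{2}-\d{2}|\b(MRN|DOB)\b", re.IGNORECASE)

def audit(dicom_dir: str, rate: float = 0.01) -> None:
    files = sorted(Path(dicom_dir).rglob("*.dcm"))
    if not files:
        return
    for path in random.sample(files, max(1, int(len(files) * rate))):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        for elem in ds.iterall():  # walks nested sequences too
            if isinstance(elem.value, str) and SUSPICIOUS.search(elem.value):
                print(f"{path}: {elem.tag} {elem.name!r} -> {elem.value!r}")

audit("deidentified_studies/")
```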

Tasking

  • Justin Kirby: report back on what TCIA encounters as part of their human review processes
  • David Clunie: organize report topics in an outline
  • Judy: write up some content (not the overview) on defacing
  • TJ and Ying: can help with defacing