
July 20, 2021 Meeting

WebEx recording of 7/20/2021 meeting

Introduction: Medical Image De-Identification Initiative (MIDI)
Task Group goals
Steering Committee
Timeline
Discussion

August 10, 2021 Meeting

WebEx recording of 8/10/2021 meeting

Instructions to access the MIDI Task Group wiki page
Accept the Mendeley invitation to access the private group for the literature review/annotated bibliography
Outline of approach
- metadata vs. pixel data
- structured (strongly typed) vs. text
- burned-in text ("printed" and hand-written)
- identifiable features (e.g., faces, iris, retina)
- with or without "public" data for comparison
Challenging topics
- evaluation of success of de-identification
- quantitative comparison of performance
- quantifying re-identification risk
- creating test data sets
- faces (etc.) reconstructed from cross-sections
- burned-in text - detection, removal, cleaning
- cleaning text descriptors (metadata or burned in)
- buried metadata (e.g., EXIF, geotags in JPEG inside DICOM)
- dates (incl. preserving temporal relationships)
- pseudonym consistency across separate submissions
- risks of hashing to create pseudonymous identifiers
- uniqueness of images limits statistical approaches
- loss allowable during de-identification (e.g., age fuzzing, pixels)
- private data element preservation to retain utility
- ultrasound - still frames and cine loops, lossy compressed
- photographs and video
- gross pathology and whole slide images (incl. labels)
- IRB/ethics committee messaging wrt. de-identification decisions
- IT security approval/audits of de-identification
- regulatory requirements: HIPAA Privacy Rule, GDPR, CCPA, others?
- sufficiency of standards, e.g., DICOM PS3.15 Annex E
- risk of not following a standard (home-grown decisions)
- threat of image "signatures", private set intersection methods
- policy versus the technical details of recompression/decompression artifacts for JPEG
- data minimization
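The date, pseudonym-consistency, and hashing items above can be illustrated with a minimal Python sketch. It assumes a secret key held only by the de-identifying site. SECRET_KEY and all function names are hypothetical. Pseudonyms use a keyed HMAC rather than a plain hash because identifiers such as MRNs come from a small, guessable space: anyone can hash every candidate MRN and compare the results with an unkeyed SHA-256 value. Deriving one date offset per patient from the same key keeps pseudonyms and shifted dates consistent across separate submissions while preserving the intervals between that patient's dates.

```python
import datetime
import hashlib
import hmac

# Hypothetical project key. In practice it would be random, kept secret by the
# de-identifying site, and never released with the data.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable pseudonym for a patient ID via HMAC-SHA256 (truncated to 64 bits).

    The same ID and key always give the same pseudonym, so separate
    submissions stay linkable. Without the key, a guessed ID cannot be
    checked by recomputing the value, unlike with an unkeyed hash.
    """
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def date_offset_days(patient_id: str, key: bytes = SECRET_KEY, max_days: int = 365) -> int:
    """Per-patient date offset in [-max_days, max_days], derived from the key."""
    # A distinct message prefix means the offset cannot be read off the released pseudonym.
    digest = hmac.new(key, b"date-offset:" + patient_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (2 * max_days + 1) - max_days


def shift_date(dicom_date: str, patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by the patient's fixed offset."""
    original = datetime.datetime.strptime(dicom_date, "%Y%m%d").date()
    shifted = original + datetime.timedelta(days=date_offset_days(patient_id, key))
    return shifted.strftime("%Y%m%d")


first = shift_date("20210720", "MRN123")
second = shift_date("20210810", "MRN123")
print(pseudonym("MRN123") == pseudonym("MRN123"))  # True
```

Here first and second stay 21 days apart, the same interval as the original dates. The ±365-day window, 64-bit truncation, and choice of HMAC-SHA256 are illustrative assumptions, not recommendations. Anyone holding the key can test a guessed ID, so key custody is the real control.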
Inventory of tools
- user interface vs. scripted (bulk, service)
- configurable - user vs. installer vs. hard-coded
- platform, language
- open source, free, commercial, service
- on-site vs. outside (e.g., [IP]II needs to leave the institution's walls for AI in the cloud)
Roadmap and deliverables
- interim report
- full report
- "primer" on medical image de-identification for newbies/execs
- confirm what is out of scope (non-goals): consent, data use agreements, ...
Tasking: Members to think about which task they would like to contribute to.

September 14, 2021 Meeting

WebEx recording of the 9/14/2021 meeting

Role of AI in de-identification - demand for data, opportunities, threats
- Google has a de-id tool
- Amazon Comprehend Medical
- Identifying images at risk: which images are more likely than others to contain burned-in information?
- Building the ruleset does not scale well. Better to identify at-risk images selectively.
- Barcodes, pacemaker serial numbers, implanted devices
- There is the potential of identifying objects but not the raw data.
- Action: Describe the steps involved in imaging and how the data evolve through different levels of processing
  - case-based data
- Is raw data in our purview?
- Raw data is often in a proprietary format and can lack a header.
- Post-processed data like 3D reconstructions
- What is the harm of re-identification? For example, a high-resolution 3D image of the face.
- Penetration testing as it applies to de-identification
- How to evaluate the success of de-facing?
- Newman, L. H. (2016). AI Can Recognize Your Face Even If You’re Pixelated. Wired. https://www.wired.com/2016/09/machine-learning-can-identify-pixelated-faces-researchers-show/
- When is it okay to release information that you know is identifiable? Example of boy in NYT.
- Sometimes re-identification does not provide any new data.
- What do you now know that you didn't know before?
- Expectations of better de-identification versus the threat of better re-identification: what can we do now, and what will be possible in the future with AI?
- Do you expect that one day a machine will replace your manual de-identification process? Can a robot replace human review?
- Can you accept the risk of AI/machines/code? Get to the level of risk that is tolerable.
- Main topic for the next call: the need for human QC.
- When will you stop using humans, or use them only for a targeted subset?
- What would increase your comfort level enough to stop using human QC?
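One way to act on the "identifying images at risk" point is to triage by header before any pixel review. The sketch below is hypothetical: the rules are illustrative choices, not a validated policy. It reads a plain dict keyed by DICOM attribute keywords (BurnedInAnnotation, SOPClassUID, Modality, ImageType), so it needs no DICOM library. pydicom exposes the same keywords as Dataset attributes (e.g., ds.Modality). A flag only means "send for pixel review". An unflagged image is not guaranteed clean, since BurnedInAnnotation may be absent or unreliable.

```python
from typing import Any, List, Mapping

# Secondary Capture Image Storage SOP Class UID (DICOM PS3.6 UID registry).
SECONDARY_CAPTURE_SOP_CLASS = "1.2.840.10008.5.1.4.1.1.7"

# Assumption for this sketch: modalities treated as likely to carry burned-in text.
HIGH_RISK_MODALITIES = {"US", "OT"}


def burned_in_risk_reasons(header: Mapping[str, Any]) -> List[str]:
    """Return the reasons, if any, why an image should get pixel-level review."""
    reasons = []
    if str(header.get("BurnedInAnnotation", "")).upper() == "YES":
        reasons.append("BurnedInAnnotation is YES")
    if header.get("SOPClassUID") == SECONDARY_CAPTURE_SOP_CLASS:
        reasons.append("Secondary Capture image")
    modality = header.get("Modality")
    if modality in HIGH_RISK_MODALITIES:
        reasons.append(f"modality {modality}")
    image_type = header.get("ImageType", [])
    if isinstance(image_type, str):
        # A multi-valued DICOM string is backslash-delimited.
        image_type = image_type.split("\\")
    if "SECONDARY" in (str(v).upper() for v in image_type):
        reasons.append("ImageType includes SECONDARY")
    return reasons


def needs_pixel_review(header: Mapping[str, Any]) -> bool:
    """Flag an image for pixel-level review if any rule fires."""
    return bool(burned_in_risk_reasons(header))
```

Returning the reasons, not just a boolean, lets a reviewer see which rule fired, which helps when tuning the ruleset selectively rather than scaling it up.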

October 12, 2021 Meeting

WebEx recording of the 10/12/2021 meeting

Discussion of this document:

Not practical for a human to review all of the images.
TCIA built a tool called Kaleidoscope that flattens images and saves time.
Radiology techs can also do this work, but sensitivity goes down as you view more images.
What is the cost of a data breach in terms of manpower?
As screening goes up, breaches go up.

Discussion of the de-identification process:

Did you have a formal QC process to verify the quality of the de-identification after it was done?
John Perry: developed a process and a test to make sure it worked, but did not review every image to confirm that there were no breaches.
Monitor logs to make sure nothing slips through without the automated de-identification being applied to it. Review the headers of a random 1% sample.
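The random 1% header check can be made reproducible so that an auditor can later redraw the same sample. A minimal sketch, assuming items are identified by strings such as SOP Instance UIDs. qc_sample, its defaults, and the placeholder UIDs are hypothetical. It always returns at least one item for a non-empty batch, so small batches are not silently skipped.

```python
import random
from typing import List, Sequence


def qc_sample(items: Sequence[str], fraction: float = 0.01, seed: int = 0) -> List[str]:
    """Draw a reproducible random sample of items for manual header review.

    Returns round(len(items) * fraction) items, but at least one for a
    non-empty batch. The same items, fraction, and seed give the same
    sample on the same Python version.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not items:
        return []
    k = max(1, round(len(items) * fraction))
    rng = random.Random(seed)  # private generator, independent of global random state
    return rng.sample(list(items), k)


uids = [f"1.2.3.{i}" for i in range(1000)]  # placeholder identifiers
to_review = qc_sample(uids, seed=20211012)
print(len(to_review))  # 10
```

Recording the seed alongside the batch is what makes the sample auditable after the fact.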
Need a more medical model that understands the variability in what we're trying to do
Partial vs. complete success: per field or per header
Catch-22: you can't crowd-source the review because the data could contain PHI
Build synthetic datasets that have real street addresses in real places that don't match the actual data
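Synthetic identifiers can also be used to measure a de-identification step directly: plant known fake values in a test record, run the tool, and count what survives. A minimal sketch. The field names are DICOM keywords, but SEEDED_PHI, the helper functions, and naive_deidentify are hypothetical, and every value is invented. The deliberately weak naive_deidentify blanks some named fields but ignores free text, so it leaks the name embedded in StudyDescription and the untouched InstitutionName.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Invented values planted in a test record; none belong to a real patient.
SEEDED_PHI: Record = {
    "PatientName": "TESTER^JANE",
    "PatientID": "SYN000123",
    "PatientAddress": "123 Example Street",
    "InstitutionName": "Example General Hospital",
}


def leaked_values(output: Record, seeded: Record) -> List[str]:
    """Return the seeded values that still appear anywhere in the output record."""
    text = "\n".join(str(v) for v in output.values())
    return sorted(v for v in seeded.values() if v in text)


def leak_rate(deidentify: Callable[[Record], Record], seeded: Record = SEEDED_PHI) -> float:
    """Fraction of seeded values that survive the de-identification step."""
    record = dict(seeded)
    # Also hide one identifier in free text, where field-based rules often miss it.
    record["StudyDescription"] = f"CT chest for {seeded['PatientName']}"
    output = deidentify(record)
    return len(leaked_values(output, seeded)) / len(seeded)


def naive_deidentify(record: Record) -> Record:
    """Deliberately weak example: blanks some named fields, ignores free text."""
    blanked = {"PatientName", "PatientID", "PatientAddress"}
    return {k: ("" if k in blanked else v) for k, v in record.items()}


print(leak_rate(naive_deidentify))  # 0.5
```

Substring matching misses transformed leaks, such as a name reordered or split across fields. A zero leak rate on seeded data is therefore necessary evidence, not proof of success.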
Train a model and release that but not the dataset
Would need a statistician
Judy: We are encountering issues that the black box models do not understand. Running experiments on adversarial networks. Surprising findings.
Amalgamate clinical and imaging data.
Models have already learned enough to predict age, sex, and race. We don't understand how this happens, and they may also pick up other identifying information.
We are not trying to hide age, sex, and race. We're trying to prevent the re-identification of a person.
The more unique the image data, the greater the threat of re-identification. But without a reference database to match against (for example, a database of everyone's fingerprints), that uniqueness is useless to an attacker.
At some point we have to be clear about what re-identification we are trying to prevent and what the practical limits are.
Clearview.ai
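The point about uniqueness and reference databases can be made concrete for tabular metadata with k-anonymity: group records by the quasi-identifiers an attacker might also hold, and report the size of the smallest group. A record in a group of size 1 is unique in the release. It is a re-identification risk only if matching external data exists, which is the caveat raised in the discussion. A minimal sketch. The field names and generalizations (age bands, 3-digit ZIP) are hypothetical, and it says nothing about the uniqueness of the pixel data itself.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def class_sizes(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)


def k_anonymity(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest equivalence class (0 for no records)."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_combinations(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> List[Tuple]:
    """Return the quasi-identifier combinations that occur in exactly one record."""
    return [combo for combo, n in class_sizes(records, quasi_identifiers).items() if n == 1]


records = [
    {"age_band": "60-69", "sex": "F", "zip3": "021"},
    {"age_band": "60-69", "sex": "F", "zip3": "021"},
    {"age_band": "70-79", "sex": "M", "zip3": "021"},
]
print(k_anonymity(records, ["age_band", "sex", "zip3"]))  # 1
```

Coarsening a quasi-identifier (here, dropping to zip3 alone would give k = 3) trades utility for a larger minimum group, which is the kind of allowable-loss decision listed among the challenging topics.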

Tasking

Justin Kirby: report back on what TCIA encounters that is part of their human review processes
David Clunie: organize report topics in an outline
Judy: Write up some content (not the overview) on defacing
TJ and Ying: Can help with defacing


Summary of MIDI Task Group Meetings To Date
