Summary of MIDI Task Group Meetings To Date

Meeting date: July 20, 2021

WebEx recording of 7/20/2021 meeting

Introduction: Medical Image De-Identification Initiative (MIDI)
Task Group goals
Steering Committee
Timeline
Discussion

Meeting date: August 10, 2021

WebEx recording of 8/10/2021 meeting

Instructions to access the MIDI Task Group wiki page
Accept Mendeley invitation to access private group for literature review/annotated bibliography
Outline of approach
- metadata vs. pixel data
- metadata
- structured (strongly typed) vs. text
- pixel data
- burned-in text ("printed" and hand-written)
- identifiable features (e.g., faces, iris, retina)
- with or without "public" data to compare with
Challenging topics
- evaluation of success of de-identification
- quantitative comparison of performance
- quantifying re-identification risk
- creating test data sets
- faces (etc.) reconstructed from cross-sections
- burned-in text - detection, removal, cleaning
- cleaning text descriptors (metadata or burned-in)
- buried metadata (e.g., EXIF, geotags in JPEG inside DICOM)
- dates (incl. preserving temporal relationships)
- pseudonym consistency across separate submissions
- risks of hashing to create pseudonymous identifiers
- uniqueness of images limits statistical approaches
- loss allowable during de-identification (e.g., age fuzzing, pixels)
- private data element preservation to retain utility
- ultrasound - still frames and cine loops, lossy compressed
- photographs
- video
- gross pathology and whole slide images (incl. labels)
- IRB/ethics committee messaging regarding de-identification decisions
- IT security approval/audits of de-identification
- regulatory requirements: HIPAA Privacy Rule, GDPR, CCPA, others?
- sufficiency of standards, e.g., DICOM PS3.15 Annex E
- risk of not following a standard (home-grown decisions)
- threat of image "signatures", private set intersection methods
- policy versus the technical details of recompression/decompression artifacts for JPEG
- data minimization
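
Two of the topics above (dates that preserve temporal relationships, and pseudonyms that stay consistent across separate submissions without the risks of plain hashing) can be illustrated with a minimal sketch. It is an assumption-laden example, not a method agreed by the Task Group: the key `SECRET_KEY`, the function names, the `PSEUDO-` prefix, and the 365-day shift window are all hypothetical. A keyed HMAC is used instead of a bare hash because a bare hash of a short identifier can be reversed by hashing every candidate value.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical project secret (assumption): in practice it would be kept
# outside the code and never released with the de-identified data.
SECRET_KEY = b"example-project-secret"


def _keyed_digest(purpose: str, patient_id: str) -> bytes:
    """HMAC-SHA256 of the patient ID, separated by purpose label."""
    message = f"{purpose}:{patient_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def pseudonym(patient_id: str) -> str:
    """Same input and key always give the same pseudonym."""
    return "PSEUDO-" + _keyed_digest("pseudonym", patient_id).hex()[:16]


def date_offset_days(patient_id: str, max_days: int = 365) -> int:
    """Per-patient backward shift of 1..max_days days."""
    n = int.from_bytes(_keyed_digest("dateshift", patient_id)[:4], "big")
    return -(1 + n % max_days)


def shift_date(patient_id: str, original: date) -> date:
    """Shift a date by the patient's fixed offset, so intervals survive."""
    return original + timedelta(days=date_offset_days(patient_id))


if __name__ == "__main__":
    baseline = date(2021, 7, 20)
    follow_up = date(2021, 9, 14)
    pid = "MRN-0001"  # illustrative identifier, not real data
    print(pseudonym(pid))
    # The gap between the two shifted dates equals the original gap.
    print(shift_date(pid, follow_up) - shift_date(pid, baseline))
```

Because the offset is fixed per patient, the interval between any two studies of the same patient is unchanged, while the absolute dates are not. Consistency across submissions depends on the same key being reused, which is itself a policy decision.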
Inventory of tools
- user interface vs. scripted (bulk, service)
- configurable - user vs. installer vs. hard-coded
- platform, language
- open source, free, commercial, service
- on-site vs. outside (e.g., [IP]II needs to leave walls for AI on cloud)
Roadmap and deliverables
- interim report
- full report
- "primer" on medical image de-identification for newbies/execs
- confirm what is out of scope (non-goals) - consent, data use agreements, ...
Tasking: Members will think about which task they would like to contribute to.

Meeting date: September 14, 2021

WebEx recording of 9/14/2021 meeting

Role of AI in de-identification - demand for data, opportunities, threats
- Google has a de-id tool
- Amazon Comprehend
- Identifying images at risk: which images are more likely than others to contain burned-in information?
- Problem with scalability in terms of building the ruleset. Better to identify selectively.
- Barcodes, pacemaker serial numbers, implanted devices
- There is the potential of identifying objects but not the raw data.
- Action: Describe the steps involved in imaging and the evolution of data in different levels of processing
- Case-based data
- Is raw data in our purview?
- Raw data is often in proprietary format and can lack a header.
- Post-processed data like 3D reconstructions
- What is the harm of reidentification? High-resolution 3D image of the face
- Penetration testing as applied to de-identification
- How to evaluate the success of de-facing?
- Newman, L. H. (2016). AI Can Recognize Your Face Even If You’re Pixelated. Wired. https://www.wired.com/2016/09/machine-learning-can-identify-pixelated-faces-researchers-show/
- When is it okay to release information that you know is identifiable? Example of boy in NYT.
- Sometimes reidentification does not provide any new data.
- What do you now know that you didn't know before?
- Expectations of doing better deidentification and the threats of better reidentification. What can we do now and what in the future with AI?
- Do you expect that one day a machine will replace your manual deidentification process? Can a robot replace human review?
- Can you accept the risk of AI/machines/code? Get to the level of risk that is tolerable.
- Main topic for the next call: the need for human QC.
- When will you stop using humans or a targeted subset?
- What would increase your comfort level enough to help you stop using human QC?
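
The idea raised above of identifying images at risk selectively, and reserving human QC for a targeted subset, could be sketched as a simple triage rule. This is only an illustration under stated assumptions: the `ImageHeader` class, the `needs_human_review` function, and the list of higher-risk modalities are hypothetical, and the rule is deliberately conservative. The only real-world element relied on is the DICOM Burned In Annotation attribute (0028,0301), which takes the values YES or NO and may be absent.

```python
from dataclasses import dataclass

# Assumption: modalities treated here as more likely to carry burned-in text
# (ultrasound, secondary capture, angiography, other). Illustrative only.
HIGH_RISK_MODALITIES = {"US", "SC", "XA", "OT"}


@dataclass
class ImageHeader:
    """Hypothetical, minimal view of the header fields used for triage."""
    modality: str
    burned_in_annotation: str = ""  # (0028,0301): "YES", "NO", or absent


def needs_human_review(header: ImageHeader) -> bool:
    """Return True if the image should be routed to a human reviewer."""
    flag = header.burned_in_annotation.strip().upper()
    if flag == "YES":
        return True
    if header.modality.strip().upper() in HIGH_RISK_MODALITIES:
        # A "NO" flag is not trusted for higher-risk modalities.
        return True
    # Lower-risk modality: skip review only when the flag is an explicit "NO".
    return flag != "NO"


if __name__ == "__main__":
    batch = [
        ImageHeader("US", "NO"),
        ImageHeader("CT", "NO"),
        ImageHeader("MR"),
    ]
    for header in batch:
        print(header.modality, needs_human_review(header))
```

Whether such a rule is acceptable depends on the tolerable level of risk discussed above. A header flag can be wrong, so a sampled human check of the skipped images may still be needed.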
