NIH | National Cancer Institute | NCI Wiki  

Brian Bialecki on De-identification

  • His team at ACR is trying to find a way to release this data publicly.
  • He would like to get patient consent to share some identifying data.
  • They'd like to see the real data to assess both the real world risk of doing nothing and the real world risk of various mitigation approaches. 

Other Discussion

  • Skull stripping is not a replacement for de-facing.
  • The value of SynthStrip is not in how well the model performs, but rather how the model is created.
  • It's difficult to find things that work across different data sets and modalities. This is something we desperately need.
  • SynthStrip does not work on slices, it is fully 3D.
  • Access model, registered or restricted, for data with a data use agreement. This is what everyone is converging on.
  • Record and track who got the data. But some repositories have no such tracking.

Next Meeting

  • Skipping July
  • August 9, 2022 at 1 p.m. EST

August 9, 2022 Meeting

WebEx recording of the 08/09/2022 meeting: https://cbiit.webex.com/cbiit/ldr.php?RCID=2318a677560a423446955128af17413c

Agenda

Applicability of SDC tools to medical image metadata – ARX

Discussion

  • Statistical re-identification of radiology and pathology data. Data includes metadata and spreadsheets accompanying the images.
  • Review of past discussion:
    • Statistical disclosure control
    • Statistical approaches are mentioned in HIPAA. The HIPAA Privacy Rule allows de-identification by Expert Determination, a statistical alternative to the Safe Harbor method.
    • Presentation by Dr. El Emam.
    • Estimating re-identification risk and attempting to reduce it.
    • Asked Dr. El Emam which tools exist to help do this. He named ARX and sdcMicro.
  • Today David demonstrated the ARX Anonymization Tool, a Java-based package that runs on any platform. See the tool's YouTube channel at https://www.youtube.com/channel/UCcGAF5nQ_O6ResEF-ivsbVQ/videos
    • David demonstrated how to use this tool, based on his limited understanding of it. 
    • Imported a spreadsheet of CPTAC proteomic metadata. This dataset has around 65000 records, so it's large enough to use for this demonstration. This data is already in IDC.
    • This dataset has both actual- and quasi-identifiers.
    • He selected quasi-identifiers.
    • You have to set the type of each attribute, for example whether it is sensitive or quasi-identifying. David set Gender and Age as quasi-identifying.
    • The prosecutor, journalist, and marketer attacker models are shown, along with the estimated risk that each attack succeeds. The risk is low for the selected dataset.
    • A prosecutor risk occurs if the adversary can know that the target is in the data set.
    • The distribution of risks is shown as a histogram, with prosecutor re-identification risk on the X axis and records affected on the Y axis.
    • ARX is an optimization tool that uses numerical methods to find changes to the data that reduce the risk while preserving utility. It can use more than one privacy model.
    • David tested two models: k-anonymity with k = 2, and average re-identification risk, which is a more complicated model.
    • David provided a generalization pattern by which the tool can aggregate the data as it chooses.
    • He demonstrated creating a hierarchy.
    • After anonymization, there are no more uniques and the quality has not been reduced.
    • The tool makes the data a bit fuzzier without getting rid of data.
    • You can see which rows have been anonymized using the Analyze Utility. Ages are binned or data is omitted according to the model's constraints.
    • Demonstration of adding in race and age. The more quasi-identifiers you consider together, the more uniques you get.
    • This tool also has an API.
    • A small data set increases re-identification risk since there are more uniques.
  • Adam Taylor will explore this tool for HTAN.
  • David is working on drafting the report and hopes to have it done by the next meeting. He will include a section on statistical disclosure control and microdata.
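
The risk measures discussed above can be illustrated with a minimal sketch. It does not use ARX or its API; it is plain Python under the common assumption that a record's prosecutor risk is 1 divided by the number of records sharing its quasi-identifier values. The function names, the 10-year age bins, and the toy records are all invented for illustration and are not CPTAC data.

```python
from collections import Counter

def class_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def prosecutor_risks(records, quasi_identifiers):
    """Per-record risk, assumed to be 1 / size of the record's equivalence class."""
    sizes = class_sizes(records, quasi_identifiers)
    return [1 / sizes[tuple(r[q] for q in quasi_identifiers)] for r in records]

def count_uniques(records, quasi_identifiers):
    """Number of records that are alone in their equivalence class."""
    return sum(1 for n in class_sizes(records, quasi_identifiers).values() if n == 1)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class holds at least k records."""
    return min(class_sizes(records, quasi_identifiers).values()) >= k

def bin_age(record, width):
    """One level of a generalization hierarchy: exact age becomes an age range."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

# Toy records, invented for illustration only.
records = [
    {"sex": "F", "age": 61, "race": "A"},
    {"sex": "F", "age": 64, "race": "A"},
    {"sex": "F", "age": 67, "race": "B"},
    {"sex": "M", "age": 52, "race": "A"},
    {"sex": "M", "age": 58, "race": "A"},
    {"sex": "M", "age": 58, "race": "B"},
]
binned = [bin_age(r, 10) for r in records]

print(count_uniques(records, ["sex", "age"]))          # exact ages
print(count_uniques(binned, ["sex", "age"]))           # 10-year age bins
print(is_k_anonymous(binned, ["sex", "age"], 2))
print(count_uniques(binned, ["sex", "age", "race"]))   # one more quasi-identifier
```

Binning ages merges small equivalence classes, which is why uniques disappear; adding race as a further quasi-identifier splits the classes again and brings uniques back.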

September 13, 2022 Meeting

WebEx recording of the 09/13/2022 meeting: https://cbiit.webex.com/cbiit/ldr.php?RCID=a018919856c2c8ab1d104c9edfd71219

Agenda

  • Review draft of MIDI Task Group report that David Clunie emailed on 9/7/22.

Discussion

  • David asks team members to tell him if anything in his draft recommendations document is missing, contradictory, or controversial without sufficient justification. The document is meant to represent a consensus.
  • The team should share with David which sections they would like to focus on if they do not intend to review the whole document.
  • The whole team should review the document within a few weeks. David will consolidate comments that the team provides in Word Track Changes.
  • David Gutman said that a longer document is better because it means more cases are being considered.
  • When the document is mature, Keyvan will share it with the Steering Committee.
  • Everyone should start making a list of people they want to share the document with so that they can review it before it is broadly released.
  • Suggestions to share the final document (perhaps a short section with link to full document) with academic journals including Journal of Digital Imaging, SPIE, Journal of Cancer Imaging.
  • Absence of figures was intentional.
  • A summarizing figure or table would be useful to add. Maybe team members have their own figures they could reuse to illustrate certain points.
  • DICOM - has anyone encrypted attributes and retained them? May be useful to say there are times when encryption makes sense, but not for a public data set.
  • David wrote a long section on facial recognition and asks if the team thinks there is too much emphasis.
  • His Dates and Times section is weak and needs more input from knowledgeable people.
  • Key Findings section - balance may not have been struck. Statistical methods such as those demonstrated in last month's meeting are not current practice, so David did not want to recommend them as a best practice.
  • Keyvan would like to have a workshop in Spring 2023. Would like to run a challenge.
  • We can wrap up the Task Group after the publication of the report.
  • This Task Group could morph into the program group for the challenge.
  • Are there any other topics we'd like to consider? Perhaps a project offline to obtain better statistics about facial reidentification risk.
  • Synthetic PHI: the jury is still out.

Action

All team members should review the draft document updated as of 10/3/22 in Word, tracking their changes and suggestions using Word's Track Changes, and email the document back to David by the week of October 9.

Next Meeting

Tuesday, October 18, 1-2 p.m. EST

October 18, 2022 Meeting

WebEx recording of the 10/18/2022 meeting: https://cbiit.webex.com/cbiit/ldr.php?RCID=05f39a834db35bc2da39d0f8740f5d43

Agenda

Discussion

    • Team believes draft report is exceptionally well written.
    • Many comments have been received, including recently many from the TCIA team, which David is working on integrating.
    • David requests updated ORCID iDs.
    • Carolyn will add the Advisory Group names to the draft.
    • Breach mitigation was suggested as a topic for the document but could be institution-specific and so this potentially large topic will be excluded.
    • In this document, risk means probability instead of probability multiplied by severity as a hazard.
    • Difference between direct and indirect identifiers. A direct identifier is something like a Social Security number; an example of an indirect identifier is race. Date of birth and name are not unique, yet most people think of them as direct identifiers. David revised the document to clarify this terminology. You remove direct identifiers and generalize indirect identifiers.
    • Comments on statistical disclosure control. Action after this meeting: look at the first best practice from the perspective of performing an expert statistical analysis or not to see if it would cause trouble for someone or be overstated.
    • Institution as a proxy for location. Can you recover location from the data?
    • Best practice: Do a risk analysis or don't release the data publicly.
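
The rule "remove direct identifiers, generalize indirect identifiers" can be sketched in a few lines. This is only an illustration, not the method recommended in the draft report: the field names, the choice of which fields count as direct or indirect, and the generalizations (year only, 10-year age range) are all assumptions made for the example.

```python
# Assumed direct identifiers: dropped entirely.
DIRECT_IDENTIFIERS = {"ssn", "mrn"}

# Assumed indirect identifiers: kept, but coarsened.
GENERALIZERS = {
    "birth_date": lambda d: d[:4],  # ISO date "YYYY-MM-DD" -> year only
    "age": lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",  # 10-year range
}

def deidentify(record):
    """Drop direct identifiers, generalize indirect ones, keep everything else."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue
        generalize = GENERALIZERS.get(field)
        cleaned[field] = generalize(value) if generalize else value
    return cleaned

# Hypothetical record, invented for illustration only.
example = {
    "mrn": "12345",
    "ssn": "000-00-0000",
    "birth_date": "1958-04-17",
    "age": 64,
    "modality": "MR",
}
print(deidentify(example))
```

Fields not listed in either table pass through unchanged, so in practice the classification itself would need review as part of the risk analysis.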

    Summary of this meeting and next steps:

    • Made changes to two of the best practices during the meeting 
    • Hoping to get more feedback by the end of the week.
    • More RT feedback is coming.
    • By end of week, David will accept all tracked changes and start circulating to the advisory group

Action

Provide any additional feedback by the end of this week.

Next Meeting

Tuesday, November 8, 1-2 p.m. EST

November 8, 2022 Meeting

WebEx recording of the 11/08/2022 meeting: https://cbiit.webex.com/cbiit/ldr.php?RCID=3d4290c19c529f8996740bd62ad0297b

Agenda

Discussion

  • External reviewers: adding Marcus Hermann and John Snow Labs.
  • A long list of external reviewers is useful because it is a long document and that increases the possibility all of it will be reviewed.
  • David will send personal invitations to each reviewer to note their specialty and how they can focus their review.
  • We will skip the December meeting and resume in the new year.
  • Next step: focus on what we're going to do in the future. Disband, reorganize based on new goals, continue?
  • David plans to write an executive summary to submit to journals.

Action

Please feel free to recommend other external reviewers.

Next Meeting

Tuesday, January 11, 2023 at 1 p.m. EST