...
- Questions for the group were: how to pick a threat model, which identifiers to be concerned about, and how to establish a risk threshold for public data release.
- Apply stratification principles to structured data. If you have unstructured data, structure it first.
- Identity disclosure, which is just one type of disclosure but the type most applicable to re-id, is when a person's identity is assigned to a record.
- Trying to measure the risk of verification for a dataset
- Quasi-identifiers are those known by an attacker.
- Delete or encrypt/hash direct identifiers first. What we end up with after that is pseudonymous data.
- For the purposes of re-id risk, we only care about quasi-identifiers.
- A meaningful re-id teaches you something new about the person.
- Attacks go in two directions: population to sample, and sample to population.
- Risk measure: risk is measured by the group size (a group size of 1 means the record is unique).
- To reduce the risk, you can generalize the records and reduce the match rate.
- You can suppress records, remove records, and add noise to reduce the risk of re-id as well.
- Generalize: group size gets bigger and risk reduces. Risk metrics: maximum (k-anonymity, for public data), average (for non-public data), and unicity.
- The risk denominator is the group size in the population, not in the sample.
- risk threshold in identifiability spectrum
- privacy-utility tradeoff
- data transformations - generalization, suppression, addition of noise, microaggregation
- for non-public data, can add controls (privacy, security, contractual)
- motivated intruder attack
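
The group-size risk measures and the generalization and suppression steps noted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the records, the choice of age, ZIP code, and sex as quasi-identifiers, and all function names are hypothetical and were not part of the discussion. It also treats the dataset as if it were the whole population; for a sample, the population group sizes would be at least as large, so these figures err on the high side.

```python
from collections import Counter

# Hypothetical toy records: (age, ZIP code, sex) are the assumed quasi-identifiers.
records = [
    (34, "90210", "F"),
    (36, "90210", "F"),
    (35, "90213", "M"),
    (52, "10001", "M"),
    (58, "10002", "M"),
    (51, "10001", "F"),
]

def group_sizes(rows):
    """For each row, count how many rows share its quasi-identifier values."""
    counts = Counter(rows)
    return [counts[row] for row in rows]

def risk_summary(rows):
    """Summarize re-id risk, taking per-record risk as 1 / group size."""
    sizes = group_sizes(rows)
    risks = [1 / size for size in sizes]
    return {
        "k": min(sizes),                                   # smallest group (k-anonymity)
        "max_risk": max(risks),                            # maximum risk
        "avg_risk": sum(risks) / len(rows),                # average risk
        "unicity": sum(s == 1 for s in sizes) / len(rows), # share of unique records
    }

def generalize(row):
    """Coarsen age to a 10-year band and ZIP code to its first three digits."""
    age, zip_code, sex = row
    decade = age // 10 * 10
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**", sex)

def suppress_small_groups(rows, k):
    """Drop every row whose group is smaller than k."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] >= k]

generalized = [generalize(row) for row in records]
suppressed = suppress_small_groups(generalized, k=2)

raw_summary = risk_summary(records)
generalized_summary = risk_summary(generalized)
suppressed_summary = risk_summary(suppressed)

print("raw:        ", raw_summary)
print("generalized:", generalized_summary)
print("suppressed: ", suppressed_summary)
```

On this toy data every record starts out unique. Generalization leaves two unique records, and suppressing those two leaves four records in groups of size 2, so the maximum risk drops to 0.5 at the cost of losing a third of the rows, which is one small instance of the privacy-utility tradeoff.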
...