The questions for the group were how to pick a threat model, which identifiers to be concerned about, and how to set a risk threshold for public data release.
Apply stratification principles to structured data. If you have unstructured data, structure it first.
Identity disclosure (only one type of disclosure, but the type most relevant to re-id) occurs when a person's identity is assigned to a record.
The aim is to measure the risk of re-identification for a dataset.
Quasi-identifiers are the attributes an attacker could know about a person (e.g., date of birth, sex, ZIP code).
Delete, or encrypt/hash, direct identifiers first. What remains after that is pseudonymous data.
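A minimal Python sketch of this step, assuming records are plain dicts and the caller decides which direct identifiers to delete and which to replace with a keyed hash (HMAC-SHA256). The field names, the `pseudonymize` helper, and the hard-coded key are hypothetical; a real key would be random and stored separately from the released data. A keyed hash is used instead of a plain hash because an unkeyed hash of a short identifier, such as a record number, can be reversed by hashing every possible value.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real one would be random and secret.
SECRET_KEY = b"replace-with-a-random-secret"

def pseudonymize(record, direct_ids, hashed_ids):
    """Delete fields in `direct_ids`; replace fields in `hashed_ids` with an
    HMAC-SHA256 hex digest; pass all other fields through unchanged."""
    out = {}
    for field, value in record.items():
        if field in direct_ids:
            continue  # direct identifier removed outright
        if field in hashed_ids:
            digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
        else:
            out[field] = value  # quasi-identifiers and other attributes kept as-is
    return out

rec = {"name": "Alice Smith", "mrn": "12345", "age": 34, "zip": "02139"}
print(pseudonymize(rec, direct_ids={"name"}, hashed_ids={"mrn"}))
```

The quasi-identifiers (`age`, `zip`) pass through untouched, which is why the remaining risk analysis concentrates on them.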
For the purposes of re-id risk, we only care about quasi-identifiers.
A meaningful re-id teaches you something new about the person.
Attacks can run in two directions: population to sample, and sample to population.
Risk measure: risk is measured by group size (a group of size 1 is a unique record); a record's risk is 1 divided by the size of its group.
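A minimal Python sketch of this measure, assuming records are dicts and the quasi-identifier columns are known. For simplicity the group sizes are counted within the dataset itself; the `record_risks` helper and the sample data are hypothetical.

```python
from collections import Counter

def record_risks(records, quasi_ids):
    """Per-record risk = 1 / number of records sharing the same
    quasi-identifier values (a group of size 1 means the record is unique)."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    sizes = Counter(keys)
    return [1 / sizes[k] for k in keys]

data = [
    {"age": 34, "sex": "F", "zip": "02139"},
    {"age": 34, "sex": "F", "zip": "02139"},
    {"age": 51, "sex": "M", "zip": "02140"},
]
print(record_risks(data, ["age", "sex", "zip"]))  # [0.5, 0.5, 1.0]
```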
To reduce the risk, you can generalize the records and reduce the match rate.
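A small sketch of generalization in Python, assuming a hypothetical scheme (10-year age bands and a 3-digit ZIP prefix) chosen only for illustration. Coarsening the quasi-identifiers merges three unique records into one group of three.

```python
from collections import Counter

def generalize(record):
    """Hypothetical scheme: age -> 10-year band, ZIP -> first 3 digits."""
    low = (record["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "sex": record["sex"],
        "zip": record["zip"][:3] + "**",
    }

def smallest_group(records, quasi_ids):
    """Size of the smallest group of records sharing quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(counts.values())

data = [
    {"age": 31, "sex": "F", "zip": "02139"},
    {"age": 34, "sex": "F", "zip": "02141"},
    {"age": 37, "sex": "F", "zip": "02138"},
]
qi = ["age", "sex", "zip"]
print(smallest_group(data, qi))                           # 1: every record is unique
print(smallest_group([generalize(r) for r in data], qi))  # 3: one group ("30-39", "F", "021**")
```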
You can suppress records, remove records, and add noise to reduce the risk of re-id as well.
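A minimal Python sketch of two of these options, assuming dict records: suppressing records whose quasi-identifier combination occurs fewer than `k` times, and adding bounded uniform noise to a numeric field. Both helpers, the value of `k`, and the noise distribution are illustrative choices rather than a prescribed method.

```python
import random
from collections import Counter

def qi_key(record, quasi_ids):
    """Tuple of the record's quasi-identifier values."""
    return tuple(record[q] for q in quasi_ids)

def suppress_small_groups(records, quasi_ids, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    counts = Counter(qi_key(r, quasi_ids) for r in records)
    return [r for r in records if counts[qi_key(r, quasi_ids)] >= k]

def add_noise(records, field, scale, seed=0):
    """Add uniform noise in [-scale, scale] to an integer field and round."""
    rng = random.Random(seed)  # fixed seed only to make the example reproducible
    return [{**r, field: round(r[field] + rng.uniform(-scale, scale))} for r in records]

data = [
    {"age": 34, "zip": "02139"},
    {"age": 36, "zip": "02139"},
    {"age": 58, "zip": "02140"},  # the only record with this ZIP
]
kept = suppress_small_groups(data, ["zip"], k=2)
print(len(kept))  # 2: the unique record is suppressed
print(add_noise(kept, "age", scale=2))
```

A real release would not publish the noise seed, since knowing it lets an attacker subtract the noise.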
Generalize: group size gets bigger, so risk drops. Risk metrics: maximum risk (k-anonymity) for public releases, average risk for non-public releases, and unicity.
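A Python sketch of the three metrics, assuming for simplicity that group sizes come from the dataset itself rather than from the population. Maximum risk is 1 divided by the smallest group size (the k of k-anonymity); average risk is the mean of the per-record risks; unicity is taken here as the fraction of records that are unique in the dataset. These definitions and the `risk_metrics` helper are illustrative.

```python
from collections import Counter

def risk_metrics(records, quasi_ids):
    """Maximum risk, average risk and unicity, using in-dataset group sizes."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    sizes = Counter(keys)
    per_record = [1 / sizes[k] for k in keys]
    return {
        "max_risk": max(per_record),                      # 1 / smallest group size
        "avg_risk": sum(per_record) / len(per_record),    # mean per-record risk
        "unicity": sum(1 for k in keys if sizes[k] == 1) / len(keys),  # share of uniques
    }

data = [
    {"sex": "F", "zip": "021"},
    {"sex": "F", "zip": "021"},
    {"sex": "M", "zip": "021"},
    {"sex": "M", "zip": "024"},
]
print(risk_metrics(data, ["sex", "zip"]))
# {'max_risk': 1.0, 'avg_risk': 0.75, 'unicity': 0.5}
```

The example shows why the metric matters: two unique records push the maximum risk to 1.0, while the average risk is only 0.75.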
The risk denominator is the group size in the population, not in the sample.
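A Python sketch contrasting the two denominators, assuming population counts per quasi-identifier combination are available as a plain dict. This is a hypothetical input; in practice such counts would come from an external source or be estimated. A record that is unique in the sample can still match many people in the population.

```python
from collections import Counter

def sample_and_population_risk(records, quasi_ids, population_counts):
    """Return (sample-based, population-based) per-record risks.

    `population_counts` maps a tuple of quasi-identifier values to the number
    of people in the population who share those values (hypothetical input).
    """
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    sample_sizes = Counter(keys)
    sample_risk = [1 / sample_sizes[k] for k in keys]
    population_risk = [1 / population_counts[k] for k in keys]
    return sample_risk, population_risk

records = [{"age_band": "30-39", "zip3": "021"}]
population = {("30-39", "021"): 250}  # made-up count for illustration
print(sample_and_population_risk(records, ["age_band", "zip3"], population))
# ([1.0], [0.004])
```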
Risk threshold: a point on the identifiability spectrum.
Privacy-utility trade-off.
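A Python sketch of how a risk threshold and the privacy-utility trade-off interact. It assumes a hypothetical threshold of 0.25 (equivalent to requiring groups of at least 4) and a single generalization knob, the number of ZIP digits kept. The search returns the least generalized version that meets the threshold, on the assumption that less generalization preserves more utility. The threshold, helper names and data are hypothetical; choosing the real threshold is a policy decision.

```python
from collections import Counter

def truncate_zip(records, digits):
    """Generalize ZIP codes by keeping only the first `digits` characters."""
    return [{**r, "zip": r["zip"][:digits]} for r in records]

def max_risk(records, quasi_ids):
    """Maximum per-record risk: 1 / size of the smallest group."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return 1 / min(counts.values())

def least_generalization(records, quasi_ids, threshold):
    """Try ZIP prefixes from most to least detailed; return the first
    (digits, records) pair whose maximum risk is at or below the threshold."""
    for digits in range(5, -1, -1):
        candidate = truncate_zip(records, digits)
        if max_risk(candidate, quasi_ids) <= threshold:
            return digits, candidate
    return None  # ZIP generalization alone cannot meet the threshold

data = [{"sex": s, "zip": z} for s, z in [
    ("F", "02139"), ("F", "02141"), ("F", "02138"), ("F", "02134"),
    ("M", "02139"), ("M", "02140"), ("M", "02142"), ("M", "02145"),
]]
digits, released = least_generalization(data, ["sex", "zip"], threshold=0.25)
print(digits)  # 3
```

Keeping 4 digits still leaves single-record groups, so the search settles on 3 digits: the most detail that satisfies the threshold.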
Data transformations: generalization, suppression, addition of noise, microaggregation.
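The notes do not specify a microaggregation algorithm. As an assumption, here is a simple fixed-size univariate variant in Python, not an optimal or multivariate method such as MDAV. It sorts the values, forms groups of `k` consecutive values with the last group absorbing any remainder, and replaces each value with its group mean, so every released value is shared by at least `k` records.

```python
def microaggregate(values, k):
    """Fixed-size univariate microaggregation; results keep the input order.

    Every group has between k and 2k-1 members (the last group takes the
    remainder), and each value is replaced by the mean of its group.
    """
    if len(values) < k:
        raise ValueError("need at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = mean
    return result

print(microaggregate([23, 41, 25, 39, 60, 24, 44], k=3))
# [24.0, 46.0, 24.0, 46.0, 46.0, 24.0, 46.0]
```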
For non-public data, you can also add controls (privacy, security, contractual).
Motivated intruder attack.
