...
- Questions for the group were how to pick a threat model, which identifiers to be concerned about, and how to set a risk threshold for public data release.
- Apply stratification principles to structured data. If you have unstructured data, structure it first.
- Identity disclosure, which is just one type of disclosure but the type most applicable to re-id, is when a person's identity is assigned to a record.
- Trying to measure the risk of verification for a dataset
- Quasi-identifiers are those known by an attacker.
- Delete or encrypt/hash direct identifiers first. What we end up with after that is pseudonymous data.
- For the purposes of re-id risk, we only care about quasi-identifiers.
- A meaningful re-id teaches you something new about the person.
- Attack in two directions - population to sample, sample to population
- Risk is measured by the group size (of 1 = unique)
- Assign a risk value to each record in the dataset.
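A minimal sketch of the group-size idea above: count how many records share each quasi-identifier combination, and assign each record a risk of 1 / group size (a group of 1 is unique, risk 1.0). The field values are hypothetical, just for illustration.

```python
from collections import Counter

# Toy dataset: each record is a tuple of quasi-identifier values
# (birth decade, sex, ZIP) - hypothetical fields for illustration.
records = [
    ("1970s", "F", "12345"),
    ("1970s", "F", "12345"),
    ("1980s", "M", "54321"),
]

# Group size for each quasi-identifier combination.
group_sizes = Counter(records)

# Per-record risk = 1 / group size; a group of 1 is unique (risk 1.0).
risks = [1 / group_sizes[r] for r in records]
```

Here the first two records form a group of 2 (risk 0.5 each) and the third is unique (risk 1.0).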
- To reduce the risk, you can generalize the records and reduce the match rate.
- You can suppress records, remove records, and add noise to reduce the risk of re-id as well.
- Generalize → group size gets bigger → risk reduces. Risk metrics: maximum risk (k-anonymity, used for public release), average risk (non-public release), unicity (proportion of records that are unique in the population).
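The generalization point can be sketched as follows: coarsening quasi-identifier values (here, exact age to decade, full ZIP to its first three digits) merges records into larger groups and lowers the maximum risk. The data and generalization rules are hypothetical.

```python
from collections import Counter

def max_risk(records):
    """Maximum re-id risk = 1 / size of the smallest group."""
    sizes = Counter(records)
    return max(1 / sizes[r] for r in records)

# Raw quasi-identifiers: (age, ZIP) - every record is unique.
raw = [(34, "12345"), (36, "12345"), (34, "12346")]

# Generalize: age -> decade, ZIP -> first 3 digits.
gen = [(age // 10 * 10, zipcode[:3]) for age, zipcode in raw]
```

With the raw values every record is a group of 1 (max risk 1.0); after generalization all three fall into one group of 3 (max risk 1/3).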
- You don't want to measure the risk in the data set but measure the risk in the population. The data set is just a sample from the population.
- The group size in the population is the number that's important, but you have to estimate it, since you don't usually have a population registry.
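One naive way to see the sample-versus-population distinction above: if the dataset is a known p-fraction sample of the population, a crude estimate of the population group size is the sample group size divided by p. Real estimators in the disclosure-control literature model this far more carefully; this is only an illustration with assumed numbers.

```python
# Assumed: the dataset is a 10% sample of the population.
sampling_fraction = 0.1

# A sample group of 2 crudely scales to an estimated
# population group of 2 / 0.1 = 20.
sample_group_size = 2
est_population_group_size = sample_group_size / sampling_fraction

# Risk uses the population group size as the denominator,
# not the sample group size.
risk = 1 / est_population_group_size
```

The same pair of records that looks high-risk in the sample (1/2 = 0.5) can be low-risk against the estimated population group (1/20 = 0.05), which is why measuring in the sample is overly conservative.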
- Once you can estimate the risk properly, you can manage risk in a less conservative way that is still defensive.
- There's no such thing as a probability of zero.
- For releasing public data, a threshold in popular use today is 0.09; this yields higher data quality. For particularly sensitive data sets, use the stricter threshold of 0.05.
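Applying the thresholds from the notes is a simple comparison: any record whose estimated risk exceeds the chosen threshold needs further generalization or suppression before release. The risk values here are made up for illustration.

```python
threshold_public = 0.09     # common public-release threshold (from the notes)
threshold_sensitive = 0.05  # stricter threshold for sensitive data

# Hypothetical per-record risk estimates.
record_risks = [0.5, 0.2, 0.05, 0.01]

# Records over the threshold need more transformation before release.
violations = [r for r in record_risks if r > threshold_public]
```

Against the 0.09 threshold, the 0.5 and 0.2 records fail and must be transformed; against the 0.05 threshold the 0.05 record is exactly at the limit and passes.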
- risk denominator is not group size in sample but in population
- risk threshold in identifiability spectrum
- privacy-utility trade-off
- data transformations - generalization, suppression, addition of noise, microaggregation
- for non-public data, can add controls (privacy, security, contractual)
- motivated intruder attack
...