Anonymization

>> page in progress <<

Definition
Anonymization consists of techniques for data processing and procedures for handling the data, algorithms, keys, and lifecycle of the data. For privacy reasons, personally identifiable information (PII) often needs to be anonymized before it is used for testing and analysis. There are several techniques to do this:

- Replacement: substitute identifying numbers
- Suppression: omit from the released data, partially or fully
- Generalization: for example, replace birth date with something less specific, like year of birth
- Perturbation: make random changes to the data
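The four techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Replacer`, `suppress`, `generalize_birth_date` and `perturb`, the `ID-000001` surrogate format and the noise bound are all hypothetical and not taken from any particular tool.

```python
import random
from datetime import date


class Replacer:
    """Replacement: substitute an identifying number with a surrogate value."""

    def __init__(self):
        self._mapping = {}  # original value -> surrogate (must be kept secret or discarded)

    def replace(self, value: str) -> str:
        if value not in self._mapping:
            # Hypothetical surrogate format: a running counter.
            self._mapping[value] = f"ID-{len(self._mapping) + 1:06d}"
        return self._mapping[value]


def suppress(value: str, keep_last: int = 0, mask_char: str = "*") -> str:
    """Suppression: omit the value fully (keep_last=0) or partially."""
    keep = max(0, min(keep_last, len(value)))
    return mask_char * (len(value) - keep) + value[len(value) - keep:]


def generalize_birth_date(birth_date: date) -> int:
    """Generalization: keep only the year of birth."""
    return birth_date.year


def perturb(amount: float, rng: random.Random, max_delta: float = 5.0) -> float:
    """Perturbation: add bounded random noise to a numeric value."""
    return amount + rng.uniform(-max_delta, max_delta)


replacer = Replacer()
print(replacer.replace("123-45-6789"))
print(suppress("4111111111111111", keep_last=4))
print(generalize_birth_date(date(1984, 3, 17)))
print(perturb(100.0, random.Random()))
```

A counter-based surrogate reveals the order in which values were first seen, so a real replacement scheme would likely draw surrogates at random or from a substitution source.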

Requirements
- masked data should be irreversible
- (schema) type compliant
- preservation of semantics
- references in data should be kept intact
- masking should be repeatable
- non-sensitive data should also be anonymized if it could lead to identification
- distribution preservation
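Two of these requirements, repeatable masking and intact references, can be met together by deriving the masked value deterministically from the original with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the function name `pseudonym`, the example key and the sample tables are assumptions made for illustration. Irreversibility here depends entirely on keeping the key out of the released data.

```python
import hashlib
import hmac


def pseudonym(value: str, key: bytes, length: int = 12) -> str:
    """Map a value to a repeatable masked value using a secret key."""
    mac = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:length]  # truncated for readability (assumption)


# Hypothetical key; in practice it would be stored separately from the data.
key = b"example-key-stored-outside-the-data-set"

customers = [{"customer_id": "C1001", "name": "Alice"},
             {"customer_id": "C1002", "name": "Bob"}]
orders = [{"order_id": 1, "customer_id": "C1001"},
          {"order_id": 2, "customer_id": "C1002"},
          {"order_id": 3, "customer_id": "C1001"}]

# The same input always yields the same output, so foreign keys still match.
masked_customers = [
    {**c, "customer_id": pseudonym(c["customer_id"], key), "name": "REDACTED"}
    for c in customers
]
masked_orders = [
    {**o, "customer_id": pseudonym(o["customer_id"], key)} for o in orders
]

known_ids = {c["customer_id"] for c in masked_customers}
print(all(o["customer_id"] in known_ids for o in masked_orders))
```

Because the mapping is keyed, this is strictly pseudonymization: anyone holding the key can recompute the mapping for a candidate value.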

Also:
- Reusable for other data sets
- Transparent, so that auditors can verify that the data is indeed masked appropriately

- Referential integrity
- Semantic integrity
- Separation of duties

Masking Methods
- Format-preserving encryption (FPE)
- shuffling
- encryption
- substitution: replacing values by values from another source
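As a rough illustration of two of these methods, the sketch below shuffles one column across rows and substitutes another column with values from a separate list. The function names, the sample rows and the replacement names are hypothetical.

```python
import random


def shuffle_column(rows, column, rng):
    """Shuffling: permute one column's values across rows.

    The set of values (and so their distribution) is unchanged,
    but each value is detached from its original row.
    """
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def substitute_column(rows, column, replacements, rng):
    """Substitution: replace each value with one drawn from another source."""
    return [{**row, column: rng.choice(replacements)} for row in rows]


rows = [
    {"id": 1, "name": "Alice", "salary": 100},
    {"id": 2, "name": "Bob", "salary": 200},
    {"id": 3, "name": "Carol", "salary": 300},
]
rng = random.Random()  # pass a seeded Random for reproducible test data

masked = shuffle_column(rows, "salary", rng)
masked = substitute_column(masked, "name", ["Person A", "Person B", "Person C"], rng)
for row in masked:
    print(row)
```

On small tables a shuffle can leave some values in their original rows, so shuffling alone is usually not enough for very small data sets.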

- Non-deterministic randomization:
- Blurring:
- Redaction:
- Shuffling:
- Averaging:
- Repeatable masking:
- Substitution:
- Specialized rules: These rules apply to particular fields such as Social Security/tax ID numbers, credit card numbers, street addresses and telephone numbers, where the masked values must stay structurally correct so that they can still be used in workflows and pass checksum validation. An example is substituting 100 Wall St., New York, N.Y. for 50 Maple Lane, Newark, N.J., where the random values (house number, street, city and state) together make up a valid address that can be found using applications like Google Maps or MapQuest.
- Tokenization:
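For the specialized rules, a credit card number is a convenient example because its last digit is a Luhn checksum. The sketch below replaces a card number with a random one that still passes that check. Keeping the first six digits and the original length is an assumption made here for illustration, and the function names are hypothetical.

```python
import random


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a number without its check digit."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if i % 2 == 0:  # every second digit, starting next to the check digit
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """Return True if the full number, including check digit, passes Luhn."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def mask_card_number(number: str, rng: random.Random) -> str:
    """Replace a card number with a random, structurally valid one."""
    prefix = number[:6]  # assumption: the leading digits are kept
    body = "".join(str(rng.randrange(10)) for _ in range(len(number) - 7))
    payload = prefix + body
    return payload + str(luhn_check_digit(payload))


masked = mask_card_number("4111111111111111", random.Random())
print(masked, luhn_valid(masked))
```

The same pattern (randomize the free part, then recompute the check digit) applies to other checksummed identifiers, each with its own checksum rule.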

These techniques can be applied to:
- data at rest
- visible data (logs, data exports, web pages)

Reasons for anonymization
-

Open Source Tools
- ARX
- Jailer
- Metadata Anonymization Toolkit
- Talend Data Studio

Related terms
- Pseudonymization
- De-identification
- Scrubbing
- Data sanitization
- Data scrambling
- Data masking
- Format-preserving encryption (FPE)
- De-anonymization

- Data Masking: What you need to know
- k-anonymity
- SDC
- l-diversity

- The Complete Book of Data Anonymization
- The Complete Book of Data Anonymization (pdf download per chapter)