
>> page in progress <<


Anonymization consists of data-processing techniques and of procedures for handling the data, algorithms, keys, and the data lifecycle. For privacy reasons, personally identifiable information (PII) often needs to be anonymized before it is used for testing and analysis. There are several techniques to do this:

   Replacement - substitute identifying numbers
   Suppression - omit from the released data, partially or fully
   Generalization - for example, replace birth date with something less specific, like year of birth
   Perturbation - make random changes to the data
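
The four techniques above can be sketched on a single record. This is a minimal illustration, not a production implementation: the record layout, the field names, and the helper functions (`replace`, `suppress`, `generalize_birth_date`, `perturb`) are all hypothetical.

```python
import random

# Hypothetical example record; field names are assumptions for illustration.
record = {"customer_id": "C-48213", "name": "Alice Example",
          "birth_date": "1984-07-19", "salary": 52000}

def replace(value, table):
    """Replacement: substitute an identifying value with a surrogate."""
    if value not in table:
        table[value] = f"ID-{len(table) + 1:05d}"
    return table[value]

def suppress(value, keep=0):
    """Suppression: blank out fully (keep=0) or keep only the first `keep` characters."""
    return value[:keep] + "*" * (len(value) - keep)

def generalize_birth_date(iso_date):
    """Generalization: reduce a YYYY-MM-DD date to its year."""
    return iso_date[:4]

def perturb(number, max_relative_change, rng):
    """Perturbation: apply a random change of at most +/- max_relative_change."""
    factor = 1 + rng.uniform(-max_relative_change, max_relative_change)
    return round(number * factor)

def anonymize(rec, table, rng):
    return {
        "customer_id": replace(rec["customer_id"], table),
        "name": suppress(rec["name"], keep=1),
        "birth_year": generalize_birth_date(rec["birth_date"]),
        "salary": perturb(rec["salary"], 0.05, rng),
    }

surrogates = {}
print(anonymize(record, surrogates, random.Random()))
```

Note that the surrogate table in `replace` maps originals back to surrogates, so it would need the same protection as the original data, or be discarded after masking.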


Masked data should meet the following requirements:

- masked data should be irreversible
- masked data should be (schema) type compliant
- semantics should be preserved
- references in data should be kept intact
- masking should be repeatable
- non-sensitive data should also be anonymized if it could lead to identification
- the distribution of values should be preserved
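
As a rough sketch of how repeatability and intact references can go together, a keyed hash (HMAC) maps the same input to the same masked value every time, so a foreign key masked in one table still matches the primary key masked in another. The function name `mask_id`, the key, and the sample tables are assumptions made for this example. Irreversibility here depends on the key staying secret: anyone holding the key can re-compute masked values for guessed inputs.

```python
import hashlib
import hmac

def mask_id(value: str, key: bytes, length: int = 12) -> str:
    """Return a repeatable, keyed pseudonym for `value` (hypothetical helper)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

# Hypothetical sample data: orders reference customers by id.
customers = [{"id": "C-1", "name": "Alice"}, {"id": "C-2", "name": "Bob"}]
orders = [{"order": 1, "customer_id": "C-2"}, {"order": 2, "customer_id": "C-1"}]

key = b"demo-key-keep-this-secret"  # assumption: in practice, load from a secret store

# Names are suppressed entirely; ids are masked with the same key in both tables.
masked_customers = [{"id": mask_id(c["id"], key)} for c in customers]
masked_orders = [
    {"order": o["order"], "customer_id": mask_id(o["customer_id"], key)}
    for o in orders
]
```

Because the same key is used for both tables, each masked order still points at exactly one masked customer.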

Also:

- Reusable for other data sets
- Transparent, so that auditors can verify that the data is indeed masked appropriately
---- Referential Integrity ----
---- Semantic Integrity ----
---- Separation of Duties ----

Differential Privacy

Masking Methods

- Format-Preserving Encryption (FPE)
- shuffling
- encryption
- substitution: replacing values with values from another source
   Non-deterministic randomization: 
   Repeatable masking: 
   Specialized rules: These rules apply to particular fields, such as Social Security/tax ID numbers, credit card numbers, street addresses, and telephone numbers, whose masked values must remain structurally correct so that they still work in workflows and pass checksum validation. For example, 50 Maple Lane, Newark, N.J. could be replaced with 100 Wall St., New York, N.Y., where the random values (house number, street, city, and state) together make up a valid address that can be found with applications like Google Maps or MapQuest.
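
As an illustration of a specialized rule, the sketch below substitutes a credit card number with a random number of the same length that still passes the Luhn checksum used for card numbers. The function names are hypothetical, the input is assumed to contain digits only, and the substitute does not preserve the issuer prefix.

```python
import random

def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn test."""
    total = 0
    # Walk right to left; the rightmost digit of `partial` is doubled
    # because the check digit will sit to its right.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number: str) -> bool:
    """Check whether the last digit is the correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]

def substitute_card_number(original: str, rng: random.Random) -> str:
    """Replace a card number with a random, Luhn-valid one of the same length."""
    body = "".join(str(rng.randrange(10)) for _ in range(len(original) - 1))
    return body + luhn_check_digit(body)

masked = substitute_card_number("4111111111111111", random.Random())
print(masked, luhn_valid(masked))
```

With an unseeded generator this is non-deterministic randomization. For repeatable masking, the random digits could instead be derived from a keyed hash of the original value.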

These techniques can be applied to:

- data at rest
- visible data (logs, data exports, web pages)
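
For visible data such as logs, masking can be applied to each line before it is written or exported. The sketch below is only an assumption of how that might look: the two regular expressions are deliberately simple and will not catch every e-mail address or card number format, and `mask_log_line` is a hypothetical helper.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d{13,19}\b")

def mask_log_line(line: str) -> str:
    """Suppress e-mail addresses fully and card numbers except the last 4 digits."""
    line = EMAIL_RE.sub("<email>", line)
    line = CARD_RE.sub(
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], line
    )
    return line

print(mask_log_line("login ok user=alice@example.com card=4111111111111111"))
```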

Reasons for anonymization


Open Source Tools

- ARX
- Jailer
- Metadata Anonymization Toolkit
- Talend Data Studio

Related terms

- Pseudonymization
- De-identification
- Scrubbing
- Data sanitization
- Data scrambling
- Data masking
- Format-preserving encryption (FPE)
- De-anonymization

- Data Masking: What you need to know
- k-anonymity
- SDC
- l-diversity

- The Complete Book of Data Anonymization (pdf download per chapter)