
PostgreSQL Anonymizer

(postgresql-anonymizer.readthedocs.io)
243 points chynkm | 2 comments
gkbrk ◴[] No.42736249[source]
ClickHouse has something similar called clickhouse-obfuscator [1]. It even works offline on data dumps, so you can quickly prepare somewhat realistic example data and send it to others.

According to its --help output, it is designed to retain the following properties of data:

- cardinalities of values (number of distinct values) for every column and for every tuple of columns;

- conditional cardinalities: number of distinct values of one column under condition on value of another column;

- probability distributions of absolute value of integers; sign of signed integers; exponent and sign for floats;

- probability distributions of length of strings;

- probability of zero values of numbers; empty strings and arrays, NULLs;

- data compression ratio when compressed with LZ77 and entropy family of codecs;

- continuity (magnitude of difference) of time values across table; continuity of floating point values.

- date component of DateTime values;

- UTF-8 validity of string values;

- string values continue to look somewhat natural

[1]: https://clickhouse.com/docs/en/operations/utilities/clickhou...
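To make a few of the listed properties concrete, here is a minimal sketch of how one might check that an obfuscated column still matches the original on cardinality, NULL frequency, and string-length distribution. This is not how clickhouse-obfuscator works internally; `column_profile` and the sample data are hypothetical names made up for illustration.

```python
from collections import Counter


def column_profile(values):
    """Summarize a column by three of the properties listed above.

    Hypothetical helper for illustration only; not part of any ClickHouse tool.
    """
    non_null = [v for v in values if v is not None]
    return {
        # number of distinct non-NULL values
        "cardinality": len(set(non_null)),
        # share of NULLs in the column
        "null_fraction": (len(values) - len(non_null)) / len(values) if values else 0.0,
        # how many strings there are of each length
        "length_histogram": dict(Counter(len(v) for v in non_null)),
    }


# Made-up sample data: the obfuscated column keeps repeats, NULLs and lengths.
original = ["alice", "bob", "alice", None, "carol"]
obfuscated = ["qxmvu", "tje", "qxmvu", None, "pzrwk"]

profiles_match = column_profile(original) == column_profile(obfuscated)
print(profiles_match)
```

Comparing profiles like this only covers single-column properties; the conditional cardinalities and compression ratio from the list would need separate checks.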

1. bux93 ◴[] No.42737004[source]
The Dutch national office of statistics has tools called mu-argus[2] and tau-argus, intended to de-identify 'microdata' such that k-anonymity[1] is achieved.

[1] A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. https://en.wikipedia.org/wiki/K-anonymity

[2] https://research.cbs.nl/casc/mu.htm
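The k-anonymity definition in [1] can be sketched as a simple check: group records by the columns that could identify someone (usually called quasi-identifiers) and require every group to contain at least k records. This is only an illustration of the definition, not how mu-argus works; `is_k_anonymous`, the column names, and the records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k records (hypothetical helper, illustration only)."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Made-up microdata: age band and 3-digit postcode are the quasi-identifiers.
records = [
    {"age_band": "30-39", "zip3": "101", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "101", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "102", "diagnosis": "flu"},
]

# The third record is alone in its group, so the release is not 2-anonymous.
print(is_k_anonymous(records, ["age_band", "zip3"], k=2))
# Dropping that record leaves one group of size 2.
print(is_k_anonymous(records[:2], ["age_band", "zip3"], k=2))
```

Which columns count as quasi-identifiers is a judgment call about what an attacker could link to outside data, so the check is only as good as that choice.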

2. aeontech ◴[] No.42740946[source]
This is really cool, and deserves a submission of its own, I'd say!