dandanlen edited this page Jan 7, 2020 · 6 revisions
Motivation
Anonymized data from Diffix-protected datasets is inherently restricted. The analyst needs to be familiar with the imposed limitations and knowledgeable about possible workarounds. The aim of this project is to build a system that automatically extracts a high-level picture of the shape of a given dataset while intelligently navigating the restrictions imposed by Diffix.
The most fundamental limitation of a Diffix-protected database is that you can't query any data that would uniquely (or even almost uniquely) identify a person in the database. As a result, the main way of extracting information about a given dataset is through aggregates. On their own, aggregate functions such as min, max, count, and avg return only coarse-grained statistics of limited usefulness. However, by using tricks such as calculating aggregates over sub-ranges of the data, we can extract richer statistics, such as histograms, to aid the analyst in exploring the dataset.
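To make the sub-range idea concrete, here is a minimal sketch in Python. It does not use Diffix or this project's code: the function name `bucketed_counts`, the `min_count` threshold, and the sample data are all hypothetical, and the threshold is only a crude stand-in for the restriction on near-unique results described above. It shows how a per-sub-range count turns a single coarse aggregate into a histogram.

```python
import math
from collections import Counter


def bucketed_counts(values, bucket_width, min_count=5):
    """Count values per fixed-width sub-range and drop sparse buckets.

    Hypothetical helper: `min_count` imitates, very roughly, the rule that
    results describing too few individuals must not be revealed.
    """
    # Map each value to the lower bound of the sub-range it falls into.
    counts = Counter(
        math.floor(v / bucket_width) * bucket_width for v in values
    )
    # Keep only buckets that are populated well enough to report.
    return {
        lower: n
        for lower, n in sorted(counts.items())
        if n >= min_count
    }


# Made-up sample column; the single value 71 would be near-identifying.
ages = [23, 25, 27, 28, 29, 31, 33, 34, 35, 36, 38, 39, 71]
histogram = bucketed_counts(ages, bucket_width=10, min_count=2)
print(histogram)
```

With this sample, the buckets starting at 20 and 30 are reported, while the bucket containing the lone value 71 is dropped. A real exploration would issue the equivalent grouped aggregate query against the protected database rather than counting raw values locally.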