dandanlen edited this page Jan 7, 2020 · 1 revision
Supported data types
The following data types are supported by Diffix and must be considered by the explorer:
integer | real | text | boolean | datetime | date | time
Continuous vs Categorical
Most data types can be considered either continuous or categorical based on context.
For example:
The type of a gender column might be text, yet it may hold only two possible values, MALE and FEMALE.
A month column may have type integer but hold only the categorical values 1 to 12.
The distinction is not always clear: we will need to use some heuristics to determine which exploration approach to use for each column.
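One possible heuristic, sketched here purely for illustration, is to count the distinct values in a sample of the column and treat the column as categorical when that count is small. The function name classify_column and the threshold MAX_CATEGORICAL_DISTINCT are hypothetical; neither comes from Diffix or the explorer, and the threshold value is an arbitrary assumption.

```python
from typing import Any, Sequence

# Hypothetical cut-off: a column with at most this many distinct values is
# treated as categorical. The number is an assumption, not a Diffix setting.
MAX_CATEGORICAL_DISTINCT = 20


def classify_column(data_type: str, values: Sequence[Any]) -> str:
    """Return 'categorical' or 'continuous' for a sample of column values."""
    # Booleans have at most two values, so they are always categorical.
    if data_type == "boolean":
        return "categorical"
    # Ignore missing values when counting distinct entries.
    distinct = {v for v in values if v is not None}
    if len(distinct) <= MAX_CATEGORICAL_DISTINCT:
        return "categorical"
    return "continuous"


# The two examples from the text: a text gender column and an integer month column.
print(classify_column("text", ["MALE", "FEMALE", "MALE"]))
print(classify_column("integer", list(range(1, 13)) * 3))
```

A real heuristic would probably also weigh the declared type and the number of rows, since a small sample can make a continuous column look categorical.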
Numerical
Numerical columns (integer and real) can be analysed using a bucketing approach to extract histograms.
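As a rough sketch of the bucketing idea, the snippet below groups values into fixed-width buckets and counts how many fall into each one. It runs locally on a plain list for illustration only; how the explorer would issue the equivalent query against Diffix is not covered here, and the name bucket_histogram and the choice of bucket size are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def bucket_histogram(values: Iterable[float], bucket_size: float) -> Dict[float, int]:
    """Count values per fixed-width bucket, keyed by each bucket's lower bound."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    # floor(v / size) * size maps a value to the lower bound of its bucket,
    # and also behaves sensibly for negative values.
    counts = Counter(math.floor(v / bucket_size) * bucket_size for v in values)
    return dict(sorted(counts.items()))


# Buckets of width 5: [0, 5), [5, 10), [10, 15)
print(bucket_histogram([1, 4, 5, 9, 12, 14], 5))
```

Choosing the bucket size is itself a heuristic: too narrow and most buckets hold very few values, too wide and the shape of the distribution is lost.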
Text
For text columns, useful metrics may include word counts; prefix, suffix, or substring counts; and other recognisable patterns such as email domains and postcodes.
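The sketch below shows how a few of these metrics could be computed over a local sample of strings. The function name text_metrics, the default affix length of 3, and the simplified email pattern are all assumptions made for illustration; they are not part of Diffix or the explorer.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Deliberately simplified email pattern: one '@', no whitespace, and a dot
# somewhere in the domain. Real email validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")


def text_metrics(values: Iterable[str], affix_len: int = 3) -> Dict[str, Counter]:
    """Collect simple frequency counts describing a sample of text values."""
    if affix_len <= 0:
        raise ValueError("affix_len must be positive")
    metrics = {
        "word_counts": Counter(),
        "prefixes": Counter(),
        "suffixes": Counter(),
        "email_domains": Counter(),
    }
    for value in values:
        metrics["word_counts"][len(value.split())] += 1
        metrics["prefixes"][value[:affix_len]] += 1
        metrics["suffixes"][value[-affix_len:]] += 1
        match = EMAIL_RE.match(value)
        if match:
            metrics["email_domains"][match.group(1).lower()] += 1
    return metrics


sample = ["alice@example.com", "bob@example.com", "carol@test.org", "not an email"]
result = text_metrics(sample)
print(result["email_domains"].most_common())
```

A dominant suffix or email domain in the output suggests the column follows a pattern worth exploring further, while a flat distribution suggests free-form text.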