Merged
8 changes: 6 additions & 2 deletions build-with-pinot/indexing/dictionary-index.md
Expand Up @@ -20,10 +20,10 @@ In Pinot, dictionaries serve as both an index and actual encoding. Consequently,
| ------------------------------------------- | ------------------------- | ------------------------------------------------------------------- |
| [forward](forward-index.md) | | Implementation depends on whether the dictionary is enabled or not. |
| [range](range-index.md) | | Implementation depends on whether the dictionary is enabled or not. |
| [inverted](inverted-index.md) | | Requires the dictionary index to be enabled. |
| [inverted](inverted-index.md) | | Uses dictionary IDs. Pinot can materialize a standalone dictionary for RAW columns when the index is enabled. |
| [json](json-index.md) | when `optimizeDictionary` | Disables dictionary. |
| [text](text-search-support.md) | when `optimizeDictionary` | Disables dictionary. |
| FST | | Requires dictionary. |
| FST | | Uses dictionary values. Pinot can materialize a standalone dictionary for RAW STRING columns when FST is enabled. |
| [H3 (or geospatial)](geospatial-support.md) | | Incompatible with dictionary. |

## Configuration
Expand Down Expand Up @@ -70,6 +70,10 @@ Alternatively, the `encodingType` property can be changed. For example:

You may choose the option you prefer, but it's essential to maintain consistency, as Pinot will reject table configurations where the same column and index are defined in different locations.

Even when a column keeps a RAW forward index, Pinot may still materialize a standalone dictionary when another enabled
index needs dictionary IDs or dictionary values. This lets a RAW column back features such as bitmap inverted indexes
or FST/IFST without changing the forward-index encoding.
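
The idea in the paragraph above can be illustrated with a small conceptual sketch. This is not Pinot's implementation: the function names, the list-based postings, and the sample data are all hypothetical, chosen only to show how a forward index can stay RAW while a separately built dictionary supplies the IDs that an inverted index needs.

```python
import bisect

# Conceptual sketch only; names and data layout are hypothetical, not Pinot's.

# RAW forward index: doc ID -> actual value, stored as-is.
raw_forward_index = ["us", "de", "us", "fr", "de", "us"]


def build_standalone_dictionary(values):
    """Return the sorted distinct values; a value's position is its dictionary ID."""
    return sorted(set(values))


def build_inverted_index(values, dictionary):
    """Map each dictionary ID to the list of doc IDs that hold that value."""
    dict_id_of = {value: dict_id for dict_id, value in enumerate(dictionary)}
    postings = [[] for _ in dictionary]
    for doc_id, value in enumerate(values):
        postings[dict_id_of[value]].append(doc_id)
    return postings


def lookup(value, dictionary, postings):
    """Resolve a value to its dictionary ID by binary search, then return its doc IDs."""
    dict_id = bisect.bisect_left(dictionary, value)
    if dict_id < len(dictionary) and dictionary[dict_id] == value:
        return postings[dict_id]
    return []


dictionary = build_standalone_dictionary(raw_forward_index)
postings = build_inverted_index(raw_forward_index, dictionary)
matching_docs = lookup("us", dictionary, postings)
```

In this sketch the forward index is never rewritten: the dictionary and the postings are derived structures that exist only to serve the secondary index.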

### Heuristically enable dictionaries

Most of the time, the domain expert who creates the table knows whether a dictionary will be useful. For example, a column with random values or public IPs will probably have a large cardinality, so it can immediately be targeted as raw encoded, while columns like employee IDs will have a small cardinality and can therefore easily be recognized as good dictionary candidates. But sometimes the decision is not clear. To help in these situations, Pinot can be configured to heuristically create the dictionary depending on the actual values and a relation factor.
5 changes: 4 additions & 1 deletion build-with-pinot/indexing/forward-index.md
Expand Up @@ -116,7 +116,10 @@ The raw value forward index stores actual values instead of IDs. This means that

As shown in the diagram below, dictionary encoding can lead to numerous random memory accesses for dictionary lookups. In contrast, the raw value forward index allows for sequential value scanning, which can enhance query performance when applied appropriately.

Note: Raw value forward index currently does not support inverted index (all others JSON/TEXT/Range/etc are supported). Also, since reading a value from this index requires reading the entire chunk in memory and decompressing, it is not suitable for heavy random reads. 
Note: A RAW forward index can still be paired with secondary indexes that need dictionary IDs. When you enable a
dictionary-backed index such as a bitmap inverted index or FST/IFST on a RAW column, Pinot keeps the forward index RAW
and materializes a standalone dictionary for the secondary index. Because reading a value from a RAW forward index
requires reading the entire chunk into memory and decompressing it, it is not suitable for heavy random reads.
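
To see why chunked, compressed storage penalizes random reads, consider the following minimal sketch. It is an illustration under stated assumptions, not Pinot's on-disk format: the chunk size, the newline-joined payload, and the use of `zlib` are all hypothetical choices, and values are assumed not to contain newlines.

```python
import zlib

# Conceptual sketch only; chunk layout and compression codec are assumptions.
CHUNK_SIZE = 4  # hypothetical number of values per compressed chunk


def build_chunks(values, chunk_size=CHUNK_SIZE):
    """Group values into fixed-size chunks and compress each chunk independently."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        payload = "\n".join(values[start:start + chunk_size]).encode("utf-8")
        chunks.append(zlib.compress(payload))
    return chunks


def read_value(chunks, doc_id, chunk_size=CHUNK_SIZE):
    """Read one value: the whole chunk is decompressed even though one entry is needed."""
    decoded = zlib.decompress(chunks[doc_id // chunk_size]).decode("utf-8")
    return decoded.split("\n")[doc_id % chunk_size]


values = [f"v{i}" for i in range(10)]
chunks = build_chunks(values)
single_value = read_value(chunks, 5)
```

A sequential scan pays the decompression cost once per chunk, whereas each random `read_value` call pays it again for a single entry.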

**Sorted raw columns:** As of Pinot 1.3.0, raw columns can now be configured as sorted columns without forcing an inverted index. Previously, configuring a column as both sorted and no-dictionary would cause Pinot to force-add an inverted index, which negated the storage benefits of raw encoding. Now, you can have a time-sorted raw column (e.g., a timestamp column) without dictionary encoding or inverted index, allowing for efficient storage while maintaining sort order metadata.

16 changes: 10 additions & 6 deletions build-with-pinot/indexing/fst-index.md
@@ -1,6 +1,7 @@
# FST index

The FST (Finite State Transducer) index accelerates regex queries on dictionary-encoded STRING columns. It reduces the on-disk index size by 4-6x compared to scanning the full dictionary.
The FST (Finite State Transducer) index accelerates regex queries on STRING columns. It is built over the column's
dictionary values. It reduces the on-disk index size by 4-6x compared to scanning the full dictionary.

## When to use

Expand All @@ -10,13 +11,14 @@ Use an FST index when your queries use `LIKE` or `REGEXP_LIKE` predicates on str

- STRING columns only
- Must be single-valued
- Must be dictionary-encoded
- Must have dictionary values available. This can come from a dictionary-encoded forward index, or Pinot can
materialize a standalone dictionary while keeping the forward index RAW.

## Limitations

- Only supports regex queries (`LIKE` and `REGEXP_LIKE` predicates).
- Only supported on stored or completed segments (not consuming segments in real-time tables).
- Only supported on dictionary-encoded columns.
- Only supported on columns with dictionary values available.
- Works best for prefix queries. Suffix-only or infix-only patterns may not benefit as much.

{% hint style="info" %}
Expand All @@ -27,7 +29,7 @@ For more information on FST construction, see the [Lucene FST documentation](htt

## Configuration

To enable the FST index on a dictionary-encoded column:
To enable the FST index on a column:

{% code title="Recommended: fieldConfigList" %}
```json
Expand All @@ -43,7 +45,9 @@ To enable the FST index on a dictionary-encoded column:
```
{% endcode %}

The FST index generates one index file (`.lucene.fst`). If an inverted index is also enabled on the column, FST can take advantage of it for faster lookups.
The FST index generates one index file (`.lucene.fst`). If you keep the forward index RAW, Pinot materializes a
standalone dictionary for the FST automatically. If an inverted index is also enabled on the column, FST can take
advantage of it for faster lookups.

## Query examples

Expand Down Expand Up @@ -77,7 +81,7 @@ The case-insensitive FST index (IFST) provides the same functionality as the sta

- Supports case-insensitive regex queries.
- Only supported on stored or completed segments (not consuming segments).
- Only supported on dictionary-encoded STRING columns.
- Only supported on STRING columns with dictionary values available.
- Works best for prefix queries with case-insensitive matching.

### Configuration
7 changes: 6 additions & 1 deletion build-with-pinot/indexing/inverted-index.md
Expand Up @@ -42,6 +42,10 @@ The recommended way to enable a bitmap inverted index:
```
{% endcode %}

If the column uses a RAW forward index, you do not need to add a separate dictionary configuration just to make the
bitmap inverted index work. Pinot keeps the forward index RAW and materializes a standalone dictionary for the
inverted index automatically.

<details>

<summary>Older configuration</summary>
Expand Down Expand Up @@ -112,7 +116,8 @@ LIMIT 10

## Limitations

- Bitmap inverted indexes require [dictionary encoding](dictionary-index.md) to be enabled on the column.
- Bitmap inverted indexes require dictionary IDs, but Pinot can satisfy that either with a dictionary-encoded forward
index or with a standalone dictionary materialized for a RAW forward index.
- Sorted inverted indexes (on dictionary-encoded columns) only work on columns whose data is physically sorted within each segment.
- Sorted raw columns (no-dictionary) also support sort metadata without requiring an inverted index.
- MAP columns are not supported.