17 commits
f0d81eb
[HWORKS-2802] Document partitioned_by parameter on feature group crea…
jimdowling May 21, 2026
e3d5db3
Merge remote-tracking branch 'upstream/main' into HWORKS-2802
jimdowling May 30, 2026
6b0c363
[HWORKS-2802] Update partitioned_by docs for the real-column design
jimdowling May 31, 2026
523c327
Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…
jimdowling Jun 5, 2026
a899acc
Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…
jimdowling Jun 10, 2026
0049437
[HWORKS-2802] Drop key-generator detail from the Hudi partitioned_by …
jimdowling Jun 10, 2026
c28568c
Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…
jimdowling Jun 10, 2026
f1376e2
Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…
jimdowling Jun 10, 2026
1dec9e0
[HWORKS-2802] Expand partitioned_by feature group docs
jimdowling Jun 11, 2026
fcaf241
Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…
jimdowling Jun 12, 2026
49db202
[HWORKS-2807] Document partitioned_by support on Iceberg
jimdowling Jun 12, 2026
c2e8830
[HWORKS-2807] Document partitioned_by on Hudi + stream limitation
jimdowling Jun 13, 2026
c75b357
[HWORKS-2802] docs: hour-grain timestamp rule and partitioned_by in f…
jimdowling Jun 15, 2026
6977a16
Merge branch 'main' of github.com:logicalclocks/logicalclocks.github.…
jimdowling Jun 15, 2026
a977d11
[HWORKS-2802] docs: note the Table DDL card in the feature group UI
jimdowling Jun 15, 2026
207cf77
[HWORKS-2802] Correct feature-view serving behavior for partitioned_b…
jimdowling Jun 18, 2026
12e9c73
[HWORKS-2802] Clarify clickstream example is a non-stream feature group
jimdowling Jun 18, 2026
53 changes: 53 additions & 0 deletions docs/user_guides/fs/feature_group/create.md
@@ -102,6 +102,59 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit

By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.

##### Time-grain partitioning with `partitioned_by` (Delta only)

When the partition columns are derived from the feature group's `event_time`, pass the desired time grains as `partitioned_by=[...]` and the Python client derives the partition columns for you.
Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.

```python
fg = fs.get_or_create_feature_group(
    name="transactions",
    version=1,
    primary_key=["tx_id"],
    event_time="tx_ts",
    partitioned_by=["year", "month", "day"],
    time_travel_format="DELTA",
)
fg.insert(df)  # df does not need year/month/day; the client derives them
```

The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write and the backend registers them as partition columns through the normal table-creation path.
The source dataframe does not need to carry them.
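
For comparison, the manual decomposition that `partitioned_by` replaces could look roughly like the sketch below. It assumes a pandas dataframe; the sample rows and the `amount` column are invented for illustration, and the integer encoding of each grain is an assumption rather than a statement of what the client produces.

```python
import pandas as pd

# Illustrative data only: two transactions with an event-time column.
df = pd.DataFrame(
    {
        "tx_id": [1, 2],
        "tx_ts": pd.to_datetime(["2026-05-21 09:30:00", "2026-06-01 18:05:00"]),
        "amount": [12.5, 99.0],
    }
)

# Derive one column per grain from the event-time column.
df["year"] = df["tx_ts"].dt.year
df["month"] = df["tx_ts"].dt.month
df["day"] = df["tx_ts"].dt.day

# These columns would then be passed explicitly, e.g.
# partition_key=["year", "month", "day"], instead of partitioned_by.
print(df[["tx_id", "year", "month", "day"]])
```

With `partitioned_by`, this step moves out of the pipeline code and into the client.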

`partitioned_by` and `partition_key` are mutually exclusive.
`partitioned_by` requires `event_time` to be set.
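
The rules above can be expressed as a small pre-flight check in pipeline code. This is a hypothetical helper, not part of the Hopsworks client; the function name and error messages are assumptions, and the backend remains the authority on what it accepts.

```python
ALLOWED_GRAINS = {"hour", "day", "week", "month", "year"}


def check_partition_args(event_time=None, partition_key=None, partitioned_by=None):
    """Hypothetical check mirroring the documented partitioned_by rules."""
    if not partitioned_by:
        return  # nothing to validate
    if partition_key:
        raise ValueError("partitioned_by and partition_key are mutually exclusive")
    if not event_time:
        raise ValueError("partitioned_by requires event_time to be set")
    unknown = [g for g in partitioned_by if g not in ALLOWED_GRAINS]
    if unknown:
        raise ValueError(f"unsupported grains: {unknown}")


# Passes: event_time is set and only partitioned_by is used.
check_partition_args(event_time="tx_ts", partitioned_by=["year", "month", "day"])
```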

###### Partition pruning

The grain columns are real partition columns, so a filter on a grain column (for example `year == 2026`) prunes partitions natively.
A filter on an `event_time` range is rewritten by the query layer into equivalent grain-column predicates, so it also prunes when the spec is hierarchical:

| `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
| --- | --- | --- |
| `["year"]` | ✅ | ✅ |
| `["year", "month"]` | ✅ | ✅ |
| `["year", "month", "day"]` | ✅ | ✅ |
| `["year", "month", "day", "hour"]` | ✅ | ✅ |
| `["month"]` (no year) | ⚠️ no — month alone is ambiguous across years | ✅ filter on month works |
| `["year", "week"]` | ⚠️ year only — week isn't directly derivable from a date range | ✅ both columns prune |
| `["day"]` (no year/month) | ⚠️ no — day-of-month is ambiguous | ✅ filter on day works |

Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "day"]`): they line up with the typical batch-pipeline access pattern and prune naturally.
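
To see why hierarchy matters, consider how an `event_time` range maps onto `["year", "month"]` partitions. The sketch below is only an illustration of the idea, not the query layer's actual rewrite logic; `month_partitions` is a hypothetical helper.

```python
from datetime import datetime


def month_partitions(start: datetime, end: datetime) -> list[tuple[int, int]]:
    """Enumerate the (year, month) partitions that overlap [start, end]."""
    parts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        parts.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return parts


parts = month_partitions(datetime(2025, 11, 15), datetime(2026, 2, 3))
print(parts)  # [(2025, 11), (2025, 12), (2026, 1), (2026, 2)]
```

Each pair identifies exactly one partition, so the range selects four partitions. With `["month"]` alone, the same range reduces to the months 11, 12, 1, and 2, which match those months in every year and therefore cannot stand in for the date range.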

###### Online feature store

Online-enabled feature groups do not yet support `partitioned_by`.
The online ingestion path neither excludes the offline-only grain columns from the Kafka/Avro schema nor materializes them for the online write.
The backend therefore rejects `partitioned_by` together with `online_enabled=True` until that work lands (tracked under a separate follow-up ticket).
Keep the feature group offline-only to use `partitioned_by`.

###### Hudi

`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
Hudi materializes the grain columns server-side in the streaming materialization job, and that work is tracked under a separate follow-up ticket.
Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.

##### Table format

When you create a feature group, you can specify the table format you want to use to store the data in your feature group by setting the `time_travel_format` parameter.