From f0d81ebf4b9036673cc424a0252bf652b4cb0354 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Thu, 21 May 2026 14:48:19 +0200
Subject: [PATCH 01/10] [HWORKS-2802] Document partitioned_by parameter on
 feature group creation https://hopsworks.atlassian.net/browse/HWORKS-2802
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a section to docs/user_guides/fs/feature_group/create.md
describing the storage-engine-native partitioned_by parameter for
Delta feature groups. Covers:

- Usage example with create_feature_group / get_or_create_feature_group.
- The CREATE TABLE … USING DELTA … GENERATED ALWAYS AS … contract:
  the storage layer derives the partition columns; the user's
  dataframe never carries them.
- Validation rules: mutual exclusion with partition_key, requires
  event_time.
- Partition pruning table — Delta auto-derives partition predicates
  from the GENERATED expressions for hierarchical specs (year /
  year+month / year+month+day / year+month+day+hour), so
  `fg.read(start_time=..., end_time=...)` and
  `fg.filter(fg.event_time >= ...)` prune at the partition level.
  Non-hierarchical specs (e.g. ["month"], ["year","week"]) are valid
  but skip the auto-derivation — only direct predicates on the
  grain columns prune. Recommend hierarchical specs.
- Online feature store behavior: derived columns live offline-only
  by default; online_partition_columns=true opts into online
  materialization. Until the onlinefs consumer filter ships, the
  backend rejects partitioned_by + online_enabled=true with the
  default online_partition_columns=false. Document both
  workarounds.
- Hudi: partitioned_by + HUDI is rejected at creation; Hudi support
  is tracked under a separate follow-up ticket.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 54 +++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index c6db36f3ef..c7c6a91d0f 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -102,6 +102,60 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit
 
 By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.
 
+##### Time-grain partitioning with `partitioned_by` (Delta only)
+
+When the partition columns are derived from the feature group's `event_time`, the Python client can hand the backend the desired time grains and let the storage engine generate the partition columns automatically.
+Pass `partitioned_by=[...]` with one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
+
+```python
+fg = fs.get_or_create_feature_group(
+    name="transactions",
+    version=1,
+    primary_key=["tx_id"],
+    event_time="tx_ts",
+    partitioned_by=["year", "month", "day"],
+    time_travel_format="DELTA",
+)
+fg.insert(df)  # df does not need year/month/day — Delta derives them
+```
+
+The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
+The backend creates the table via `CREATE TABLE … USING DELTA … GENERATED ALWAYS AS …`, so the derived columns live entirely inside the storage layer; the source dataframe never carries them.
+
+`partitioned_by` and `partition_key` are mutually exclusive.
+`partitioned_by` requires `event_time` to be set.
+
+###### Partition pruning
+
+Delta auto-derives partition predicates from the GENERATED expressions when the user filters on the source column.
+Filtering on `event_time` ranges therefore prunes partitions for free on hierarchical specs:
+
+| `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
+| --- | --- | --- |
+| `["year"]` | ✅ | ✅ |
+| `["year", "month"]` | ✅ | ✅ |
+| `["year", "month", "day"]` | ✅ | ✅ |
+| `["year", "month", "day", "hour"]` | ✅ | ✅ |
+| `["month"]` (no year) | ⚠️ no — month alone is ambiguous across years | ✅ filter on month works |
+| `["year", "week"]` | ⚠️ year only — week isn't directly derivable from a date range | ✅ both columns prune |
+| `["day"]` (no year/month) | ⚠️ no — day-of-month is ambiguous | ✅ filter on day works |
+
+Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "day"]`) — they line up with the typical batch-pipeline access pattern and prune naturally.
+
+###### Online feature store
+
+By default, the derived partition columns live only in the offline storage; the online feature store does not get them.
+Pass `online_partition_columns=True` to materialize them in the online row as well.
+
+While the online-store filter (the `onlinefs` consumer that drops `offline_only` columns from the RonDB write) is still pending, the backend rejects `partitioned_by` together with `online_enabled=true` and the default `online_partition_columns=false` to avoid writing the grain columns to RonDB by accident.
+The two workarounds: keep the feature group offline-only, or set `online_partition_columns=True` to materialize the grains online explicitly.
+
+###### Hudi
+
+`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
+Hudi needs a different mechanism (a `CustomKeyGenerator` + server-side `Transformer`) and is tracked under a separate follow-up ticket.
+Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+
 ##### Table format
 
 When you create a feature group, you can specify the table format you want to use to store the data in your feature group by setting the `time_travel_format` parameter.

From 6b0c36317e3d698c6f82eaee2b64952b6a4267ef Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Sun, 31 May 2026 15:18:16 +0200
Subject: [PATCH 02/10] [HWORKS-2802] Update partitioned_by docs for the
 real-column design https://hopsworks.atlassian.net/browse/HWORKS-2802

The partitioned_by section described Delta GENERATED ALWAYS AS columns and
storage-engine-side derivation, which is no longer how the feature works.
Document the real design: the client derives the grain columns from
event_time and writes them as real partition columns. Pruning works
natively on grain filters and, through predicate translation, on
event_time ranges. Correct the online-store note: online-enabled
partitioned_by feature groups are rejected entirely until HWORKS-2808,
not only when online_partition_columns is left at its default.
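
For reviewers, a rough sketch of what "the client derives the grain
columns from event_time" amounts to for a single row. This is not the
Hopsworks client code: derive_grains is a hypothetical helper, and the
ISO-8601 week number is an assumption, since this series does not say
how the week grain is computed.

```python
from datetime import datetime

# Hypothetical helper for illustration only; not part of the Hopsworks API.
GRAIN_EXTRACTORS = {
    "year": lambda ts: ts.year,
    "month": lambda ts: ts.month,
    "day": lambda ts: ts.day,
    "hour": lambda ts: ts.hour,
    # Assumption: ISO-8601 week number; the real definition may differ.
    "week": lambda ts: ts.isocalendar()[1],
}


def derive_grains(event_time, partitioned_by):
    """Return the grain columns for one row, in the order requested."""
    return {grain: GRAIN_EXTRACTORS[grain](event_time) for grain in partitioned_by}


row = derive_grains(datetime(2026, 6, 11, 6, 41), ["year", "month", "day"])
print(row)  # {'year': 2026, 'month': 6, 'day': 11}
```

The resulting values are written alongside the user's features as
ordinary partition columns, which is why the source dataframe does not
need to carry them.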

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index c7c6a91d0f..8197c9245f 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -104,8 +104,8 @@ By using partitioning the system will write the feature data in different subdir
 
 ##### Time-grain partitioning with `partitioned_by` (Delta only)
 
-When the partition columns are derived from the feature group's `event_time`, the Python client can hand the backend the desired time grains and let the storage engine generate the partition columns automatically.
-Pass `partitioned_by=[...]` with one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
+When the partition columns are derived from the feature group's `event_time`, hand the backend the desired time grains with `partitioned_by=[...]` and the Python client derives the partition columns for you.
+Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
 
 ```python
 fg = fs.get_or_create_feature_group(
@@ -116,19 +116,20 @@ fg = fs.get_or_create_feature_group(
     partitioned_by=["year", "month", "day"],
     time_travel_format="DELTA",
 )
-fg.insert(df)  # df does not need year/month/day — Delta derives them
+fg.insert(df)  # df does not need year/month/day — the client derives them
 ```
 
 The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
-The backend creates the table via `CREATE TABLE … USING DELTA … GENERATED ALWAYS AS …`, so the derived columns live entirely inside the storage layer; the source dataframe never carries them.
+The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write and the backend registers them as partition columns through the normal table-creation path.
+The source dataframe does not need to carry them.
 
 `partitioned_by` and `partition_key` are mutually exclusive.
 `partitioned_by` requires `event_time` to be set.
 
 ###### Partition pruning
 
-Delta auto-derives partition predicates from the GENERATED expressions when the user filters on the source column.
-Filtering on `event_time` ranges therefore prunes partitions for free on hierarchical specs:
+The grain columns are real partition columns, so a filter on a grain column (for example `year == 2026`) prunes partitions natively.
+A filter on an `event_time` range is rewritten into equivalent grain-column predicates by the query layer, so it prunes too on hierarchical specs:
 
 | `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
 | --- | --- | --- |
@@ -144,11 +145,9 @@ Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "
 
 ###### Online feature store
 
-By default, the derived partition columns live only in the offline storage; the online feature store does not get them.
-Pass `online_partition_columns=True` to materialize them in the online row as well.
-
-While the online-store filter (the `onlinefs` consumer that drops `offline_only` columns from the RonDB write) is still pending, the backend rejects `partitioned_by` together with `online_enabled=true` and the default `online_partition_columns=false` to avoid writing the grain columns to RonDB by accident.
-The two workarounds: keep the feature group offline-only, or set `online_partition_columns=True` to materialize the grains online explicitly.
+Online-enabled feature groups do not yet support `partitioned_by`.
+The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=true` until that work lands (tracked under a separate follow-up ticket).
+Keep the feature group offline-only to use `partitioned_by`.
 
 ###### Hudi
 

From 00494373b894da3d99750817ca6eb8682ac7b171 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Wed, 10 Jun 2026 11:09:54 +0200
Subject: [PATCH 03/10] [HWORKS-2802] Drop key-generator detail from the Hudi
 partitioned_by note https://hopsworks.atlassian.net/browse/HWORKS-2802

The Hudi follow-up materializes the grain columns server-side and
partitions on them directly; the CustomKeyGenerator phrasing described
a mechanism the revised design no longer uses.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 8197c9245f..97fa30189e 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -152,7 +152,7 @@ Keep the feature group offline-only to use `partitioned_by`.
 ###### Hudi
 
 `partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
-Hudi needs a different mechanism (a `CustomKeyGenerator` + server-side `Transformer`) and is tracked under a separate follow-up ticket.
+Hudi materializes the grain columns server-side in the streaming materialization job, and that work is tracked under a separate follow-up ticket.
 Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
 
 ##### Table format

From 1dec9e01c46fdfb56935b56924f8c959c05646c7 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Thu, 11 Jun 2026 06:41:05 +0200
Subject: [PATCH 04/10] [HWORKS-2802] Expand partitioned_by feature group docs
 https://hopsworks.atlassian.net/browse/HWORKS-2802

Flesh out the partitioned_by section into a reference for the shipped
feature: the parameter list (partitioned_by and online_partition_columns
with their constraints), cross-session persistence and the round trip
through get_feature_group, the on-disk Hive layout, a read and
partition-pruning example with the hierarchical-vs-non-hierarchical
matrix, a clickstream-by-hour example, and the current online and Hudi
limitations (online is rejected both at creation and on enable).
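
The pruning matrix can be read as the outcome of a translation like the
one sketched below. This is an assumption about the shape of the
rewrite, not the query-layer implementation; range_to_grain_bounds is a
hypothetical name, and the sketch only handles the year/month/day/hour
hierarchy.

```python
from datetime import datetime


def range_to_grain_bounds(start, end, partitioned_by):
    """Translate an inclusive event_time range into per-grain (min, max) bounds.

    Each finer grain is bounded only while every coarser grain is pinned to a
    single value; otherwise the finer grain wraps around and cannot be bounded.
    """
    hierarchy = ["year", "month", "day", "hour"]
    bounds = {}
    for grain, expected in zip(partitioned_by, hierarchy):
        if grain != expected:
            break  # non-hierarchical spec: stop deriving bounds here
        lo, hi = getattr(start, grain), getattr(end, grain)
        bounds[grain] = (lo, hi)
        if lo != hi:
            break  # range spans several values, so finer grains stay unbounded
    return bounds


print(range_to_grain_bounds(
    datetime(2026, 6, 1), datetime(2026, 6, 11), ["year", "month", "day"]
))  # {'year': (2026, 2026), 'month': (6, 6), 'day': (1, 11)}
```

Under this sketch, ["month"] alone yields no bounds and ["year", "week"]
yields a year bound only, which matches the matrix rows for
non-hierarchical specs.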

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 83 ++++++++++++++++-----
 1 file changed, 66 insertions(+), 17 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 97fa30189e..d07f7e3cd3 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -104,7 +104,8 @@ By using partitioning the system will write the feature data in different subdir
 
 ##### Time-grain partitioning with `partitioned_by` (Delta only)
 
-When the partition columns are derived from the feature group's `event_time`, hand the backend the desired time grains with `partitioned_by=[...]` and the Python client derives the partition columns for you.
+Most time-series feature groups want to partition by a time grain derived from `event_time`.
+Instead of decomposing the timestamp into `year` / `month` / `day` columns yourself and passing them as `partition_key`, declare the grains with `partitioned_by` and let Hopsworks derive the partition columns for you.
 Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
 
 ```python
@@ -116,20 +117,50 @@ fg = fs.get_or_create_feature_group(
     partitioned_by=["year", "month", "day"],
     time_travel_format="DELTA",
 )
-fg.insert(df)  # df does not need year/month/day — the client derives them
+fg.insert(df)  # df does not need year/month/day; they derive from tx_ts
 ```
 
-The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`.
-The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write and the backend registers them as partition columns through the normal table-creation path.
-The source dataframe does not need to carry them.
+The example above is equivalent to manually decomposing `tx_ts` into three columns and passing `partition_key=["year", "month", "day"]`, but you never write the grain columns yourself.
+The grain columns are ordinary materialized partition columns: the client computes them from `event_time` on each write, and the backend registers them as partition columns through the normal table-creation path (no Delta generated columns, no extra job).
+The source DataFrame must contain only your real features plus `event_time`; it must not carry the grain columns.
 
-`partitioned_by` and `partition_key` are mutually exclusive.
-`partitioned_by` requires `event_time` to be set.
+On disk the data lands in the standard Hive layout, one directory level per grain in the order you listed them:
 
-###### Partition pruning
+```text
+.../transactions_1/year=2026/month=06/day=11/<parquet files>
+```
+
+The grains become real features on the feature group, so they show up in the schema and in `fg.partition_key`, and you can filter on them directly.
+By default they are written only to the offline store (see [Online feature store](#online-feature-store) below).
+
+###### Parameters
+
+- `partitioned_by`: ordered, non-empty list of grains from `{"hour", "day", "week", "month", "year"}`, no duplicates.
+  Mutually exclusive with `partition_key`, and requires `event_time` to be set.
+  A grain must not collide with `event_time` or an existing feature name.
+- `online_partition_columns` (default `False`): when `True`, the derived grain columns are also written to the online store; when `False` they are offline-only.
+  Online serving with `partitioned_by` is not supported yet, so this is effectively always `False` today (see below).
+
+###### Persistence across sessions
+
+`partitioned_by` is stored on the feature group, so it round-trips without re-passing it:
 
-The grain columns are real partition columns, so a filter on a grain column (for example `year == 2026`) prunes partitions natively.
-A filter on an `event_time` range is rewritten into equivalent grain-column predicates by the query layer, so it prunes too on hierarchical specs:
+```python
+fg = fs.get_feature_group("transactions", version=1)
+fg.partitioned_by          # ["year", "month", "day"]
+fg.partition_key           # ["year", "month", "day"]
+```
+
+###### Reading and partition pruning
+
+Read the whole group, or a time slice; the grain columns appear as normal feature columns, populated from `event_time`:
+
+```python
+recent = fg.read(start_time="2026-06-01", end_time="2026-06-11")
+```
+
+The grain columns are real partition columns, so a filter on a grain column (for example `fg.filter(fg.year == 2026)`) prunes partitions natively.
+A filter on an `event_time` range is rewritten into equivalent grain-column predicates by the query layer, so `fg.read(start_time=..., end_time=...)` prunes too on hierarchical specs (and tightens to the finest grain the range allows, so a within-one-month window also bounds `day`):
 
 | `partitioned_by` | Prunes on `event_time` range? | Prunes on `year` / `month` / `day` filter? |
 | --- | --- | --- |
@@ -137,23 +168,41 @@ A filter on an `event_time` range is rewritten into equivalent grain-column pred
 | `["year", "month"]` | ✅ | ✅ |
 | `["year", "month", "day"]` | ✅ | ✅ |
 | `["year", "month", "day", "hour"]` | ✅ | ✅ |
-| `["month"]` (no year) | ⚠️ no — month alone is ambiguous across years | ✅ filter on month works |
-| `["year", "week"]` | ⚠️ year only — week isn't directly derivable from a date range | ✅ both columns prune |
-| `["day"]` (no year/month) | ⚠️ no — day-of-month is ambiguous | ✅ filter on day works |
+| `["month"]` (no year) | ⚠️ no, month alone is ambiguous across years | ✅ filter on month works |
+| `["year", "week"]` | ⚠️ year only, week is not directly derivable from a date range | ✅ both columns prune |
+| `["day"]` (no year/month) | ⚠️ no, day-of-month is ambiguous | ✅ filter on day works |
+
+Prefer hierarchical specs: `["year"]`, `["year", "month"]`, `["year", "month", "day"]`, `["year", "month", "day", "hour"]`.
+They line up with the typical batch-pipeline access pattern and prune naturally on both grain-column and `event_time`-range filters.
+Non-hierarchical specs are still valid; they just do not prune on an `event_time` range, only on a direct filter of the derived columns.
 
-Prefer hierarchical specs (`["year"]`, `["year", "month"]`, `["year", "month", "day"]`) — they line up with the typical batch-pipeline access pattern and prune naturally.
+###### Example: clickstream partitioned by the hour
+
+A high-volume event stream partitioned down to the hour, so a query for a few hours reads only those partitions:
+
+```python
+fg = fs.get_or_create_feature_group(
+    name="clickstream",
+    version=1,
+    primary_key=["event_id"],
+    event_time="event_time",
+    partitioned_by=["year", "month", "day", "hour"],
+    online_enabled=False,
+    time_travel_format="DELTA",
+)
+fg.insert(clickstream_df)  # only event_id / event_time / event fields
+```
 
 ###### Online feature store
 
 Online-enabled feature groups do not yet support `partitioned_by`.
-The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=true` until that work lands (tracked under a separate follow-up ticket).
+The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=True`, both at creation and when enabling online on an existing group.
 Keep the feature group offline-only to use `partitioned_by`.
 
 ###### Hudi
 
 `partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
-Hudi materializes the grain columns server-side in the streaming materialization job, and that work is tracked under a separate follow-up ticket.
-Until that lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+Until Hudi support lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
 
 ##### Table format
 

From 49db202ff6afb31a1117aedd1b8993d333a70cb3 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Sat, 13 Jun 2026 00:17:48 +0200
Subject: [PATCH 05/10] [HWORKS-2807] Document partitioned_by support on
 Iceberg https://hopsworks.atlassian.net/browse/HWORKS-2807

partitioned_by now works on DELTA and ICEBERG; NONE is rejected alongside
Hudi. Update the section heading, supported-formats note, and the Hudi
fallback guidance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 002ce4879f..367224968d 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -102,11 +102,12 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit
 
 By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.
 
-##### Time-grain partitioning with `partitioned_by` (Delta only)
+##### Time-grain partitioning with `partitioned_by` (Delta and Iceberg)
 
 Most time-series feature groups want to partition by a time grain derived from `event_time`.
 Instead of decomposing the timestamp into `year` / `month` / `day` columns yourself and passing them as `partition_key`, declare the grains with `partitioned_by` and let Hopsworks derive the partition columns for you.
 Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
+Supported on `time_travel_format="DELTA"` and `time_travel_format="ICEBERG"`.
 
 ```python
 fg = fs.get_or_create_feature_group(
@@ -201,8 +202,8 @@ Keep the feature group offline-only to use `partitioned_by`.
 
 ###### Hudi
 
-`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation.
-Until Hudi support lands, use `time_travel_format="DELTA"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation; so is `time_travel_format="NONE"` (plain Hive/parquet), which has no grain-materialization step.
+Until Hudi support lands, use `time_travel_format="DELTA"` or `"ICEBERG"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
 
 ##### Table format
 

From c2e8830da78850e058dbfb63906af37d28cc40c4 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Sat, 13 Jun 2026 07:31:35 +0200
Subject: [PATCH 06/10] [HWORKS-2807] Document partitioned_by on Hudi + stream
 limitation https://hopsworks.atlassian.net/browse/HWORKS-2807

Non-stream Hudi feature groups now support partitioned_by (direct Spark
write); stream feature groups and NONE are rejected. Update the section
heading, supported-formats note, Hudi note, and add a stream note.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 367224968d..f985c96472 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -102,12 +102,12 @@ MaxDirectoryItemsExceededException - The directory item limit is exceeded: limit
 
 By using partitioning the system will write the feature data in different subdirectories, thus allowing you to write 10240 files per partition.
 
-##### Time-grain partitioning with `partitioned_by` (Delta and Iceberg)
+##### Time-grain partitioning with `partitioned_by`
 
 Most time-series feature groups want to partition by a time grain derived from `event_time`.
 Instead of decomposing the timestamp into `year` / `month` / `day` columns yourself and passing them as `partition_key`, declare the grains with `partitioned_by` and let Hopsworks derive the partition columns for you.
 Pass one or more grains drawn from `hour`, `day`, `week`, `month`, and `year`.
-Supported on `time_travel_format="DELTA"` and `time_travel_format="ICEBERG"`.
+Supported on `time_travel_format="DELTA"`, `"ICEBERG"`, and `"HUDI"` for non-stream feature groups (see [Hudi](#hudi) and [Stream feature groups](#stream-feature-groups) below).
 
 ```python
 fg = fs.get_or_create_feature_group(
@@ -202,8 +202,15 @@ Keep the feature group offline-only to use `partitioned_by`.
 
 ###### Hudi
 
-`partitioned_by` on `time_travel_format="HUDI"` feature groups is not yet supported and the backend rejects it at creation; so is `time_travel_format="NONE"` (plain Hive/parquet), which has no grain-materialization step.
-Until Hudi support lands, use `time_travel_format="DELTA"` or `"ICEBERG"` to get time-grain partitioning, or partition Hudi groups explicitly via `partition_key=["year"]` with a `year` column the upstream pipeline computes.
+`partitioned_by` works on Hudi feature groups written directly by Spark (a non-stream feature group): the client materializes the grain columns and Hudi partitions on them.
+On the Python (non-Spark) engine a Hudi feature group is created as a stream feature group, which is not yet supported (see below); use `time_travel_format="DELTA"` or `"ICEBERG"` there.
+`time_travel_format="NONE"` (plain Hive/parquet) is rejected because it has no grain-materialization step.
+
+###### Stream feature groups
+
+`partitioned_by` is not yet supported on stream feature groups (`stream=True`).
+Stream feature groups materialize through the DeltaStreamer job, which does not derive the grain columns yet, so the backend rejects `partitioned_by` on them at creation.
+Create a non-stream feature group to use `partitioned_by`.
 
 ##### Table format
 

From c75b357ec776fae531d7338e8ad7ea810a1df14a Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Mon, 15 Jun 2026 10:43:04 +0200
Subject: [PATCH 07/10] [HWORKS-2802] docs: hour-grain timestamp rule and
 partitioned_by in feature views
 https://hopsworks.atlassian.net/browse/HWORKS-2802

Document that the hour grain requires a timestamp event_time (rejected on
a date event_time), and that a feature view may select the derived grain
columns even when it joins online-enabled feature groups: the grains are
served from the offline store (training data, batch inference) and
excluded from the online feature vector.
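
A minimal sketch of the hour-grain rule next to the earlier
partitioned_by checks from this series. validate_partitioned_by, its
parameters, and the use of ValueError are all hypothetical; the real
checks live in the client and backend and may raise different errors.

```python
VALID_GRAINS = {"hour", "day", "week", "month", "year"}


def validate_partitioned_by(partitioned_by, event_time, event_time_type,
                            partition_key=None, feature_names=()):
    """Raise ValueError if the spec breaks one of the documented rules."""
    if not partitioned_by:
        raise ValueError("partitioned_by must be a non-empty list")
    if partition_key:
        raise ValueError("partitioned_by and partition_key are mutually exclusive")
    if event_time is None:
        raise ValueError("partitioned_by requires event_time")
    if len(set(partitioned_by)) != len(partitioned_by):
        raise ValueError("partitioned_by must not contain duplicates")
    unknown = set(partitioned_by) - VALID_GRAINS
    if unknown:
        raise ValueError(f"unknown grains: {sorted(unknown)}")
    clash = set(partitioned_by) & (set(feature_names) | {event_time})
    if clash:
        raise ValueError(f"grains collide with existing names: {sorted(clash)}")
    # Assumption: the event_time type is known as the string "date" or "timestamp".
    if "hour" in partitioned_by and event_time_type == "date":
        raise ValueError("the hour grain requires a timestamp event_time")


# A daily spec on a date event_time passes; adding "hour" does not.
validate_partitioned_by(["year", "month", "day"], "tx_date", "date")
try:
    validate_partitioned_by(["year", "month", "day", "hour"], "tx_date", "date")
except ValueError as err:
    print(err)  # the hour grain requires a timestamp event_time
```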

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index f985c96472..6cea717e8b 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -139,6 +139,7 @@ By default they are written only to the offline store (see [Online feature store
 - `partitioned_by`: ordered, non-empty list of grains from `{"hour", "day", "week", "month", "year"}`, no duplicates.
   Mutually exclusive with `partition_key`, and requires `event_time` to be set.
   A grain must not collide with `event_time` or an existing feature name.
+  The `hour` grain requires a `timestamp` `event_time`; it is rejected on a `date` `event_time`, which has no sub-day resolution.
 - `online_partition_columns` (default `False`): when `True`, the derived grain columns are also written to the online store; when `False` they are offline-only.
   Online serving with `partitioned_by` is not supported yet, so this is effectively always `False` today (see below).
 
@@ -200,6 +201,9 @@ Online-enabled feature groups do not yet support `partitioned_by`.
 The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=True`, both at creation and when enabling online on an existing group.
 Keep the feature group offline-only to use `partitioned_by`.
 
+A feature view may still select the derived grain columns even when it also joins online-enabled feature groups.
+The grains are served from the offline store, so they appear in training data and batch inference, and they are excluded from the online feature vector, since online serving reads only the online store.
+
 ###### Hudi
 
 `partitioned_by` works on Hudi feature groups written directly by Spark (a non-stream feature group): the client materializes the grain columns and Hudi partitions on them.

From a977d113a4708e1d13479a35a9bdd9ad27a9ac1d Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Mon, 15 Jun 2026 11:56:46 +0200
Subject: [PATCH 08/10] [HWORKS-2802] docs: note the Table DDL card in the
 feature group UI https://hopsworks.atlassian.net/browse/HWORKS-2802

Document that the feature group overview shows a Table DDL card with the
Spark SQL CREATE TABLE for the offline table (format + partition columns)
and the RonDB CREATE TABLE for the online table when online-enabled.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 6cea717e8b..6839b94374 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -222,6 +222,8 @@ When you create a feature group, you can specify the table format you want to us
 The currently supported values are `"HUDI"`, `"DELTA"`, `"ICEBERG"`, and `"NONE"` (which stores as Parquet without time travel support).
 The parameter defaults to `"DELTA"`.
 
+The feature group overview in the UI shows a **Table DDL** card with the generated Spark SQL `CREATE TABLE` statement for the offline table (including the table format and any partition columns), and, for online-enabled feature groups, the `CREATE TABLE` statement for the online (RonDB) table.
+
 ##### Data Source
 
 During the creation of a feature group, it is possible to define the `data_source` parameter, this allows for management of offline data in the desired table format outside the Hopsworks cluster.

From 207cf7772a9fc534ac87859b509815e275c90925 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Thu, 18 Jun 2026 14:00:37 +0200
Subject: [PATCH 09/10] [HWORKS-2802] Correct feature-view serving behavior for
 partitioned_by grains https://hopsworks.atlassian.net/browse/HWORKS-2802

Selecting a derived partitioned_by grain column into a feature view does
not silently exclude it from the online vector: get_feature_vector and
get_feature_vectors raise a FeatureStoreException, and feature-view
creation warns when the view also joins an online-enabled feature group.
The grains remain available offline (training data, batch inference).

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index 6839b94374..e1645a926b 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -201,8 +201,10 @@ Online-enabled feature groups do not yet support `partitioned_by`.
 The online ingestion path does not exclude the offline-only grain columns from the Kafka/Avro schema, nor materialize them for the online write, so the backend rejects `partitioned_by` together with `online_enabled=True`, both at creation and when enabling online on an existing group.
 Keep the feature group offline-only to use `partitioned_by`.
 
-A feature view may still select the derived grain columns even when it also joins online-enabled feature groups.
-The grains are served from the offline store, so they appear in training data and batch inference, and they are excluded from the online feature vector, since online serving reads only the online store.
+A feature view may still select the derived grain columns; they appear in training data and batch inference, read from the offline store.
+They cannot be served online, however: the grain columns live only in the offline store, so [`FeatureView.get_feature_vector`][hsfs.feature_view.FeatureView.get_feature_vector] and [`FeatureView.get_feature_vectors`][hsfs.feature_view.FeatureView.get_feature_vectors] raise a `FeatureStoreException` when the feature view selects a derived grain column.
+When such a feature view also joins an online-enabled feature group, feature-view creation emits a warning that the selected grain columns will not be retrievable online.
+To serve a feature view online, do not select the derived grain columns into it.
 
 ###### Hudi
 

From 12e9c7358756a8d2439c8404a6eef5503cbd95b3 Mon Sep 17 00:00:00 2001
From: Jim Dowling <jim@hopsworks.ai>
Date: Thu, 18 Jun 2026 15:23:05 +0200
Subject: [PATCH 10/10] [HWORKS-2802] Clarify clickstream example is a
 non-stream feature group https://hopsworks.atlassian.net/browse/HWORKS-2802

Reword the hourly-partitioning example so "clickstream" is not mistaken
for a stream feature group (stream=True), which partitioned_by does not
yet support.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 docs/user_guides/fs/feature_group/create.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user_guides/fs/feature_group/create.md b/docs/user_guides/fs/feature_group/create.md
index e1645a926b..518acdb760 100644
--- a/docs/user_guides/fs/feature_group/create.md
+++ b/docs/user_guides/fs/feature_group/create.md
@@ -180,7 +180,7 @@ Non-hierarchical specs are still valid; they just do not prune on an `event_time
 
 ###### Example: clickstream partitioned by the hour
 
-A high-volume event stream partitioned down to the hour, so a query for a few hours reads only those partitions:
+A high-volume clickstream feature group (a regular non-stream feature group; the name refers to the click-event data, not to `stream=True`) partitioned down to the hour, so a query for a few hours reads only those partitions:
 
 ```python
 fg = fs.get_or_create_feature_group(