Normalize legacy FieldConfig indexes into indexes #18412

Open
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:codex/fieldconfig-indexes-normalization

@xiangfu0 xiangfu0 commented May 4, 2026

Summary

  • remove _indexTypes as stored FieldConfig state and normalize legacy indexType/indexTypes into generic indexes entries at construction time
  • move legacy TEXT, H3, VECTOR, and RANGE config reconstruction into their respective index type deserializers instead of materializing index-specific configs inside FieldConfig
  • keep table-level legacy fallbacks where they already belong, including range-index version fallback and legacy indexing-config based indexes
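
The range-index version fallback mentioned in the last bullet can be sketched as a simple precedence rule. This is a minimal illustration, not Pinot's code: the class name `RangeVersionFallback`, the method `resolve`, and the `defaultVersion` parameter are all hypothetical, and the PR does not state what happens when neither level sets a version.

```java
// Hypothetical helper illustrating per-field-then-table precedence; not Pinot's actual API.
final class RangeVersionFallback {
  private RangeVersionFallback() {
  }

  /**
   * Picks the range index version to use for one column.
   *
   * @param fieldVersion   version set on the field-level config, or null if unset
   * @param tableVersion   version set on the table-level indexing config, or null if unset
   * @param defaultVersion assumed last-resort value (an assumption of this sketch)
   */
  static int resolve(Integer fieldVersion, Integer tableVersion, int defaultVersion) {
    if (fieldVersion != null) {
      // A per-field setting always wins.
      return fieldVersion;
    }
    if (tableVersion != null) {
      // Legacy configs fall back to the table-level version.
      return tableVersion;
    }
    return defaultVersion;
  }
}
```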

User Manual

Pinot still accepts legacy fieldConfigList entries that use indexType or indexTypes.

After this change:

  • legacy indexType and indexTypes are converted into indexes during deserialization
  • FieldConfig only records generic enabled index membership in indexes; each index type remains responsible for interpreting any legacy per-index properties
  • serialized table configs only emit indexes; Pinot no longer writes back indexType or indexTypes
  • getIndexType() and getIndexTypes() continue to work as derived compatibility accessors backed by normalized indexes
  • legacy range index configs still honor the table-level range index version when no per-field version is set
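
As a rough illustration of the first two bullets, the conversion can be thought of as merging legacy type names into a map keyed by index id. The sketch below uses plain `java.util` maps instead of Pinot's JSON nodes; `LegacyIndexNormalizer` and its methods are hypothetical names, and mapping a legacy name to an index id by lower-casing it is an assumption that matches the sample config in this description but may not hold for every index type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; a sketch of the idea, not Pinot's FieldConfig implementation.
final class LegacyIndexNormalizer {
  private LegacyIndexNormalizer() {
  }

  /**
   * Merges legacy index type names into an "indexes"-style map keyed by index id.
   * Entries already present in {@code indexes} are kept as-is and win over legacy names.
   */
  static Map<String, Map<String, Object>> normalize(String indexType, List<String> indexTypes,
      Map<String, Map<String, Object>> indexes) {
    Map<String, Map<String, Object>> normalized = new LinkedHashMap<>();
    if (indexes != null) {
      normalized.putAll(indexes);
    }
    if (indexTypes != null) {
      for (String type : indexTypes) {
        addLegacy(normalized, type);
      }
    }
    addLegacy(normalized, indexType);
    return normalized;
  }

  private static void addLegacy(Map<String, Map<String, Object>> target, String legacyType) {
    if (legacyType == null) {
      return;
    }
    // Assumption: the index id is the lower-cased legacy name (e.g. "RANGE" -> "range").
    String indexId = legacyType.toLowerCase(Locale.ROOT);
    // Only record generic membership; per-index properties are left to each index type.
    target.putIfAbsent(indexId, Map.of("disabled", false));
  }
}
```

Only membership is recorded here, which mirrors the stated split of responsibilities: anything specific to TEXT, H3, VECTOR, or RANGE would be interpreted later by that index type.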

Sample Table Config

Legacy input:

{
  "fieldConfigList": [
    {
      "name": "message",
      "encodingType": "RAW",
      "indexType": "TEXT",
      "properties": {
        "stopWordInclude": "pinot,apache",
        "enableQueryCacheForTextIndex": "true"
      }
    },
    {
      "name": "cityId",
      "indexTypes": ["INVERTED", "RANGE"]
    }
  ]
}

Normalized form written back by Pinot:

{
  "fieldConfigList": [
    {
      "name": "message",
      "encodingType": "RAW",
      "indexes": {
        "text": {
          "disabled": false,
          "queryCache": true,
          "stopWordsInclude": ["pinot", "apache"]
        }
      }
    },
    {
      "name": "cityId",
      "encodingType": "DICTIONARY",
      "indexes": {
        "inverted": {
          "disabled": false
        },
        "range": {
          "disabled": false
        }
      }
    }
  ]
}
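
The TEXT entry in the sample shows two legacy properties being turned into typed fields. A minimal sketch of that mapping follows, covering only the two properties that appear in the sample (`stopWordInclude` and `enableQueryCacheForTextIndex`); the class `LegacyTextProperties` and its method are hypothetical, and the real deserializer presumably handles more properties than shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper mirroring the sample config above; not Pinot's text index deserializer.
final class LegacyTextProperties {
  private LegacyTextProperties() {
  }

  /** Builds a text index config map from legacy string-valued field properties. */
  static Map<String, Object> toTextIndexConfig(Map<String, String> properties) {
    Map<String, Object> config = new LinkedHashMap<>();
    config.put("disabled", false);
    if (properties == null) {
      return config;
    }
    String queryCache = properties.get("enableQueryCacheForTextIndex");
    if (queryCache != null) {
      // Legacy properties are strings, so "true" becomes a boolean.
      config.put("queryCache", Boolean.parseBoolean(queryCache));
    }
    String stopWords = properties.get("stopWordInclude");
    if (stopWords != null) {
      // The legacy value is a comma-separated string; the new form is a list.
      List<String> words = new ArrayList<>();
      for (String word : stopWords.split(",")) {
        String trimmed = word.trim();
        if (!trimmed.isEmpty()) {
          words.add(trimmed);
        }
      }
      config.put("stopWordsInclude", words);
    }
    return config;
  }
}
```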

Testing

  • ./mvnw spotless:apply -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local -am
  • ./mvnw checkstyle:check -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local -am
  • ./mvnw license:format -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local -am
  • ./mvnw license:check -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local -am
  • ./mvnw -pl pinot-common,pinot-segment-local,pinot-core -am -Dtest=TableConfigSerDeUtilsTest,TextIndexTest,RangeIndexTest,InvertedIndexTypeTest,H3IndexTest,VectorIndexTest,JsonIndexTest,FstIndexTypeTest,TextSearchQueriesTest,H3IndexQueriesTest -Dsurefire.failIfNoSpecifiedTests=false test

xiangfu0 added the configuration (Config changes: addition/deletion/change in behavior), index (Related to indexing, general), serialization (Related to data serialization and deserialization), and ready-for-review (PR is ready for maintainer review) labels on May 4, 2026
xiangfu0 force-pushed the codex/fieldconfig-indexes-normalization branch 3 times, most recently from 81e5521 to c6440c8, on May 4, 2026 08:31
xiangfu0 force-pushed the codex/fieldconfig-indexes-normalization branch from c6440c8 to 53d023b on May 4, 2026 08:37

codecov-commenter commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (22df831) to head (53d023b).
⚠️ Report is 51 commits behind head on master.

Additional details and impacted files
@@              Coverage Diff               @@
##             master    #18412       +/-   ##
==============================================
+ Coverage     63.40%   100.00%   +36.59%     
+ Complexity     1668         6     -1662     
==============================================
  Files          3252         3     -3249     
  Lines        198661         6   -198655     
  Branches      30770         0    -30770     
==============================================
- Hits         125965         6   -125959     
+ Misses        62632         0    -62632     
+ Partials      10064         0    -10064     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (ø)
java-21               100.00% <ø> (+36.59%) ⬆️
temurin               100.00% <ø> (+36.59%) ⬆️
unittests             ?
unittests1            ?
unittests2            ?

@xiangfu0 xiangfu0 left a comment

Found one high-signal backward-compatibility issue; see inline comment.

}

@Deprecated
@JsonIgnore

Adding @JsonIgnore here removes indexType / indexTypes from every FieldConfig serialized through controller APIs and TableConfig#toJsonString(). Those fields are already part of the public JSON contract, so older clients that still round-trip or inspect them will see a backward-incompatible response-shape change. Please keep emitting the legacy fields for now, even if you normalize internally to indexes, and deprecate them over a compatibility window.
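
One way to keep emitting the legacy fields while storing only `indexes` would be to derive them at serialization time. The sketch below shows the derivation only, on plain maps; `LegacyIndexTypeView` is a hypothetical name, and upper-casing the index id to recover the legacy name is an assumption, not something this PR confirms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical compatibility view; not the accessor implemented in this PR.
final class LegacyIndexTypeView {
  private LegacyIndexTypeView() {
  }

  /** Returns legacy-style index type names for every index that is not explicitly disabled. */
  static List<String> deriveIndexTypes(Map<String, Map<String, Object>> indexes) {
    List<String> types = new ArrayList<>();
    if (indexes == null) {
      return types;
    }
    for (Map.Entry<String, Map<String, Object>> entry : indexes.entrySet()) {
      Map<String, Object> config = entry.getValue();
      boolean disabled = config != null && Boolean.TRUE.equals(config.get("disabled"));
      if (!disabled) {
        // Assumption: the legacy name is the upper-cased index id (e.g. "inverted" -> "INVERTED").
        types.add(entry.getKey().toUpperCase(Locale.ROOT));
      }
    }
    return types;
  }
}
```

A serializer could write this derived list as `indexTypes` next to `indexes` for the length of the compatibility window, so older clients keep seeing the field they expect.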

@Jackie-Jiang Jackie-Jiang left a comment

We should remove "disabled": false. It is always implicit when an index is configured.
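
A minimal sketch of what dropping the implicit flag could look like on a normalized map, assuming the same plain-map representation as a stand-in for the JSON tree; `ImplicitDisabledStripper` is a hypothetical name and this is not the change made in the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical post-processing step; shown on plain maps rather than Jackson nodes.
final class ImplicitDisabledStripper {
  private ImplicitDisabledStripper() {
  }

  /** Returns a copy of {@code indexes} with every "disabled": false entry removed. */
  static Map<String, Map<String, Object>> strip(Map<String, Map<String, Object>> indexes) {
    Map<String, Map<String, Object>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Object>> entry : indexes.entrySet()) {
      Map<String, Object> copy = new LinkedHashMap<>();
      if (entry.getValue() != null) {
        copy.putAll(entry.getValue());
      }
      // "disabled": true carries information and is kept; only the implicit false is dropped.
      if (Boolean.FALSE.equals(copy.get("disabled"))) {
        copy.remove("disabled");
      }
      result.put(entry.getKey(), copy);
    }
    return result;
  }
}
```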

  _compressionCodec = compressionCodec;
  _timestampConfig = timestampConfig;
  _properties = properties;
- _indexes = indexes == null ? NullNode.getInstance() : indexes;
+ _indexes = normalizeIndexes(name, indexType, indexTypes, indexes);

Normalizing in the constructor could be expensive. We load TableConfig everywhere, but very few call sites use FieldConfig, so normalizing on use should be better.

For new or updated table configs, we can normalize before writing to ZK.
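
Deferring the work to first use could be done with a small memoizing wrapper. This is only a sketch of the idea: `LazyNormalized` is a hypothetical class, the normalizer passed in is assumed to return a non-null value, and whether the cost justifies the extra indirection is exactly the trade-off under discussion.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical wrapper: runs the normalizer at most once, on first access.
final class LazyNormalized<T> implements Supplier<T> {
  private final Supplier<T> _normalizer;
  private volatile T _value;

  LazyNormalized(Supplier<T> normalizer) {
    _normalizer = Objects.requireNonNull(normalizer, "normalizer");
  }

  @Override
  public T get() {
    T value = _value;
    if (value == null) {
      // Double-checked locking so concurrent readers trigger a single normalization.
      synchronized (this) {
        value = _value;
        if (value == null) {
          value = Objects.requireNonNull(_normalizer.get(), "normalizer returned null");
          _value = value;
        }
      }
    }
    return value;
  }
}
```

With this shape, constructing the config costs nothing beyond storing the raw inputs, and only the code paths that actually read the indexes pay for normalization.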
