perf(rule): cache parsed nikunjy targeting query evaluators #5174
Sha-x2-nk wants to merge 4 commits into thomaspoignant:main
Conversation
The default rule evaluator path calls parser.Evaluate(query, mapCtx), which under the hood runs parser.NewEvaluator on every call (a full ANTLR lex+parse of the query string). For a flag with N targeting rules that's N parses per evaluation, dominating both CPU and allocations. Cache one parser.Evaluator per query string in a package-level sync.Map. Because parser.Evaluator.Process writes to internal state (lastDebugErr), sharing a single instance across goroutines would race; instead each cache entry holds a sync.Pool of Evaluators for the same query, so parsing is amortized and concurrent callers don't serialize.
Benchmark on a 6-predicate query (modules/core/flag/rule_bench_test.go):
  before: 19608 ns/op  21344 B/op  301 allocs/op
  after:   2105 ns/op   1198 B/op   36 allocs/op
Single-predicate query:
  before:  3628 ns/op   5192 B/op   85 allocs/op
  after:    787 ns/op    865 B/op   19 allocs/op
No behavior change. All flag-package tests pass with -race.
✅ Deploy Preview for go-feature-flag-doc-preview canceled.
Code Review
This pull request introduces a caching mechanism for nikunjy rule evaluations using sync.Map and sync.Pool to memoize parsed evaluators, which optimizes performance by avoiding repeated ANTLR parsing. It also adds benchmarks to verify these improvements. Review feedback highlights a missing fmt import and suggests simplifying logging by using slog.Error instead of slog.ErrorContext and passing error objects directly to the logger.
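For illustration, a minimal sketch of the logging pattern the review is pointing at, assuming the code uses the standard log/slog package; the helper, message, and call site below are hypothetical, not the PR's actual code:

```go
package flag

import (
	"log/slog"

	"github.com/nikunjy/rules/parser"
)

// logParseError is a hypothetical helper illustrating the review feedback:
// prefer slog.Error over slog.ErrorContext here, and pass the error object
// directly as a structured attribute instead of formatting it with fmt.
func logParseError(query string) {
	if _, err := parser.NewEvaluator(query); err != nil {
		slog.Error("error parsing targeting query", "query", query, "error", err)
	}
}
```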
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5174   +/-   ##
=======================================
- Coverage   85.89%   85.84%   -0.05%
=======================================
  Files         157      157
  Lines        6662     6696      +34
=======================================
+ Hits         5722     5748      +26
- Misses        708      713       +5
- Partials      232      235       +3
☔ View full report in Codecov by Sentry.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
thomaspoignant left a comment
@Sha-x2-nk thanks a ton for this pull request, the performance gain is huge.
I am just worried about not having any limits in the cache, because every config change adds new entries on top and it can leak.
What about using an LRU cache instead here?
// Memory note: cache entries are not evicted, since a feature flag's query strings
// are part of static config and the set is bounded in practice.
❔ question: I am worried about this: since we are able to change the configuration at run time, flag config edits will keep adding new query strings.
It means that this set is dynamic and can grow with every configuration change.
I am worried that a long-lived process with many config reloads would leak memory.
I am curious to hear your point of view on this.
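Not part of the PR, just to make the LRU suggestion concrete: one possible shape of a bounded cache, sketched with hashicorp/golang-lru/v2. The size cap, variable names, and helper are assumptions; the nikunjy/rules calls follow the API named in the description below.

```go
package flag

import (
	"sync"

	lru "github.com/hashicorp/golang-lru/v2"
	"github.com/nikunjy/rules/parser"
)

// evaluatorCache keeps at most 1024 query strings; the least recently used
// entry is evicted when the cap is hit, so repeated config reloads cannot
// grow memory without bound. The cap of 1024 is an arbitrary placeholder.
var evaluatorCache, _ = lru.New[string, *sync.Pool](1024)

// poolForQuery returns the evaluator pool for a query, creating it lazily.
func poolForQuery(query string) *sync.Pool {
	if pool, ok := evaluatorCache.Get(query); ok {
		return pool
	}
	pool := &sync.Pool{New: func() any {
		ev, err := parser.NewEvaluator(query)
		if err != nil {
			return nil // invalid query; caller falls back to parser.Evaluate
		}
		return ev
	}}
	evaluatorCache.Add(query, pool)
	return pool
}
```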



Description
What was the problem?
For nikunjy-format targeting queries (the default), evaluateRule in modules/core/flag/rule.go calls parser.Evaluate(query, mapCtx). Under the hood that constructs a fresh parser.Evaluator on every invocation, which runs the full ANTLR lex + parse over the query string. For a flag with N targeting rules, that's N ANTLR parses per evaluation, dominating both CPU and allocations.
CPU/alloc profiling on a flag with ~13 targeting rules showed roughly 2,000+ allocations per evaluation, with the alloc graph dominated by antlr4-go/antlr/v4.NewCommonToken, antlr4-go/antlr/v4.(*BaseLexer).Emit, nikunjy/rules/parser.NewQueryContext, and similar parser/lexer constructor calls.
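To make the per-call cost concrete, a rough sketch of the pre-change hot path, assuming the nikunjy/rules API described above (parser.NewEvaluator plus Evaluator.Process, with Process returning (bool, error)); the helper name is illustrative:

```go
package flag

import "github.com/nikunjy/rules/parser"

// evaluateOnce mirrors the pre-change behaviour: every call pays for a full
// ANTLR lex + parse of the query before the context is even looked at.
func evaluateOnce(query string, mapCtx map[string]interface{}) (bool, error) {
	ev, err := parser.NewEvaluator(query) // re-parsed on every invocation
	if err != nil {
		return false, err
	}
	return ev.Process(mapCtx) // the comparatively cheap part
}
```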
How is it resolved?
Cache one *parser.Evaluator per distinct query string in a package-level sync.Map. Because parser.Evaluator.Process writes to an internal lastDebugErr field, sharing a single instance across goroutines would trigger Go's race detector. Each cache entry instead holds a small sync.Pool of *parser.Evaluator for the same query, so parsing is amortized, concurrent callers don't serialize, and there is no change to Rule.Evaluate or any public API.
The cache is keyed only by the query string.
parser.NewEvaluator is deterministic in its input, so this is safe. Cache entries are not evicted; in practice the set of distinct query strings is bounded by the flag config and is much smaller than the request volume.
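A minimal sketch of this caching scheme, under the same API assumptions as above; the helper name cachedEvaluate and the exact fallback behaviour are illustrative, the real change lives in modules/core/flag/rule.go:

```go
package flag

import (
	"sync"

	"github.com/nikunjy/rules/parser"
)

// evaluatorPools maps a query string to a sync.Pool of *parser.Evaluator.
// One pool per query keeps the ANTLR parse amortized while giving each
// goroutine its own Evaluator instance, since Process mutates internal state.
var evaluatorPools sync.Map // map[string]*sync.Pool

func cachedEvaluate(query string, mapCtx map[string]interface{}) (bool, error) {
	poolAny, ok := evaluatorPools.Load(query)
	if !ok {
		pool := &sync.Pool{New: func() any {
			ev, err := parser.NewEvaluator(query) // full parse, runs rarely
			if err != nil {
				return nil
			}
			return ev
		}}
		poolAny, _ = evaluatorPools.LoadOrStore(query, pool)
	}
	pool := poolAny.(*sync.Pool)

	ev, _ := pool.Get().(*parser.Evaluator)
	if ev == nil {
		// The query does not parse; fall back to the uncached path so the
		// caller sees the same error parser.Evaluate would return.
		return parser.Evaluate(query, mapCtx)
	}
	defer pool.Put(ev) // hand the instance back for reuse
	return ev.Process(mapCtx)
}
```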
How can we test the change?
Existing flag-package tests cover correctness (go test -race ./modules/core/flag/... passes locally).
A new file modules/core/flag/rule_bench_test.go adds two benchmarks. Measured on darwin/arm64 (Apple M2 Pro), Go 1.25:
6-predicate query (language eq "ar" and isNewUser eq true and clubsTimeSpent gt 600 and clubsTimeSpent le 3600 and concurrencyLocked eq true and segment eq 0):
  before: 19608 ns/op  21344 B/op  301 allocs/op
  after:   2105 ns/op   1198 B/op   36 allocs/op
Single-predicate query (language eq "ar"):
  before:  3628 ns/op   5192 B/op   85 allocs/op
  after:    787 ns/op    865 B/op   19 allocs/op
The improvement scales with the size of the query and the number of rules per flag.
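For reference, a rough sketch of what such a benchmark could look like; it drives the hypothetical cachedEvaluate helper from the sketch above, while the real rule_bench_test.go benchmarks go through the rule evaluation path and may differ in structure and context fields:

```go
package flag

import "testing"

// BenchmarkCachedEvaluateSixPredicates measures repeated evaluation of the
// 6-predicate query quoted in this PR against a fixed evaluation context.
func BenchmarkCachedEvaluateSixPredicates(b *testing.B) {
	query := `language eq "ar" and isNewUser eq true and clubsTimeSpent gt 600 ` +
		`and clubsTimeSpent le 3600 and concurrencyLocked eq true and segment eq 0`
	mapCtx := map[string]interface{}{
		"language":          "ar",
		"isNewUser":         true,
		"clubsTimeSpent":    1200,
		"concurrencyLocked": true,
		"segment":           0,
	}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := cachedEvaluate(query, mapCtx); err != nil {
			b.Fatal(err)
		}
	}
}
```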
Breaking changes
None. Public API unchanged; behavior identical.
Closes issue(s)
N/A (filing this directly as a perf improvement; happy to file a tracking issue if preferred).
Checklist
- Tests and lint pass locally (go test -race ./modules/core/flag/..., make lint)
- Documentation (README.md and /website/docs) — not applicable, internal-only change