
add broker config options for sql log redaction #18430

Open
jadami10 wants to merge 5 commits into apache:master from jadami10:jadami/oss-redact-query-sql

jadami10 (Contributor) commented May 5, 2026

This PR is both a bug fix and a new feature that adds support for query redaction.

By default, query logs are not redacted.

With literal_values, we log only the query fingerprint, which is the query with all literal values redacted. This is useful if you still want the structure of the query without potentially leaking PII.

This also fixes a bug where query fingerprinting modified the AST in place and broke queries. This closes #18426.
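The general fix for this class of bug is for fingerprinting to build a new, literal-free tree instead of rewriting the tree the query executes from. The following is a minimal sketch of that idea using a hypothetical toy AST. The Node, Literal, Identifier, and Call types are illustrations only, not Calcite's SqlNode and not the PR's actual change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class FingerprintSketch {
  // Hypothetical toy AST; Pinot's real query tree is Calcite's SqlNode.
  interface Node {}
  record Literal(String value) implements Node {}
  record Identifier(String name) implements Node {}
  record Call(String operator, List<Node> operands) implements Node {}

  /** Builds a NEW tree with every literal replaced by "?"; the input tree is left untouched. */
  static Node fingerprint(Node node) {
    if (node instanceof Literal) {
      return new Literal("?");
    }
    if (node instanceof Call call) {
      List<Node> redacted = new ArrayList<>();
      for (Node operand : call.operands()) {
        redacted.add(fingerprint(operand));
      }
      return new Call(call.operator(), List.copyOf(redacted));
    }
    return node; // identifiers hold no literal values and are immutable, so sharing them is safe
  }

  /** Renders a tree in prefix form, e.g. =(userId, 'alice'). */
  static String render(Node node) {
    if (node instanceof Literal literal) {
      return literal.value();
    }
    if (node instanceof Identifier identifier) {
      return identifier.name();
    }
    Call call = (Call) node;
    return call.operator() + "("
        + call.operands().stream().map(FingerprintSketch::render).collect(Collectors.joining(", "))
        + ")";
  }

  public static void main(String[] args) {
    Node filter = new Call("=", List.of(new Identifier("userId"), new Literal("'alice'")));
    Node fp = fingerprint(filter);
    System.out.println(render(fp));     // =(userId, ?)
    System.out.println(render(filter)); // =(userId, 'alice'): the original tree is unchanged
  }
}
```

Because the fingerprint is a separate tree, logging it cannot affect the query that is later planned and executed.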

The final option, full, redacts the entire query. Use it if you don't want any SQL to end up in your logging system.

I tested all options internally on a QA cluster. We plan to stick with full redaction going forward.
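Together, the three modes amount to a small mapping from the configured value to the text that reaches the log. The following is a minimal Java sketch. It assumes the modes are modeled as an enum whose constant names mirror the config values (none, literal_values, full) and uses a hypothetical "[REDACTED]" placeholder for full mode. The enum shape, the redact method, and the placeholder are illustrations, not the PR's code.

```java
public enum SqlRedactionMode {
  NONE,           // default: log the query exactly as received
  LITERAL_VALUES, // log only the fingerprint (query structure, literal values removed)
  FULL;           // log no SQL at all

  // Hypothetical placeholder written instead of the query in FULL mode.
  public static final String PLACEHOLDER = "[REDACTED]";

  /**
   * Returns the text to write to the broker query log.
   * Assumes the caller already computed a non-null fingerprint for LITERAL_VALUES.
   */
  public String redact(String rawSql, String fingerprint) {
    switch (this) {
      case NONE:
        return rawSql;
      case LITERAL_VALUES:
        return fingerprint;
      default:
        return PLACEHOLDER;
    }
  }

  public static void main(String[] args) {
    String sql = "SELECT * FROM users WHERE userId = 'alice'";
    // Illustrative fingerprint only; Pinot's actual fingerprint format may differ.
    String fingerprint = "SELECT * FROM users WHERE userId = ?";
    for (SqlRedactionMode mode : values()) {
      System.out.println(mode + " -> " + mode.redact(sql, fingerprint));
    }
  }
}
```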

codecov-commenter commented May 5, 2026

Codecov Report

❌ Patch coverage is 31.37255% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.65%. Comparing base (b870804) to head (871a7e2).
⚠️ Report is 14 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...sthandler/BaseSingleStageBrokerRequestHandler.java | 14.81% | 43 Missing and 3 partials ⚠️ |
| ...requesthandler/MultiStageBrokerRequestHandler.java | 9.09% | 17 Missing and 3 partials ⚠️ |
| .../org/apache/pinot/broker/querylog/QueryLogger.java | 80.95% | 4 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18430      +/-   ##
============================================
+ Coverage     63.61%   63.65%   +0.04%     
- Complexity     1717     1735      +18     
============================================
  Files          3252     3254       +2     
  Lines        199051   199501     +450     
  Branches      30838    30984     +146     
============================================
+ Hits         126618   126993     +375     
- Misses        62352    62370      +18     
- Partials      10081    10138      +57     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 63.65% <31.37%> (+0.04%) ⬆️ |
| temurin | 63.65% <31.37%> (+0.04%) ⬆️ |
| unittests | 63.65% <31.37%> (+0.04%) ⬆️ |
| unittests1 | 55.72% <100.00%> (+0.07%) ⬆️ |
| unittests2 | 34.97% <31.37%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

xiangfu0 (Contributor) left a comment

Found two high-signal SQL redaction gaps; see inline comments.

try {
  return valueOf(value.toUpperCase());
} catch (IllegalArgumentException e) {
  LOGGER.warn("Invalid SQL redaction mode '{}', defaulting to NONE", value);
  return NONE;
}

Contributor

This fails open on misconfiguration. If an operator sets an invalid pinot.broker.query.log.sqlRedaction value, we silently fall back to NONE and start emitting raw SQL, which is the exact unsafe behavior this knob is supposed to prevent. For a privacy feature, the safer behavior is to reject startup or fail closed to a redacted mode instead of disabling redaction.
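One way to fail closed is sketched below, under several assumptions. The mode enum is reproduced locally, an unset value keeps the documented default of no redaction, and any unrecognized value throws so the broker refuses to start. The class and method names are hypothetical. Returning FULL instead of throwing would be the other option the comment mentions.

```java
import java.util.Locale;

public final class RedactionConfigParser {
  // Local copy of the three modes so the sketch compiles on its own.
  enum SqlRedactionMode { NONE, LITERAL_VALUES, FULL }

  private RedactionConfigParser() {}

  /**
   * Parses pinot.broker.query.log.sqlRedaction. An absent or blank value keeps the default (NONE);
   * an unrecognized value throws instead of silently disabling redaction.
   */
  static SqlRedactionMode parse(String value) {
    if (value == null || value.isBlank()) {
      return SqlRedactionMode.NONE;
    }
    try {
      // Locale.ROOT avoids locale-sensitive case mapping (e.g. Turkish dotted/dotless i).
      return SqlRedactionMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Invalid pinot.broker.query.log.sqlRedaction value '" + value
              + "'; expected one of none, literal_values, full", e);
    }
  }
}
```

Throwing at config-load time surfaces a typo immediately. Defaulting to FULL instead would keep the broker running while still logging no SQL.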

Contributor Author

Yeah, really good point. I've updated this for now while I think about your other comment.

@@ -332,6 +338,8 @@ protected BrokerResponse handleRequest(long requestId, String query, SqlNodeAndO
}
}

Contributor

If fingerprint generation failed above, this still hands the raw SQL to the query logger, and the request-handler warning path has already logged it once. The same pattern also exists on other broker error paths that still log the query directly, so literal_values, and especially full, do not actually guarantee that SQL stays out of broker logs. We need a shared redaction helper for every broker-side query log before advertising this as broker SQL redaction.

Contributor Author

Another great catch. From what I found initially, all of the queries are logged from BaseSingleStageBrokerRequestHandler and MultiStageBrokerRequestHandler. My plan is to start by exposing redactQuery as a method on QueryLogger and having both classes use it. This minimizes the amount of change and doesn't require a global redaction config that every class needs access to. It does leave things open to that pattern if needed in the future.

What do you think?
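The following is a minimal sketch of the shared helper discussed in this thread. It assumes the helper lives on a simplified, hypothetical QueryLogger, as the author proposes. The nested mode enum, the "[REDACTED]" placeholder, the Supplier-based fingerprint argument, and the warning path in main are assumptions for illustration, not the PR's code. The property the reviewer asks for is that a failed or missing fingerprint degrades to full redaction rather than to raw SQL, so every log site that routes through redactQuery is covered.

```java
import java.util.function.Supplier;

public class QueryLogger {
  enum SqlRedactionMode { NONE, LITERAL_VALUES, FULL }

  static final String REDACTED = "[REDACTED]";

  private final SqlRedactionMode mode;

  public QueryLogger(SqlRedactionMode mode) {
    this.mode = mode;
  }

  /**
   * Returns the form of the query that is safe to log under the configured mode.
   * The fingerprint is computed lazily and only in LITERAL_VALUES mode; if it throws
   * or returns null, the query is fully redacted instead of being logged raw.
   */
  public String redactQuery(String rawSql, Supplier<String> fingerprint) {
    switch (mode) {
      case NONE:
        return rawSql;
      case LITERAL_VALUES:
        try {
          String fp = fingerprint.get();
          return fp != null ? fp : REDACTED;
        } catch (RuntimeException e) {
          return REDACTED; // fail closed: never fall back to the raw SQL
        }
      default:
        return REDACTED;
    }
  }

  public static void main(String[] args) {
    QueryLogger queryLogger = new QueryLogger(SqlRedactionMode.LITERAL_VALUES);
    String sql = "SELECT * FROM users WHERE userId = 'alice'";
    // A handler's warning path would log the returned string instead of `sql`.
    String loggable = queryLogger.redactQuery(sql, () -> {
      throw new IllegalStateException("fingerprint generation failed");
    });
    System.out.println("Query failed: " + loggable); // Query failed: [REDACTED]
  }
}
```

Taking the fingerprint as a Supplier keeps NONE and FULL from paying for fingerprinting at all. Catching the failure inside the helper means individual handlers cannot forget the fallback.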


Development

Successfully merging this pull request may close these issues.

query fingerprinting mutates the SqlNode in place and breaks queries

4 participants