add broker config options for sql log redaction#18430
add broker config options for sql log redaction#18430jadami10 wants to merge 5 commits intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18430 +/- ##
============================================
+ Coverage 63.61% 63.65% +0.04%
- Complexity 1717 1735 +18
============================================
Files 3252 3254 +2
Lines 199051 199501 +450
Branches 30838 30984 +146
============================================
+ Hits 126618 126993 +375
- Misses 62352 62370 +18
- Partials 10081 10138 +57
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xiangfu0
left a comment
There was a problem hiding this comment.
Found two high-signal SQL redaction gaps; see inline comments.
| return valueOf(value.toUpperCase()); | ||
| } catch (IllegalArgumentException e) { | ||
| LOGGER.warn("Invalid SQL redaction mode '{}', defaulting to NONE", value); | ||
| return NONE; |
There was a problem hiding this comment.
This fails open on misconfiguration. If an operator sets an invalid pinot.broker.query.log.sqlRedaction value, we silently fall back to NONE and start emitting raw SQL, which is the exact unsafe behavior this knob is supposed to prevent. For a privacy feature, the safer behavior is to reject startup or fail closed to a redacted mode instead of disabling redaction.
There was a problem hiding this comment.
ya, really good point. I've updated this for now while I think about your other comment.
| @@ -332,6 +338,8 @@ protected BrokerResponse handleRequest(long requestId, String query, SqlNodeAndO | |||
| } | |||
| } | |||
There was a problem hiding this comment.
If fingerprint generation failed above, this still hands the raw SQL to the query logger and the request-handler warning path already logged it once. The same pattern also exists on other broker error paths that still log query directly, so literal_values and especially full do not actually guarantee that SQL stays out of broker logs. We need a shared redaction helper for every broker-side query log before advertising this as broker SQL redaction.
There was a problem hiding this comment.
another great catch. From what I initially found, the queries are all being logged from BaseSingleStageBrokerRequestHandler and MultiStageBrokerRequestHandler. My thinking is to start by exposing redactQuery as a method on QueryLogger and have both classes use that. This minimizes the amount of changes and doesn't require a global redaction config that all classes need access to right. It does leave things open to this pattern if needed in the future.
What do you think?
This is both a bugfix and a new feature to support query redaction.
By default, query logs are not redacted.
With
literal_values, we use the the query fingerprint to only log the redacted query with no literal values. This is useful if folks still want the structure of the query without potentially leaking PII.This also fixes a bug where query fingerprinting was modifying the AST in place and breaking queries. This closes #18426.
The final option is full redaction. This is good if you want no SQL ending up in your logging system.
I tested all options internally on a QA cluster. We plan to stick with
fullredaction going forward.