fix(mcp): relax column name regex, improve generate_chart validation errors and examples#39915

Draft
aminghadersohi wants to merge 6 commits into apache:master from aminghadersohi:amin/fix-generate-chart-column-name-validation

@aminghadersohi
Contributor

Summary

Addresses validation rigidity in the generate_chart MCP tool that caused unnecessary failures when using valid but unconventionally named columns.

Changes:

  1. Relax column name regex: remove the pattern=r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$" constraint from ColumnRef.name, FilterConfig.column, and BigNumberChartConfig.temporal_column. Many real-world column names (digit-prefixed like 1Q_revenue, hyphenated like order-date) were rejected with cryptic pydantic errors. The sanitize_name() / sanitize_column() validators already block XSS, and sanitize_name() also blocks SQL injection (sanitize_column() gains the same check in change 3), so the regex added no security value and only hurt usability.

  2. Add sanitize_temporal_column validator: BigNumberChartConfig.temporal_column now has a field_validator using sanitize_user_input with check_sql_keywords=True, matching the protection level of ColumnRef.sanitize_name. Without it, removing the regex would have left this field unvalidated.

  3. Add check_sql_keywords=True to FilterConfig.sanitize_column: SQL injection patterns are now blocked in filter column names as well.

  4. Extend docstring examples: add generate_chart usage examples for pie, big_number (with and without trendline), pivot_table, mixed_timeseries, and handlebars charts, and update the docstring's IMPORTANT section to list all 7 supported chart types (previously it mentioned only xy and table).

  5. Improve validation error messages: extract a _format_single_error helper from _enhance_validation_error (reducing its cyclomatic complexity) and make the fallback produce type-specific, actionable messages for the string_pattern_mismatch, missing, and value_error pydantic error types. For literal_error, the original pydantic "Input should be ..." message is preserved.

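  6. Tests: new TestColumnRefNameRelaxedPattern and TestFilterConfigColumnRelaxedPattern classes verify that digit-prefixed and hyphenated column names now pass; that script-tag XSS is blocked (depending on the nh3 version, either the whole element is stripped and the empty-value guard rejects the result, or only the tag delimiters are stripped, so no <script> tag is stored); that event-handler injection is blocked; and that SQL injection is blocked in both ColumnRef and FilterConfig column names.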
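The removed constraint can be exercised on its own with Python's re module (pydantic v2 evaluates field patterns with a Rust regex engine by default, but this character class should behave the same way). Taken exactly as quoted in change 1, the pattern already admits a leading digit and, through \-, hyphens after the first character, so 1Q_revenue and order-date both match; names it does reject include ones containing parentheses, %, or non-ASCII letters. The sample names and the helper name below are illustrative.

```python
import re

# The constraint removed by change 1, copied verbatim from the description.
REMOVED_PATTERN = re.compile(r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$")


def removed_pattern_accepts(name: str) -> bool:
    """Return True if the removed pattern would have accepted `name` (hypothetical helper)."""
    return REMOVED_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    samples = ["1Q_revenue", "order-date", "revenue (USD)", "growth_%", "café_sales"]
    for name in samples:
        # Expected: True, True, False, False, False
        print(f"{name!r:18} accepted={removed_pattern_accepts(name)}")
```

The first character class excludes only the hyphen and dot, so a name can start with a digit but not with "-" or ".".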

Testing

  • Unit tests: pytest tests/unit_tests/mcp_service/chart/test_chart_schemas.py -x
  • Manual: generate_chart with a column named 1Q_revenue or order-date succeeds
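The security checks exercised by the new test classes can be imitated with a small stand-in sanitizer to see how they combine once the regex is gone. This is a hypothetical sketch: sanitize_name_sketch and its regexes are illustrative approximations, not Superset's sanitize_user_input or nh3, and the real keyword list and order of checks may differ. It models the nh3 variant that strips the whole <script> element.

```python
import re

# Hypothetical stand-in for Superset's sanitize_user_input(); the real helper's
# rules, keyword list, and signature may differ.
_SCRIPT_ELEMENT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
_ANY_TAG = re.compile(r"<[^>]*>")
_EVENT_HANDLER = re.compile(r"\bon[a-z]+\s*=", re.IGNORECASE)
_SQL_PATTERNS = re.compile(
    r";|--|/\*|\b(?:drop|delete|insert|update|alter|truncate|union|exec)\b",
    re.IGNORECASE,
)


def sanitize_name_sketch(
    value: str, *, check_sql_keywords: bool = True, allow_empty: bool = False
) -> str:
    """Return a cleaned column name or raise ValueError (illustrative only)."""
    # Event-handler attributes such as onerror= are rejected outright.
    if _EVENT_HANDLER.search(value):
        raise ValueError("event handler patterns are not allowed")
    # Remove <script> elements together with their content, then any other tags.
    cleaned = _ANY_TAG.sub("", _SCRIPT_ELEMENT.sub("", value)).strip()
    # Empty-value guard: input that was nothing but markup is rejected.
    if not cleaned and not allow_empty:
        raise ValueError("value must not be empty")
    # Optional SQL check: separators, comments, and a few risky keywords.
    if check_sql_keywords and _SQL_PATTERNS.search(cleaned):
        raise ValueError("SQL keywords or statement separators are not allowed")
    return cleaned


if __name__ == "__main__":
    for name in ["1Q_revenue", "order-date", "revenue <b>total</b>"]:
        print(repr(sanitize_name_sketch(name)))
    for attack in ["<script>alert(1)</script>", "x onerror=alert(1)", "id; DROP TABLE users"]:
        try:
            sanitize_name_sketch(attack)
        except ValueError as exc:
            print(f"rejected {attack!r}: {exc}")
```

A keyword list this naive would also reject a legitimate name such as "update date", so the real helper's rules are worth checking before relying on this shape.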

…errors and examples

- Remove overly strict regex pattern from ColumnRef.name, FilterConfig.column,
  and BigNumberChartConfig.temporal_column — sanitize_name/sanitize_column
  already handle XSS/SQL injection; the pattern rejected valid column names
  like "1Q_revenue" (digit-prefixed) or "order-date" (hyphenated)
- Extend generate_chart docstring with usage examples for all supported chart
  types: pie, big_number (with/without trendline), pivot_table,
  mixed_timeseries, handlebars
- Improve _enhance_validation_error fallback in SchemaValidator to produce
  type-specific, actionable messages instead of raw pydantic error strings
  (extract _format_single_error helper to reduce cyclomatic complexity)
- Add tests verifying digit-prefixed/hyphenated column names now pass,
  and that XSS/SQL injection is still blocked by sanitize_name()
- FilterConfig.column: add check_sql_keywords=True to sanitize_column
  (Copilot review: sanitize_column was missing SQL keyword checking)
- BigNumberChartConfig.temporal_column: add sanitize_temporal_column
  field_validator using sanitize_user_input with check_sql_keywords=True
  (Copilot review: no validator after regex removal left field unprotected)
- generate_chart docstring IMPORTANT: list all chart types, not just xy/table
  (Copilot review: IMPORTANT section was misleading after adding more examples)
- Fix test_xss_attempt_blocked: nh3 strips HTML tags instead of rejecting,
  so rename to test_xss_tags_are_stripped (asserts tag is removed) and add
  test_event_handler_injection_blocked (on...= patterns ARE rejected)
- Fix _format_single_error literal_error: preserve pydantic 'Input should be'
  message instead of replacing with custom format (broke existing test
  test_non_value_error_pydantic_body_is_surfaced)
- Add test_sql_injection_in_filter_column_blocked to verify FilterConfig
  now rejects SQL injection column names
- Remove unused 'type: ignore[return-value]' from sanitize_temporal_column
  (mypy correctly infers the return type; comment was unnecessary)
- Fix test_xss_tags_are_stripped → test_script_tag_blocked: nh3 strips the
  entire script element including its content, leaving an empty string that
  the allow_empty=False guard then rejects with ValidationError
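The per-type dispatch described in these commits (type-specific text for string_pattern_mismatch, missing, and value_error; pydantic's own "Input should be ..." text kept for literal_error) might look roughly like the following. It operates on the dicts returned by pydantic v2's ValidationError.errors(), which carry keys such as type, loc, msg, input, and ctx. The function name and message wording are hypothetical, not the PR's actual _format_single_error.

```python
from typing import Any


def format_single_error_sketch(error: dict[str, Any]) -> str:
    """Turn one pydantic v2 error dict into a readable message (hypothetical)."""
    # loc is a tuple such as ("config", 0, "name"); join it into a dotted path.
    field = ".".join(str(part) for part in error.get("loc", ())) or "input"
    err_type = error.get("type", "")
    msg = error.get("msg", "")
    if err_type == "string_pattern_mismatch":
        pattern = error.get("ctx", {}).get("pattern", "")
        return f"Field '{field}' has an unsupported format: {error.get('input')!r} does not match {pattern!r}."
    if err_type == "missing":
        return f"Field '{field}' is required but was not provided."
    if err_type == "value_error":
        # pydantic prefixes messages raised in custom validators with "Value error, ".
        return f"Invalid value for '{field}': {msg.removeprefix('Value error, ')}"
    # literal_error and any other type: keep pydantic's own wording.
    return f"{field}: {msg}"


if __name__ == "__main__":
    sample = {"type": "missing", "loc": ("x", "name"), "msg": "Field required", "input": {}}
    print(format_single_error_sketch(sample))
```

Falling through to pydantic's message for unhandled types keeps the helper safe when new error types appear.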
@netlify

netlify Bot commented May 6, 2026

Deploy Preview for superset-docs-preview ready!

🔨 Latest commit: a66b8d6
🔍 Latest deploy log: https://app.netlify.com/projects/superset-docs-preview/deploys/69fba671977b540008fcaa65
😎 Deploy Preview: https://deploy-preview-39915--superset-docs-preview.netlify.app

…ner text

nh3.clean() removes HTML tag delimiters but preserves the text content
between them, so '<script>alert(1)</script>' becomes 'alert(1)' rather
than an empty string. Update the test to assert the tag is stripped
(not that a ValidationError is raised).
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 10.86957% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.87%. Comparing base (673634f) to head (24ad34f).
⚠️ Report is 37 commits behind head on master.

File                                                     Patch %   Missing lines
...t/mcp_service/chart/validation/schema_validator.py     9.09%    20
superset/mcp_service/chart/tool/generate_chart.py          0.00%    17
superset/mcp_service/chart/schemas.py                     42.85%     4
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39915      +/-   ##
==========================================
- Coverage   64.37%   63.87%   -0.50%     
==========================================
  Files        2569     2583      +14     
  Lines      134745   136692    +1947     
  Branches    31278    31519     +241     
==========================================
+ Hits        86739    87313     +574     
- Misses      46508    47863    +1355     
- Partials     1498     1516      +18     
Flag       Coverage Δ
hive       39.38% <10.86%> (-0.30%) ⬇️
mysql      59.04% <10.86%> (-0.90%) ⬇️
postgres   59.12% <10.86%> (-0.90%) ⬇️
presto     41.07% <10.86%> (-0.35%) ⬇️
python     60.56% <10.86%> (-1.00%) ⬇️
sqlite     58.76% <10.86%> (-0.89%) ⬇️
unit       100.00% <ø> (ø)

Flags with carried forward coverage won't be shown.

nh3 behavior for '<script>alert(1)</script>' varies by version:
- some versions strip entire element (empty → ValidationError)
- others strip only tag delimiters (preserving 'alert(1)')
Accept both outcomes: no ValidationError means no <script> tag stored.
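The version-tolerant check described in this last commit can be sketched as a helper that accepts either outcome. The helper and both stand-in sanitizers are hypothetical: the stand-ins only imitate the two nh3 behaviors described above and are not nh3 itself. Catching ValueError also covers pydantic, whose ValidationError subclasses ValueError.

```python
import re
from typing import Callable

SCRIPT_PAYLOAD = "<script>alert(1)</script>"


def assert_no_script_tag_stored(
    sanitize: Callable[[str], str], payload: str = SCRIPT_PAYLOAD
) -> None:
    """Pass if sanitize() rejects the payload or returns text without a <script tag."""
    try:
        result = sanitize(payload)
    except ValueError:
        # Outcome 1: whole element stripped, then the empty-value guard rejected it.
        return
    # Outcome 2: only the tag delimiters were stripped; inner text may survive.
    assert "<script" not in result.lower(), f"script tag survived: {result!r}"


def strip_whole_element(value: str) -> str:
    """Stand-in for the first behavior: drop the element and its content."""
    cleaned = re.sub(r"<script\b[^>]*>.*?</script\s*>", "", value, flags=re.IGNORECASE | re.DOTALL)
    if not cleaned.strip():
        raise ValueError("value must not be empty")
    return cleaned


def strip_delimiters_only(value: str) -> str:
    """Stand-in for the second behavior: remove tags, keep the text between them."""
    return re.sub(r"<[^>]*>", "", value)


if __name__ == "__main__":
    assert_no_script_tag_stored(strip_whole_element)
    assert_no_script_tag_stored(strip_delimiters_only)
    print("both sanitizer behaviors accepted")
```

Asserting on the property that matters (no tag is stored) rather than on one exact output keeps the test stable across sanitizer versions, while a sanitizer that returns the payload unchanged still fails.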
