
fix(mcp): ASCII chart crashes with NaN when dataset contains null values#39916

Open
aminghadersohi wants to merge 6 commits into apache:master from aminghadersohi:fix/mcp-ascii-chart-nan


@aminghadersohi
Contributor

SUMMARY

When a dataset contains NULL values, they are represented as float('nan') after numeric conversion. The ASCII bar chart renderers passed NaN directly into int(), raising ValueError: cannot convert float NaN to integer.

Root cause: _generate_ascii_bar_chart and _extract_time_series_data accepted float('nan') values through the isinstance(val, (int, float)) check, which returns True for NaN. The NaN then propagated into max() / normalization calculations, causing the crash at int(normalized * max_bar_width).
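The failure mode can be reproduced with the standard library alone. This is an illustrative sketch (the values and max_bar_width are made up, not taken from ascii_charts.py): NaN passes the isinstance check, a leading NaN poisons max() because every comparison with NaN is False, and int() then rejects the NaN bar size.

```python
import math

nan = float("nan")

# NaN is a float, so an isinstance(val, (int, float)) check accepts it.
print(isinstance(nan, (int, float)))  # True

# max() keeps its current maximum unless a later item compares greater;
# every comparison with NaN is False, so a leading NaN is never replaced.
values = [nan, 50.0, 150.0]
peak = max(values)
print(math.isnan(peak))  # True

# Normalizing against a NaN peak gives NaN, which int() refuses.
max_bar_width = 40  # illustrative width, not the module's setting
normalized = values[2] / peak
try:
    int(normalized * max_bar_width)
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer
```

If the NaN were not first in the list, max() would silently return a real number instead, which is why the result depends on row order and is hard to spot without a crash.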

Two-layer fix:

  1. Extraction-time filtering: _generate_ascii_bar_chart and _extract_time_series_data now call _is_nan_value() (already defined in the module) before accepting a numeric value, silently skipping NaN rows.
  2. Defence-in-depth guards: _generate_horizontal_bar_chart and _generate_vertical_bar_chart check _is_nan_value(normalized) before the int() call, treating a NaN bar size as 0 (no bar drawn).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — ASCII chart output; crash prevented.

TESTING INSTRUCTIONS

  1. Create a dataset with at least one NULL value in a numeric column.
  2. Render a bar or line chart via the MCP get_chart_preview tool.
  3. Before this fix: ValueError: cannot convert float NaN to integer.
  4. After this fix: the chart renders, with NaN rows silently skipped.

Unit tests added in tests/unit_tests/mcp_service/chart/test_ascii_charts.py covering bar, column, line, and timeseries bar chart types with NaN and None values.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
  • Introduces new feature or API
  • Removes existing feature or API

When a dataset contains NULL values they become float NaN after numeric
conversion.  The horizontal and vertical bar chart renderers passed NaN
into int(), raising ValueError.  Likewise the line chart extractor would
include NaN in the value list, corrupting min/max calculations.

Two-layer fix:
1. Filter NaN during value extraction in _generate_ascii_bar_chart and
   _extract_time_series_data so NaN rows are silently skipped.
2. Add _is_nan_value() guard before int() in _generate_horizontal_bar_chart
   (bar_length) and _generate_vertical_bar_chart (bar_height) as
   defence-in-depth; NaN bar size is treated as 0 (no bar drawn).

Fixes: ValueError: cannot convert float NaN to integer

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.88%. Comparing base (5b5dd01) to head (f80e3b3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/mcp_service/chart/ascii_charts.py | 0.00% | 8 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master   #39916      +/-   ##
==========================================
- Coverage   63.88%   63.88%   -0.01%     
==========================================
  Files        2583     2583              
  Lines      136604   136608       +4     
  Branches    31502    31504       +2     
==========================================
  Hits        87276    87276              
- Misses      47812    47816       +4     
  Partials     1516     1516              
```
| Flag | Coverage Δ |
|---|---|
| hive | 39.38% <0.00%> (-0.01%) ⬇️ |
| mysql | 59.05% <0.00%> (-0.01%) ⬇️ |
| postgres | 59.13% <0.00%> (-0.01%) ⬇️ |
| presto | 41.08% <0.00%> (-0.01%) ⬇️ |
| python | 60.57% <0.00%> (-0.01%) ⬇️ |
| sqlite | 58.77% <0.00%> (-0.01%) ⬇️ |
| unit | 100.00% <ø> (ø) |

Flags with carried forward coverage won't be shown.


Vertical bar chart (chosen when avg label length <= 8) truncates labels
to 3 characters, so 'Alpha' never appears literally. Use longer labels
(> 8 chars average) to force horizontal layout, where the full label is
preserved up to 15 characters.
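The layout rule described in this commit message can be sketched as follows. choose_layout and display_label are hypothetical names; the 8-character threshold and the 3- and 15-character limits come from the commit message, while the exact truncation in ascii_charts.py (for example, whether an ellipsis is appended) is not shown here.

```python
def choose_layout(labels: list[str], threshold: float = 8.0) -> str:
    """Vertical when the average label length is <= threshold, else horizontal."""
    avg_len = sum(len(lbl) for lbl in labels) / len(labels) if labels else 0.0
    return "vertical" if avg_len <= threshold else "horizontal"


def display_label(label: str, layout: str) -> str:
    """Vertical bars keep 3 characters; horizontal bars keep up to 15."""
    return label[:3] if layout == "vertical" else label[:15]


for labels in (["Alpha", "Beta", "Gamma"], ["Alpha Category", "Gamma Category"]):
    layout = choose_layout(labels)
    print(layout, [display_label(lbl, layout) for lbl in labels])
# vertical ['Alp', 'Bet', 'Gam']
# horizontal ['Alpha Category', 'Gamma Category']
```

The second list matches the test data once the NaN row is skipped: both remaining labels are 14 characters long, above the threshold, so the full text survives and assertions on "Alpha" and "Gamma" can pass.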

Copilot AI left a comment


Pull request overview

Fixes MCP ASCII chart preview rendering crashes when datasets contain NULLs that become NaN during numeric conversion, by filtering NaN values during data extraction and guarding the int() bar-size conversion paths. Adds unit tests to exercise bar/column/line/timeseries behavior with NaN and None inputs.

Changes:

  • Filter out NaN numeric values during bar and time-series data extraction.
  • Add defense-in-depth checks to avoid int(float('nan')) when computing bar sizes.
  • Add unit tests covering chart generation with NaN/None values.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| superset/mcp_service/chart/ascii_charts.py | Filters NaN during extraction and adds guards before converting normalized values to int() to prevent crashes. |
| tests/unit_tests/mcp_service/chart/test_ascii_charts.py | Adds unit tests for NaN/None handling across several ASCII chart types. |

Comment on lines +58 to +67
```python
def test_horizontal_bar_chart_nan_rows_are_skipped() -> None:
    """NaN rows must be silently skipped; valid rows render normally."""
    # Use labels longer than 8 chars to force horizontal layout, where full
    # label text is preserved (vertical layout truncates to 3 chars).
    data = [
        {"label": "Alpha Category", "amount": 50.0},
        {"label": "Beta Category", "amount": float("nan")},
        {"label": "Gamma Category", "amount": 150.0},
    ]
    result = generate_ascii_chart(data, "bar")
```
Contributor Author

Addressed. — agor claude on Amin's behalf

Comment thread on tests/unit_tests/mcp_service/chart/test_ascii_charts.py (outdated)
Assert the exact fallback message 'No numeric data found for bar chart'
instead of just checking len(result) > 0, which would pass even on
unrelated error strings.
aminghadersohi marked this pull request as ready for review May 7, 2026 02:37
dosubot Bot added the viz:charts:bar (Related to the Bar chart) label May 7, 2026
- test_horizontal_bar_chart_nan_rows_are_skipped: replace weak `or`
  assertion with separate `assert "Alpha" in result`, `assert "Gamma"
  in result`, and `assert "Beta" not in result`, making the test
  deterministic and verifying the NaN row is actually excluded

- test_bar_chart_with_all_null_values_returns_fallback: add
  `isinstance(result, str)` and `assert "█" not in result` to
  explicitly verify no bar content is rendered in the fallback path
Add `assert "Horizontal Bar Chart" in result` to make it explicit that
the horizontal renderer is chosen (avg label length 14 > 8 threshold).
Horizontal mode preserves full label text; vertical would truncate to
3 chars, invalidating the "Alpha"/"Gamma" presence assertions.
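The assertions these commits describe can be exercised against a stand-in renderer. render_bar_chart below is hypothetical and only imitates the behaviour the tests rely on (NaN, None, and bool values skipped, the fallback string, a "Horizontal Bar Chart" header, █ bars); it always uses the horizontal layout, whereas the real tests call generate_ascii_chart(data, "bar") from superset/mcp_service/chart/ascii_charts.py.

```python
import math
from typing import Any

FALLBACK = "No numeric data found for bar chart"


def render_bar_chart(rows: list[dict[str, Any]], max_bar_width: int = 20) -> str:
    """Hypothetical stand-in for generate_ascii_chart(data, "bar")."""
    pairs: list[tuple[str, float]] = []
    for row in rows:
        # First string is the label; first real number (not bool, not NaN) is the value.
        label = next((v for v in row.values() if isinstance(v, str)), None)
        value = next(
            (
                v
                for v in row.values()
                if isinstance(v, (int, float))
                and not isinstance(v, bool)
                and not math.isnan(v)
            ),
            None,
        )
        if label is not None and value is not None:
            pairs.append((label, float(value)))
    if not pairs:
        return FALLBACK
    peak = max(amount for _, amount in pairs) or 1.0
    lines = ["Horizontal Bar Chart"]
    for lbl, amount in pairs:
        bar = "█" * max(int(amount / peak * max_bar_width), 0)
        lines.append(f"{lbl[:15]:<15} {bar}")
    return "\n".join(lines)


def test_horizontal_bar_chart_nan_rows_are_skipped() -> None:
    data = [
        {"label": "Alpha Category", "amount": 50.0},
        {"label": "Beta Category", "amount": float("nan")},
        {"label": "Gamma Category", "amount": 150.0},
    ]
    result = render_bar_chart(data)
    assert "Horizontal Bar Chart" in result
    assert "Alpha" in result
    assert "Gamma" in result
    assert "Beta" not in result


def test_bar_chart_with_all_null_values_returns_fallback() -> None:
    data = [
        {"label": "Alpha", "amount": None},
        {"label": "Beta", "amount": float("nan")},
    ]
    result = render_bar_chart(data)
    assert isinstance(result, str)
    assert "No numeric data found for bar chart" in result
    assert "█" not in result


test_horizontal_bar_chart_nan_rows_are_skipped()
test_bar_chart_with_all_null_values_returns_fallback()
```

Separate positive and negative assertions fail on exactly one condition each, unlike a single `or` expression that passes if any branch happens to hold.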
Comment on lines +82 to +86
```python
if (
    isinstance(val, (int, float))
    and not _is_nan_value(val)
    and numeric_val is None
):
```
Contributor

Suggestion: This numeric extraction condition now accepts booleans as numeric values (True/False are subclasses of int) and, because of the numeric_val is None gate, it can lock onto a boolean field and ignore the real metric in the same row. In datasets where a boolean column appears before the metric column, bars will be computed from 0/1 instead of the actual values. Exclude booleans from numeric detection (for example, require not isinstance(val, bool)) before assigning numeric_val. [logic error]

Severity Level: Major ⚠️
- ⚠️ MCP get_chart_preview bar charts show 0/1 instead of metrics.
- ⚠️ ASCII bar/column previews misrepresent numeric values for users.
Steps of Reproduction ✅
1. Configure any Superset chart whose `viz_type` is `"bar"`, `"column"`, or
`"echarts_timeseries_bar"` (these are routed to the bar renderer in
`superset/mcp_service/chart/ascii_charts.py:50-51`) using a dataset where the first column
in the query result row is BOOLEAN (e.g. `is_active`) and the second is a numeric metric
(e.g. `total_sales`).

2. Use the MCP `get_chart_preview` tool, which executes `ChartPreviewStrategy.generate()`
in `superset/mcp_service/chart/tool/get_chart_preview.py:3-35`. That method runs a
`ChartDataCommand`, obtains `data = result["queries"][0].get("data", [])` at
`get_chart_preview.py:26-28`, and then calls `ascii_chart = generate_ascii_chart(data,
self.chart.viz_type or "table", ...)` at `get_chart_preview.py:30-35`.

3. Inside `generate_ascii_chart` in `superset/mcp_service/chart/ascii_charts.py:34-53`,
the `"bar"`/`"column"`/`"echarts_timeseries_bar"` chart types are dispatched to
`_generate_ascii_bar_chart(data, width, height)` at `ascii_charts.py:50-51`.

4. In `_generate_ascii_bar_chart` (`ascii_charts.py:66-93`), each result row dict is
iterated in insertion order (`for _key, val in row.items():` at line 81). For a row like
`{"is_active": True, "total_sales": 100.0}`, the first `val` is the boolean `True`. The
condition at lines 82-86:

   `if (isinstance(val, (int, float)) and not _is_nan_value(val) and numeric_val is
   None):`

   evaluates to True because `True` is an instance of `int` and `_is_nan_value(True)`
   (implemented at `ascii_charts.py:23-27`) returns False. This sets `numeric_val = True`
   (1) at line 87, and because `numeric_val` is no longer `None`, the actual metric
   `total_sales` is ignored. The `values` list appended at lines 91-93 thus contains 0/1
   flags instead of real metric values, and the horizontal/vertical bar chart produced by
   `_generate_horizontal_bar_chart`/`_generate_vertical_bar_chart` uses these incorrect
   0/1-based values in the MCP ASCII preview.


Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/mcp_service/chart/ascii_charts.py
**Line:** 82:86
**Comment:**
	*Logic Error: This numeric extraction condition now accepts booleans as numeric values (`True`/`False` are subclasses of `int`) and, because of the `numeric_val is None` gate, it can lock onto a boolean field and ignore the real metric in the same row. In datasets where a boolean column appears before the metric column, bars will be computed from 0/1 instead of the actual values. Exclude booleans from numeric detection (for example, require `not isinstance(val, bool)`) before assigning `numeric_val`.

Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
Once the fix is implemented, also check the other comments on this PR and ask the user whether they want to fix the rest of the comments as well. If yes, fetch all the comments, validate their correctness, and implement a minimal fix.

Contributor Author

Addressed. — agor claude on Amin's behalf

Comment on lines +287 to +291
```python
if (
    isinstance(val, (int, float))
    and not _is_nan_value(val)
    and numeric_val is None
):
```
Contributor

Suggestion: The same extraction logic in time-series parsing can also select boolean columns as numeric points and then stop at the first match due to the numeric_val is None gate, which can replace the intended metric series with 0/1 values when boolean fields come first in row order. This breaks line/time-series charts for valid mixed-schema datasets. Exclude booleans from the numeric branch so only true numeric measures are used. [logic error]

Severity Level: Major ⚠️
- ⚠️ MCP get_chart_preview line charts use boolean 0/1 values.
- ⚠️ Time-series ASCII previews misrepresent metric trends for users.
Steps of Reproduction ✅
1. Configure a Superset chart whose `viz_type` is `"line"` or `"echarts_timeseries_line"`
(routed to the line renderer in `superset/mcp_service/chart/ascii_charts.py:52-53`) using
a dataset whose query returns rows with a BOOLEAN column first (e.g. `is_active`) and a
numeric metric second (e.g. `total_sales`) along with a temporal label column (e.g.
`date`).

2. Invoke the MCP `get_chart_preview` tool so that `ChartPreviewStrategy.generate()` in
`superset/mcp_service/chart/tool/get_chart_preview.py:3-35` runs `ChartDataCommand`,
populates `data = result["queries"][0].get("data", [])` at lines 26-28, and calls
`generate_ascii_chart(data, self.chart.viz_type or "table", ...)` at lines 30-35.

3. In `generate_ascii_chart` (`superset/mcp_service/chart/ascii_charts.py:34-54`), when
`chart_type` is `"line"` or `"echarts_timeseries_line"`, control flows to
`_generate_ascii_line_chart(data, width, height)` as seen at line 52.
`_generate_ascii_line_chart` then calls `values, labels = _extract_time_series_data(data)`
at `ascii_charts.py:253`.

4. Inside `_extract_time_series_data` (`ascii_charts.py:275-307`), each row dict is
iterated (`for key, val in row.items():` at line 286). For a row such as `{"is_active":
True, "total_sales": 100.0, "date": "2024-01-01"}`, the first `val` is the boolean `True`.
The numeric-branch condition at lines 287-291:

   `if (isinstance(val, (int, float)) and not _is_nan_value(val) and numeric_val is
   None):`

   evaluates to True because `True` is an `int` and `_is_nan_value(True)` returns False
   (implementation at `ascii_charts.py:23-27` uses `math.isnan(float(value))`). This sets
   `numeric_val = True` at line 292 and prevents later numeric fields from being
   considered (`numeric_val is None` gate). As a result, the `values` list appended at
   lines 303-305 contains 0/1 boolean flags instead of the intended metric (e.g.
   `total_sales`), so the line chart drawn by `_create_enhanced_line_chart`
   (`ascii_charts.py:310-378`) and returned to the MCP `get_chart_preview` caller has an
   incorrect time-series shape.


Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/mcp_service/chart/ascii_charts.py
**Line:** 287:291
**Comment:**
	*Logic Error: The same extraction logic in time-series parsing can also select boolean columns as numeric points and then stop at the first match due to the `numeric_val is None` gate, which can replace the intended metric series with 0/1 values when boolean fields come first in row order. This breaks line/time-series charts for valid mixed-schema datasets. Exclude booleans from the numeric branch so only true numeric measures are used.

Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
Once the fix is implemented, also check the other comments on this PR and ask the user whether they want to fix the rest of the comments as well. If yes, fetch all the comments, validate their correctness, and implement a minimal fix.

Contributor Author

Addressed. — agor claude on Amin's behalf

…derers

bool is a subclass of int, so isinstance(True, (int, float)) returns True.
Without an explicit bool guard the extractor would lock onto a boolean
column (is_active, flag, etc.) and ignore the real numeric metric,
producing 0/1 bars instead of actual values.

Add not isinstance(val, bool) to the numeric guard in both
_generate_ascii_bar_chart and _extract_time_series_data.
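The difference between the original and the amended guard can be reproduced in isolation. first_numeric_before and first_numeric_after are hypothetical helpers that mimic the `numeric_val is None` first-match gate by returning the first accepted value, and math.isnan stands in for the module's _is_nan_value; the row mirrors the reviewer's is_active/total_sales example.

```python
import math
from typing import Any

# bool is a subclass of int, so isinstance(True, (int, float)) is True.
print(issubclass(bool, int), isinstance(True, (int, float)))  # True True


def first_numeric_before(row: dict[str, Any]) -> Any:
    """Guard before the fix: bool values slip through as 0/1."""
    for val in row.values():
        if isinstance(val, (int, float)) and not math.isnan(val):
            return val  # first match wins, like the numeric_val is None gate
    return None


def first_numeric_after(row: dict[str, Any]) -> Any:
    """Guard after the fix: exclude bool before accepting a metric."""
    for val in row.values():
        if (
            isinstance(val, (int, float))
            and not isinstance(val, bool)
            and not math.isnan(val)
        ):
            return val
    return None


row = {"is_active": True, "total_sales": 100.0, "date": "2024-01-01"}
print(first_numeric_before(row))  # True
print(first_numeric_after(row))  # 100.0
```

The bool check has to be a separate clause because no isinstance tuple of numeric types can exclude a subclass of one of its members.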

bito-code-review Bot commented May 7, 2026

Code Review Agent Run #f8215a

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 393906c..f80e3b3
    • superset/mcp_service/chart/ascii_charts.py
    • tests/unit_tests/mcp_service/chart/test_ascii_charts.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful



Labels

size/L viz:charts:bar Related to the Bar chart


2 participants