fix(mcp): ASCII chart crashes with NaN when dataset contains null values#39916
fix(mcp): ASCII chart crashes with NaN when dataset contains null values#39916aminghadersohi wants to merge 6 commits intoapache:masterfrom
Conversation
When a dataset contains NULL values they become float NaN after numeric conversion. The horizontal and vertical bar chart renderers passed NaN into int(), raising ValueError. Likewise the line chart extractor would include NaN in the value list, corrupting min/max calculations. Three-layer fix: 1. Filter NaN during value extraction in _generate_ascii_bar_chart and _extract_time_series_data so NaN rows are silently skipped. 2. Add _is_nan_value() guard before int() in _generate_horizontal_bar_chart (bar_length) and _generate_vertical_bar_chart (bar_height) as defence-in-depth; NaN bar size is treated as 0 (no bar drawn). Fixes: ValueError: cannot convert float NaN to integer
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39916 +/- ##
==========================================
- Coverage 63.88% 63.88% -0.01%
==========================================
Files 2583 2583
Lines 136604 136608 +4
Branches 31502 31504 +2
==========================================
Hits 87276 87276
- Misses 47812 47816 +4
Partials 1516 1516
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Vertical bar chart (chosen when avg label length <= 8) truncates labels to 3 characters, so 'Alpha' never appears literally. Use longer labels (> 8 chars average) to force horizontal layout, where the full label is preserved up to 15 characters.
There was a problem hiding this comment.
Pull request overview
Fixes MCP ASCII chart preview rendering crashes when datasets contain NULLs that become NaN during numeric conversion, by filtering NaN values during data extraction and guarding the int() bar-size conversion paths. Adds unit tests to exercise bar/column/line/timeseries behavior with NaN and None inputs.
Changes:
- Filter out
NaNnumeric values during bar and time-series data extraction. - Add defense-in-depth checks to avoid
int(float('nan'))when computing bar sizes. - Add unit tests covering chart generation with
NaN/Nonevalues.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
superset/mcp_service/chart/ascii_charts.py |
Filters NaN during extraction and adds guards before converting normalized values to int() to prevent crashes. |
tests/unit_tests/mcp_service/chart/test_ascii_charts.py |
Adds unit tests for NaN/None handling across several ASCII chart types. |
| def test_horizontal_bar_chart_nan_rows_are_skipped() -> None: | ||
| """NaN rows must be silently skipped; valid rows render normally.""" | ||
| # Use labels longer than 8 chars to force horizontal layout, where full | ||
| # label text is preserved (vertical layout truncates to 3 chars). | ||
| data = [ | ||
| {"label": "Alpha Category", "amount": 50.0}, | ||
| {"label": "Beta Category", "amount": float("nan")}, | ||
| {"label": "Gamma Category", "amount": 150.0}, | ||
| ] | ||
| result = generate_ascii_chart(data, "bar") |
There was a problem hiding this comment.
Addressed. — agor claude on Amin's behalf
Assert the exact fallback message 'No numeric data found for bar chart' instead of just checking len(result) > 0, which would pass even on unrelated error strings.
- test_horizontal_bar_chart_nan_rows_are_skipped: replace weak `or` assertion with separate `assert "Alpha" in result`, `assert "Gamma" in result`, and `assert "Beta" not in result`, making the test deterministic and verifying the NaN row is actually excluded - test_bar_chart_with_all_null_values_returns_fallback: add `isinstance(result, str)` and `assert "█" not in result` to explicitly verify no bar content is rendered in the fallback path
Add `assert "Horizontal Bar Chart" in result` to make it explicit that the horizontal renderer is chosen (avg label length 14 > 8 threshold). Horizontal mode preserves full label text; vertical would truncate to 3 chars, invalidating the "Alpha"/"Gamma" presence assertions.
| if ( | ||
| isinstance(val, (int, float)) | ||
| and not _is_nan_value(val) | ||
| and numeric_val is None | ||
| ): |
There was a problem hiding this comment.
Suggestion: This numeric extraction condition now accepts booleans as numeric values (True/False are subclasses of int) and, because of the numeric_val is None gate, it can lock onto a boolean field and ignore the real metric in the same row. In datasets where a boolean column appears before the metric column, bars will be computed from 0/1 instead of the actual values. Exclude booleans from numeric detection (for example, require not isinstance(val, bool)) before assigning numeric_val. [logic error]
Severity Level: Major ⚠️
- ⚠️ MCP get_chart_preview bar charts show 0/1 instead of metrics.
- ⚠️ ASCII bar/column previews misrepresent numeric values for users.Steps of Reproduction ✅
1. Configure any Superset chart whose `viz_type` is `"bar"`, `"column"`, or
`"echarts_timeseries_bar"` (these are routed to the bar renderer in
`superset/mcp_service/chart/ascii_charts.py:50-51`) using a dataset where the first column
in the query result row is BOOLEAN (e.g. `is_active`) and the second is a numeric metric
(e.g. `total_sales`).
2. Use the MCP `get_chart_preview` tool, which executes `ChartPreviewStrategy.generate()`
in `superset/mcp_service/chart/tool/get_chart_preview.py:3-35`. That method runs a
`ChartDataCommand`, obtains `data = result["queries"][0].get("data", [])` at
`get_chart_preview.py:26-28`, and then calls `ascii_chart = generate_ascii_chart(data,
self.chart.viz_type or "table", ...)` at `get_chart_preview.py:30-35`.
3. Inside `generate_ascii_chart` in `superset/mcp_service/chart/ascii_charts.py:34-53`,
the `"bar"`/`"column"`/`"echarts_timeseries_bar"` chart types are dispatched to
`_generate_ascii_bar_chart(data, width, height)` at `ascii_charts.py:50-51`.
4. In `_generate_ascii_bar_chart` (`ascii_charts.py:66-93`), each result row dict is
iterated in insertion order (`for _key, val in row.items():` at line 81). For a row like
`{"is_active": True, "total_sales": 100.0}`, the first `val` is the boolean `True`. The
condition at lines 82-86:
`if (isinstance(val, (int, float)) and not _is_nan_value(val) and numeric_val is
None):`
evaluates to True because `True` is an instance of `int` and `_is_nan_value(True)`
(implemented at `ascii_charts.py:23-27`) returns False. This sets `numeric_val = True`
(1) at line 87, and because `numeric_val` is no longer `None`, the actual metric
`total_sales` is ignored. The `values` list appended at lines 91-93 thus contains 0/1
flags instead of real metric values, and the horizontal/vertical bar chart produced by
`_generate_horizontal_bar_chart`/`_generate_vertical_bar_chart` uses these incorrect
0/1-based values in the MCP ASCII preview.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/ascii_charts.py
**Line:** 82:86
**Comment:**
*Logic Error: This numeric extraction condition now accepts booleans as numeric values (`True`/`False` are subclasses of `int`) and, because of the `numeric_val is None` gate, it can lock onto a boolean field and ignore the real metric in the same row. In datasets where a boolean column appears before the metric column, bars will be computed from 0/1 instead of the actual values. Exclude booleans from numeric detection (for example, require `not isinstance(val, bool)`) before assigning `numeric_val`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Addressed. — agor claude on Amin's behalf
| if ( | ||
| isinstance(val, (int, float)) | ||
| and not _is_nan_value(val) | ||
| and numeric_val is None | ||
| ): |
There was a problem hiding this comment.
Suggestion: The same extraction logic in time-series parsing can also select boolean columns as numeric points and then stop at the first match due numeric_val is None, which can replace the intended metric series with 0/1 values when boolean fields come first in row order. This breaks line/time-series charts for valid mixed-schema datasets. Exclude booleans from the numeric branch so only true numeric measures are used. [logic error]
Severity Level: Major ⚠️
- ⚠️ MCP get_chart_preview line charts use boolean 0/1 values.
- ⚠️ Time-series ASCII previews misrepresent metric trends for users.Steps of Reproduction ✅
1. Configure a Superset chart whose `viz_type` is `"line"` or `"echarts_timeseries_line"`
(routed to the line renderer in `superset/mcp_service/chart/ascii_charts.py:52-53`) using
a dataset whose query returns rows with a BOOLEAN column first (e.g. `is_active`) and a
numeric metric second (e.g. `total_sales`) along with a temporal label column (e.g.
`date`).
2. Invoke the MCP `get_chart_preview` tool so that `ChartPreviewStrategy.generate()` in
`superset/mcp_service/chart/tool/get_chart_preview.py:3-35` runs `ChartDataCommand`,
populates `data = result["queries"][0].get("data", [])` at lines 26-28, and calls
`generate_ascii_chart(data, self.chart.viz_type or "table", ...)` at lines 30-35.
3. In `generate_ascii_chart` (`superset/mcp_service/chart/ascii_charts.py:34-54`), when
`chart_type` is `"line"` or `"echarts_timeseries_line"`, control flows to
`_generate_ascii_line_chart(data, width, height)` as seen at line 52.
`_generate_ascii_line_chart` then calls `values, labels = _extract_time_series_data(data)`
at `ascii_charts.py:253`.
4. Inside `_extract_time_series_data` (`ascii_charts.py:275-307`), each row dict is
iterated (`for key, val in row.items():` at line 286). For a row such as `{"is_active":
True, "total_sales": 100.0, "date": "2024-01-01"}`, the first `val` is the boolean `True`.
The numeric-branch condition at lines 287-291:
`if (isinstance(val, (int, float)) and not _is_nan_value(val) and numeric_val is
None):`
evaluates to True because `True` is an `int` and `_is_nan_value(True)` returns False
(implementation at `ascii_charts.py:23-27` uses `math.isnan(float(value))`). This sets
`numeric_val = True` at line 292 and prevents later numeric fields from being
considered (`numeric_val is None` gate). As a result, the `values` list appended at
lines 303-305 contains 0/1 boolean flags instead of the intended metric (e.g.
`total_sales`), so the line chart drawn by `_create_enhanced_line_chart`
(`ascii_charts.py:310-378`) and returned to the MCP `get_chart_preview` caller has an
incorrect time-series shape.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/ascii_charts.py
**Line:** 287:291
**Comment:**
*Logic Error: The same extraction logic in time-series parsing can also select boolean columns as numeric points and then stop at the first match due `numeric_val is None`, which can replace the intended metric series with 0/1 values when boolean fields come first in row order. This breaks line/time-series charts for valid mixed-schema datasets. Exclude booleans from the numeric branch so only true numeric measures are used.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
Addressed. — agor claude on Amin's behalf
…derers bool is a subclass of int, so isinstance(True, (int, float)) returns True. Without an explicit bool guard the extractor would lock onto a boolean column (is_active, flag, etc.) and ignore the real numeric metric, producing 0/1 bars instead of actual values. Add not isinstance(val, bool) to the numeric guard in both _generate_ascii_bar_chart and _extract_time_series_data.
Code Review Agent Run #f8215aActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
When a dataset contains NULL values, they are represented as
float('nan')after numeric conversion. The ASCII bar chart renderers passed NaN directly intoint(), raisingValueError: cannot convert float NaN to integer.Root cause:
_generate_ascii_bar_chartand_extract_time_series_dataacceptedfloat('nan')values through theisinstance(val, (int, float))check, which returnsTruefor NaN. The NaN then propagated intomax()/ normalization calculations, causing the crash atint(normalized * max_bar_width).Three-layer fix:
_generate_ascii_bar_chartand_extract_time_series_datanow call_is_nan_value()(already defined in the module) before accepting a numeric value, silently skipping NaN rows._generate_horizontal_bar_chartand_generate_vertical_bar_chartcheck_is_nan_value(normalized)before theint()call, treating NaN bar size as 0 (no bar drawn).BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A — ASCII chart output; crash prevented.
TESTING INSTRUCTIONS
get_chart_previewtool.ValueError: cannot convert float NaN to integer.Unit tests added in
tests/unit_tests/mcp_service/chart/test_ascii_charts.pycovering bar, column, line, and timeseries bar chart types with NaN andNonevalues.ADDITIONAL INFORMATION