-
Notifications
You must be signed in to change notification settings - Fork 196
Added edge case compilation tests for Windows #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 2 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
51c88e0
Add test for MSVC compilation w/o CMake
vincentlaucsb 20eb89b
Make CXX version macro __MSVC_LANG aware
vincentlaucsb 7de5ffe
Added no exceptions warning + MinGW test
vincentlaucsb 1e3bd41
Update mio.hpp
vincentlaucsb 6c3f9ee
Potential fix for code scanning alert no. 290: Workflow does not cont…
vincentlaucsb 3c02ce6
Single header on release
vincentlaucsb aba8dc1
Merge branch 'mar-31-2026' of https://github.com/vincentlaucsb/csv-pa…
vincentlaucsb beb6ce2
More tests
vincentlaucsb b99dcdd
Fix GH Actions dep warnings
vincentlaucsb 26970fc
Update permissions; update timeout_helper()
vincentlaucsb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -93,3 +93,4 @@ jobs: | |
| run: ctest --build-config ${{ matrix.build_type }} | ||
|
|
||
|
|
||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| name: MSVC Consumer Compatibility | ||
|
|
||
| on: | ||
| push: | ||
| branches: [ "master" ] | ||
| paths: | ||
| - "include/**" | ||
| - "single_header.py" | ||
| - "CMakeLists.txt" | ||
| - ".github/workflows/msvc-consumer-compat.yml" | ||
| pull_request: | ||
| branches: [ "master" ] | ||
| paths: | ||
| - "include/**" | ||
| - "single_header.py" | ||
| - "CMakeLists.txt" | ||
| - ".github/workflows/msvc-consumer-compat.yml" | ||
| workflow_dispatch: | ||
|
|
||
| jobs: | ||
| windows-msvc-no-zc-cplusplus: | ||
| name: Windows MSVC compatibility without /Zc:__cplusplus | ||
| runs-on: windows-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository and submodules | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| submodules: recursive | ||
|
|
||
| - name: Set up MSVC developer command prompt | ||
| uses: ilammy/msvc-dev-cmd@v1 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.x' | ||
|
|
||
| - name: Configure CMake (for single-header generation) | ||
| run: > | ||
| cmake -S ${{ github.workspace }} | ||
| -B ${{ github.workspace }}/build/no-zc-cplusplus | ||
| -DCSV_CXX_STANDARD=20 | ||
| -DCSV_BUILD_PROGRAMS=OFF | ||
| -DBUILD_PYTHON=OFF | ||
|
|
||
| - name: Generate amalgamated single-header | ||
| run: cmake --build ${{ github.workspace }}/build/no-zc-cplusplus --config Release --target generate_single_header | ||
|
|
||
| - name: Compile generated single-header without /Zc:__cplusplus | ||
| shell: cmd | ||
| run: | | ||
| REM NOTE: single_include/csv.hpp is an intentional compatibility shim; compile the generated amalgamated header instead. | ||
| echo #include "csv.hpp" > ci_no_zc_single_header.cpp | ||
| echo int main() { return 0; } >> ci_no_zc_single_header.cpp | ||
| cl /nologo /std:c++20 /EHsc /Zc:__cplusplus- /I build\no-zc-cplusplus\single_include_generated /c ci_no_zc_single_header.cpp | ||
|
|
||
| - name: Compile unamalgamated header without /Zc:__cplusplus | ||
| shell: cmd | ||
| run: | | ||
| echo #include "csv.hpp" > ci_no_zc_unamalgamated.cpp | ||
| echo int main() { return 0; } >> ci_no_zc_unamalgamated.cpp | ||
| cl /nologo /std:c++20 /EHsc /Zc:__cplusplus- /I include /c ci_no_zc_unamalgamated.cpp | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| # CSV Parser - AI Agent Context | ||
|
|
||
| Architectural overview for AI assistants working with this codebase. | ||
|
|
||
| > **Maintenance rule:** Whenever this file is changed, `CLAUDE.md` in the same directory must be updated to reflect the changes. `CLAUDE.md` is a bullet-point summary of this file and must stay in sync. | ||
|
|
||
| ## Critical: single_include/csv.hpp Is A Shim | ||
|
|
||
| `single_include/csv.hpp` is intentionally **non-functional** and exists only as a compatibility shim. | ||
|
|
||
| - Do **not** compile against `single_include/csv.hpp` | ||
| - For single-header validation, generate `build/.../single_include_generated/csv.hpp` via the `generate_single_header` target, then compile that generated file | ||
| - For unamalgamated usage, include headers from `include/` | ||
|
|
||
| This guard exists to prevent stale-in-repo amalgamated headers and to force use of the canonical generated distribution. | ||
|
|
||
| ## Critical: Two Independent Code Paths | ||
|
|
||
| The `CSVReader` class has **two completely different implementations**: | ||
|
|
||
| ```cpp | ||
| // PATH 1: Memory-mapped I/O (MmapParser) | ||
| CSVReader reader("filename.csv"); | ||
|
|
||
| // PATH 2: Stream-based (StreamParser) | ||
| std::ifstream infile("filename.csv", std::ios::binary); | ||
| CSVReader reader(infile, format); | ||
| ``` | ||
|
|
||
| **Impact:** Bugs can exist in one path but not the other (see issue #281). Any test validating parsing behavior must test BOTH paths using Catch2 `SECTION`. | ||
|
|
||
| ## Threading: Worker + 10MB Chunks | ||
|
|
||
| - Worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`) | ||
| - Communicates via `ThreadSafeDeque<CSVRow>` | ||
| - Exceptions propagate via `std::exception_ptr` | ||
| - Critical: Fields spanning chunk boundaries must not corrupt | ||
|
|
||
| **Testing requirement:** Use ≥500K rows to cross 10MB boundary. | ||
|
|
||
| ## Key Files | ||
|
|
||
| | File | Contains | | ||
| |------|----------| | ||
| | `csv_reader.hpp` | Mmap vs stream constructors | | ||
| | `csv_reader.cpp` | Delimiter guessing, header detection | | ||
| | `basic_csv_parser.hpp` | Parser base class (IBasicCSVParser, MmapParser, StreamParser) | | ||
| | `basic_csv_parser.cpp` | Chunk transitions, worker thread | | ||
| | `raw_csv_data.hpp` | Internal parser data structures (RawCSVField, CSVFieldList, RawCSVData) | | ||
| | `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication | | ||
| | `csv_row.hpp` | Public API types (CSVField, CSVRow) | | ||
| | `test_round_trip.cpp` | Exemplar test patterns | | ||
|
|
||
| ## Data Flow: Parser → Row API | ||
|
|
||
| ``` | ||
| Parser Thread Main Thread | ||
| ↓ ↓ | ||
| RawCSVData (shared_ptr) ─────────────→ CSVRow | ||
| ↓ ↓ | ||
| CSVFieldList → RawCSVField[] CSVField (lazy unescaping) | ||
| ↓ | ||
| ThreadSafeDeque<CSVRow> | ||
| (producer-consumer queue) | ||
| ``` | ||
|
|
||
| **Thread Safety:** Parser populates `RawCSVData`, pushes `CSVRow` to `ThreadSafeDeque`, main thread pops and reads. The `CSVFieldList` uses chunked allocation (~170 fields/chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details. | ||
|
|
||
| ## Common Pitfalls | ||
|
|
||
| 1. **Don't assume one code path:** Mmap and stream paths are different. Always test both. | ||
| 2. **Don't write tiny tests:** Need ≥500K rows to cross 10MB chunk boundary. | ||
| 3. **Don't use uniform values:** Each column needs distinct values to detect corruption. | ||
| 4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`. | ||
| 5. **Don't change one constructor:** Likely affects both mmap and stream paths. | ||
|
|
||
| See `tests/AGENTS.md` for test strategy, checklist, and conventions. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,94 +1,37 @@ | ||
| # CSV Parser - AI Agent Context | ||
| # CSV Parser - Claude Summary | ||
|
|
||
| Architectural overview for AI assistants working with this codebase. | ||
| > **`AGENTS.md` is the source of truth.** This file is a bullet-point summary only. Always load and follow `AGENTS.md` — it takes precedence over anything here. | ||
|
|
||
| ## Critical: Two Independent Code Paths | ||
| ## single_include/csv.hpp | ||
| - Non-functional shim — do **not** compile against it | ||
| - For single-header use: generate `build/.../single_include_generated/csv.hpp` via `generate_single_header` target | ||
| - For unamalgamated use: include from `include/` | ||
|
|
||
| The `CSVReader` class has **two completely different implementations**: | ||
| ## Two Independent Code Paths | ||
| - `CSVReader("file.csv")` → MmapParser | ||
| - `CSVReader(istream, format)` → StreamParser | ||
| - Bugs can exist in one and not the other — always test both with Catch2 `SECTION` | ||
|
|
||
| ```cpp | ||
| // PATH 1: Memory-mapped I/O (MmapParser) | ||
| CSVReader reader("filename.csv"); | ||
|
|
||
| // PATH 2: Stream-based (StreamParser) | ||
| std::ifstream infile("filename.csv", std::ios::binary); | ||
| CSVReader reader(infile, format); | ||
| ``` | ||
|
|
||
| **Impact:** Bugs can exist in one path but not the other (see issue #281). Any test validating parsing behavior must test BOTH paths using Catch2 `SECTION`. | ||
|
|
||
| ## Threading: Worker + 10MB Chunks | ||
|
|
||
| - Worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`) | ||
| - Communicates via `ThreadSafeDeque<CSVRow>` | ||
| ## Threading | ||
| - Worker thread reads 10MB chunks (`ITERATION_CHUNK_SIZE`) | ||
| - Communication via `ThreadSafeDeque<CSVRow>` | ||
| - Exceptions propagate via `std::exception_ptr` | ||
| - Critical: Fields spanning chunk boundaries must not corrupt | ||
|
|
||
| **Testing requirement:** Use ≥500K rows to cross 10MB boundary. | ||
|
|
||
| ## Test Strategy: Use Distinct Column Values | ||
|
|
||
| ❌ **BAD:** `array{i, i, i, i, i}` - All columns identical | ||
| ✅ **GOOD:** `array{i*5+0, i*5+1, i*5+2, i*5+3, i*5+4}` - Each column distinct | ||
|
|
||
| **Why:** Field corruption is only detectable if columns have different values. | ||
| - Tests must use ≥500K rows to cross chunk boundary | ||
|
|
||
| ## Key Files | ||
|
|
||
| | File | Contains | | ||
| |------|----------| | ||
| | `csv_reader.hpp` | Mmap vs stream constructors | | ||
| | `csv_reader.cpp` | Delimiter guessing, header detection | | ||
| | `basic_csv_parser.hpp` | Parser base class (IBasicCSVParser, MmapParser, StreamParser) | | ||
| | `basic_csv_parser.cpp` | Chunk transitions, worker thread | | ||
| | `raw_csv_data.hpp` | Internal parser data structures (RawCSVField, CSVFieldList, RawCSVData) | | ||
| | `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication | | ||
| | `csv_row.hpp` | Public API types (CSVField, CSVRow) | | ||
| | `test_round_trip.cpp` | Exemplar test patterns | | ||
|
|
||
| ## Data Flow: Parser → Row API | ||
|
|
||
| ``` | ||
| Parser Thread Main Thread | ||
| ↓ ↓ | ||
| RawCSVData (shared_ptr) ─────────────→ CSVRow | ||
| ↓ ↓ | ||
| CSVFieldList → RawCSVField[] CSVField (lazy unescaping) | ||
| ↓ | ||
| ThreadSafeDeque<CSVRow> | ||
| (producer-consumer queue) | ||
| ``` | ||
|
|
||
| **Thread Safety:** Parser populates `RawCSVData`, pushes `CSVRow` to `ThreadSafeDeque`, main thread pops and reads. The `CSVFieldList` uses chunked allocation (~170 fields/chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details. | ||
| - `csv_reader.hpp` — mmap vs stream constructors | ||
| - `basic_csv_parser.hpp` — MmapParser, StreamParser implementations | ||
| - `basic_csv_parser.cpp` — chunk transitions, worker thread | ||
| - `raw_csv_data.hpp` — RawCSVField, CSVFieldList, RawCSVData | ||
| - `thread_safe_deque.hpp` — producer-consumer queue | ||
| - `csv_row.hpp` — CSVField, CSVRow public API | ||
|
|
||
| ## Common Pitfalls | ||
|
|
||
| 1. **Don't assume one code path:** Mmap and stream paths are different. Always test both. | ||
| 2. **Don't write tiny tests:** Need ≥500K rows to cross 10MB chunk boundary. | ||
| 3. **Don't use uniform values:** Each column needs distinct values to detect corruption. | ||
| 4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`. | ||
| 5. **Don't change one constructor:** Likely affects both mmap and stream paths. | ||
|
|
||
| ## Test Checklist | ||
|
|
||
| - [ ] Tests both mmap and stream paths (use `SECTION`) | ||
| - [ ] Distinct values per column | ||
| - [ ] ≥500K rows to cross chunk boundary | ||
| - [ ] Documents bug it would catch | ||
| - [ ] Lambda + SECTION pattern for code reuse | ||
| - [ ] Test data in `tests/data/fake_data` (real data in `tests/data/real_data`) | ||
| - [ ] Use `FileGuard` for temporary files (ensures cleanup even if test fails) | ||
|
|
||
| **Note:** `tests/data` is a git submodule. Remember to commit changes separately. | ||
|
|
||
| ## Recent Bug Fixes | ||
|
|
||
| | Issue | Bug | Fixed | | ||
| |-------|-----|-------| | ||
| | #278 | CSVFieldList move constructor dangling pointer | Feb 2026 | | ||
| | #280 | Field corruption at chunk boundaries | PR #282 | | ||
| | #281 | Stream-specific exception handling | PR #282 | | ||
| | #283 | Header detection with variable-width rows | Jan 2026 | | ||
| | #285 | Delimiter guessing overwrites `no_header()` | Feb 2026 | | ||
|
|
||
| See inline comments in source files for implementation details. | ||
| - Always test both mmap and stream paths | ||
| - ≥500K rows needed to cross 10MB boundary | ||
| - Use distinct column values to detect field corruption | ||
| - Exceptions from worker thread need `exception_ptr` | ||
| - Changes to one constructor likely affect both paths | ||
|
|
||
| ## Tests | ||
| See `tests/AGENTS.md` for full test strategy, checklist, and conventions. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.