-
Notifications
You must be signed in to change notification settings - Fork 196
Added edge case compilation tests for Windows #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
51c88e0
20eb89b
7de5ffe
1e3bd41
6c3f9ee
3c02ce6
aba8dc1
beb6ce2
b99dcdd
26970fc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -93,3 +93,4 @@ jobs: | |
| run: ctest --build-config ${{ matrix.build_type }} | ||
|
|
||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,103 @@ | ||
| name: Compatibility Edge Cases | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| on: | ||
| push: | ||
| branches: [ "master" ] | ||
| paths: | ||
| - "include/**" | ||
| - "single_header.py" | ||
| - "CMakeLists.txt" | ||
| - ".github/workflows/compat-edge-cases.yml" | ||
| pull_request: | ||
| branches: [ "master" ] | ||
| paths: | ||
| - "include/**" | ||
| - "single_header.py" | ||
| - "CMakeLists.txt" | ||
| - ".github/workflows/compat-edge-cases.yml" | ||
| workflow_dispatch: | ||
|
|
||
| jobs: | ||
| msvc-without-zc-cplusplus: | ||
| name: MSVC without /Zc:__cplusplus | ||
| runs-on: windows-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository and submodules | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| submodules: recursive | ||
|
|
||
| - name: Set up MSVC developer command prompt | ||
| uses: ilammy/msvc-dev-cmd@v1 | ||
Check warningCode scanning / CodeQL Unpinned tag for a non-immutable Action in workflow Medium
Unpinned 3rd party Action 'Compatibility Edge Cases' step
Uses Step Error loading related location Loading |
||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.x' | ||
|
|
||
| - name: Configure CMake (for single-header generation) | ||
| run: > | ||
| cmake -S ${{ github.workspace }} | ||
| -B ${{ github.workspace }}/build/no-zc-cplusplus | ||
| -DCSV_CXX_STANDARD=20 | ||
| -DCSV_BUILD_PROGRAMS=OFF | ||
| -DBUILD_PYTHON=OFF | ||
|
|
||
| - name: Generate amalgamated single-header | ||
| run: cmake --build ${{ github.workspace }}/build/no-zc-cplusplus --config Release --target generate_single_header | ||
|
|
||
| - name: Compile generated single-header without /Zc:__cplusplus | ||
| shell: cmd | ||
| run: | | ||
| REM NOTE: single_include/csv.hpp is an intentional compatibility shim; compile the generated amalgamated header instead. | ||
| echo #include "csv.hpp" > ci_no_zc_single_header.cpp | ||
| echo int main() { return 0; } >> ci_no_zc_single_header.cpp | ||
| cl /nologo /std:c++20 /EHsc /Zc:__cplusplus- /I build\no-zc-cplusplus\single_include_generated /c ci_no_zc_single_header.cpp | ||
|
|
||
| - name: Compile unamalgamated header without /Zc:__cplusplus | ||
| shell: cmd | ||
| run: | | ||
| echo #include "csv.hpp" > ci_no_zc_unamalgamated.cpp | ||
| echo int main() { return 0; } >> ci_no_zc_unamalgamated.cpp | ||
| cl /nologo /std:c++20 /EHsc /Zc:__cplusplus- /I include /c ci_no_zc_unamalgamated.cpp | ||
|
|
||
| mingw-minimal: | ||
|
|
||
| name: MinGW-w64 minimal build and test (C++17) | ||
| runs-on: windows-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository and submodules | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| submodules: recursive | ||
|
|
||
| - name: Set up MSYS2 (MinGW64) | ||
| uses: msys2/setup-msys2@v2 | ||
Check warningCode scanning / CodeQL Unpinned tag for a non-immutable Action in workflow Medium
Unpinned 3rd party Action 'Compatibility Edge Cases' step
Uses Step Error loading related location Loading |
||
| with: | ||
| msystem: MINGW64 | ||
| update: true | ||
| install: >- | ||
| mingw-w64-x86_64-gcc | ||
| mingw-w64-x86_64-cmake | ||
| mingw-w64-x86_64-ninja | ||
|
|
||
| - name: Configure CMake (MinGW, C++17) | ||
| shell: msys2 {0} | ||
| run: > | ||
| cmake -S . -B build/mingw-minimal -G Ninja | ||
| -DCMAKE_BUILD_TYPE=Release | ||
| -DCSV_CXX_STANDARD=17 | ||
| -DCSV_BUILD_PROGRAMS=OFF | ||
| -DBUILD_PYTHON=OFF | ||
|
|
||
| - name: Build | ||
| shell: msys2 {0} | ||
| run: cmake --build build/mingw-minimal | ||
|
|
||
| - name: Test | ||
| shell: msys2 {0} | ||
| run: ctest --test-dir build/mingw-minimal --output-on-failure | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| name: Release Single Header Asset | ||
|
|
||
| on: | ||
| push: | ||
| tags: | ||
| - "v*" | ||
| workflow_dispatch: | ||
|
|
||
| permissions: | ||
| contents: write | ||
|
|
||
| jobs: | ||
| publish-single-header: | ||
| name: Generate and upload csv.hpp release asset | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository and submodules | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| submodules: recursive | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: "3.x" | ||
|
|
||
| - name: Generate single header | ||
| run: | | ||
| mkdir -p release-assets | ||
| python single_header.py release-assets/csv.hpp | ||
|
|
||
| - name: Generate SHA256 checksum | ||
| run: | | ||
| cd release-assets | ||
| sha256sum csv.hpp > csv.hpp.sha256 | ||
|
|
||
| - name: Upload release assets | ||
| uses: softprops/action-gh-release@v2 | ||
Check warningCode scanning / CodeQL Unpinned tag for a non-immutable Action in workflow Medium
Unpinned 3rd party Action 'Release Single Header Asset' step
Uses Step Error loading related location Loading |
||
| with: | ||
| files: | | ||
| release-assets/csv.hpp | ||
| release-assets/csv.hpp.sha256 | ||
| fail_on_unmatched_files: true | ||
| generate_release_notes: true | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| # CSV Parser - AI Agent Context | ||
|
|
||
| Architectural overview for AI assistants working with this codebase. | ||
|
|
||
| > **Maintenance rule:** Whenever this file is changed, `CLAUDE.md` in the same directory must be updated to reflect the changes. `CLAUDE.md` is a bullet-point summary of this file and must stay in sync. | ||
|
|
||
| ## Critical: single_include/csv.hpp Is A Shim | ||
|
|
||
| `single_include/csv.hpp` is intentionally **non-functional** and exists only as a compatibility shim. | ||
|
|
||
| - Do **not** compile against `single_include/csv.hpp` | ||
| - For single-header validation, generate `build/.../single_include_generated/csv.hpp` via the `generate_single_header` target, then compile that generated file | ||
| - For unamalgamated usage, include headers from `include/` | ||
|
|
||
| This guard exists to prevent stale-in-repo amalgamated headers and to force use of the canonical generated distribution. | ||
|
|
||
| ## Critical: Two Independent Code Paths | ||
|
|
||
| The `CSVReader` class has **two completely different implementations**: | ||
|
|
||
| ```cpp | ||
| // PATH 1: Memory-mapped I/O (MmapParser) | ||
| CSVReader reader("filename.csv"); | ||
|
|
||
| // PATH 2: Stream-based (StreamParser) | ||
| std::ifstream infile("filename.csv", std::ios::binary); | ||
| CSVReader reader(infile, format); | ||
| ``` | ||
|
|
||
| **Impact:** Bugs can exist in one path but not the other (see issue #281). Any test validating parsing behavior must test BOTH paths using Catch2 `SECTION`. | ||
|
|
||
| ## Threading: Worker + 10MB Chunks | ||
|
|
||
| - Worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`) | ||
| - Communicates via `ThreadSafeDeque<CSVRow>` | ||
| - Exceptions propagate via `std::exception_ptr` | ||
| - Critical: Fields spanning chunk boundaries must not corrupt | ||
|
|
||
| **Testing requirement:** Use ≥500K rows to cross 10MB boundary. | ||
|
|
||
| ## Key Files | ||
|
|
||
| | File | Contains | | ||
| |------|----------| | ||
| | `csv_reader.hpp` | Mmap vs stream constructors | | ||
| | `csv_reader.cpp` | Delimiter guessing, header detection | | ||
| | `basic_csv_parser.hpp` | Parser base class (IBasicCSVParser, MmapParser, StreamParser) | | ||
| | `basic_csv_parser.cpp` | Chunk transitions, worker thread | | ||
| | `raw_csv_data.hpp` | Internal parser data structures (RawCSVField, CSVFieldList, RawCSVData) | | ||
| | `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication | | ||
| | `csv_row.hpp` | Public API types (CSVField, CSVRow) | | ||
| | `test_round_trip.cpp` | Exemplar test patterns | | ||
|
|
||
| ## Data Flow: Parser → Row API | ||
|
|
||
| ``` | ||
| Parser Thread Main Thread | ||
| ↓ ↓ | ||
| RawCSVData (shared_ptr) ─────────────→ CSVRow | ||
| ↓ ↓ | ||
| CSVFieldList → RawCSVField[] CSVField (lazy unescaping) | ||
| ↓ | ||
| ThreadSafeDeque<CSVRow> | ||
| (producer-consumer queue) | ||
| ``` | ||
|
|
||
| **Thread Safety:** Parser populates `RawCSVData`, pushes `CSVRow` to `ThreadSafeDeque`, main thread pops and reads. The `CSVFieldList` uses chunked allocation (~170 fields/chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details. | ||
|
|
||
| ## Common Pitfalls | ||
|
|
||
| 1. **Don't assume one code path:** Mmap and stream paths are different. Always test both. | ||
| 2. **Don't write tiny tests:** Need ≥500K rows to cross 10MB chunk boundary. | ||
| 3. **Don't use uniform values:** Each column needs distinct values to detect corruption. | ||
| 4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`. | ||
| 5. **Don't change one constructor:** Likely affects both mmap and stream paths. | ||
|
|
||
| See `tests/AGENTS.md` for test strategy, checklist, and conventions. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,94 +1,37 @@ | ||
| # CSV Parser - AI Agent Context | ||
| # CSV Parser - Claude Summary | ||
|
|
||
| Architectural overview for AI assistants working with this codebase. | ||
| > **`AGENTS.md` is the source of truth.** This file is a bullet-point summary only. Always load and follow `AGENTS.md` — it takes precedence over anything here. | ||
|
|
||
| ## Critical: Two Independent Code Paths | ||
| ## single_include/csv.hpp | ||
| - Non-functional shim — do **not** compile against it | ||
| - For single-header use: generate `build/.../single_include_generated/csv.hpp` via `generate_single_header` target | ||
| - For unamalgamated use: include from `include/` | ||
|
|
||
| The `CSVReader` class has **two completely different implementations**: | ||
| ## Two Independent Code Paths | ||
| - `CSVReader("file.csv")` → MmapParser | ||
| - `CSVReader(istream, format)` → StreamParser | ||
| - Bugs can exist in one and not the other — always test both with Catch2 `SECTION` | ||
|
|
||
| ```cpp | ||
| // PATH 1: Memory-mapped I/O (MmapParser) | ||
| CSVReader reader("filename.csv"); | ||
|
|
||
| // PATH 2: Stream-based (StreamParser) | ||
| std::ifstream infile("filename.csv", std::ios::binary); | ||
| CSVReader reader(infile, format); | ||
| ``` | ||
|
|
||
| **Impact:** Bugs can exist in one path but not the other (see issue #281). Any test validating parsing behavior must test BOTH paths using Catch2 `SECTION`. | ||
|
|
||
| ## Threading: Worker + 10MB Chunks | ||
|
|
||
| - Worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`) | ||
| - Communicates via `ThreadSafeDeque<CSVRow>` | ||
| ## Threading | ||
| - Worker thread reads 10MB chunks (`ITERATION_CHUNK_SIZE`) | ||
| - Communication via `ThreadSafeDeque<CSVRow>` | ||
| - Exceptions propagate via `std::exception_ptr` | ||
| - Critical: Fields spanning chunk boundaries must not corrupt | ||
|
|
||
| **Testing requirement:** Use ≥500K rows to cross 10MB boundary. | ||
|
|
||
| ## Test Strategy: Use Distinct Column Values | ||
|
|
||
| ❌ **BAD:** `array{i, i, i, i, i}` - All columns identical | ||
| ✅ **GOOD:** `array{i*5+0, i*5+1, i*5+2, i*5+3, i*5+4}` - Each column distinct | ||
|
|
||
| **Why:** Field corruption is only detectable if columns have different values. | ||
| - Tests must use ≥500K rows to cross chunk boundary | ||
|
|
||
| ## Key Files | ||
|
|
||
| | File | Contains | | ||
| |------|----------| | ||
| | `csv_reader.hpp` | Mmap vs stream constructors | | ||
| | `csv_reader.cpp` | Delimiter guessing, header detection | | ||
| | `basic_csv_parser.hpp` | Parser base class (IBasicCSVParser, MmapParser, StreamParser) | | ||
| | `basic_csv_parser.cpp` | Chunk transitions, worker thread | | ||
| | `raw_csv_data.hpp` | Internal parser data structures (RawCSVField, CSVFieldList, RawCSVData) | | ||
| | `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication | | ||
| | `csv_row.hpp` | Public API types (CSVField, CSVRow) | | ||
| | `test_round_trip.cpp` | Exemplar test patterns | | ||
|
|
||
| ## Data Flow: Parser → Row API | ||
|
|
||
| ``` | ||
| Parser Thread Main Thread | ||
| ↓ ↓ | ||
| RawCSVData (shared_ptr) ─────────────→ CSVRow | ||
| ↓ ↓ | ||
| CSVFieldList → RawCSVField[] CSVField (lazy unescaping) | ||
| ↓ | ||
| ThreadSafeDeque<CSVRow> | ||
| (producer-consumer queue) | ||
| ``` | ||
|
|
||
| **Thread Safety:** Parser populates `RawCSVData`, pushes `CSVRow` to `ThreadSafeDeque`, main thread pops and reads. The `CSVFieldList` uses chunked allocation (~170 fields/chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details. | ||
| - `csv_reader.hpp` — mmap vs stream constructors | ||
| - `basic_csv_parser.hpp` — MmapParser, StreamParser implementations | ||
| - `basic_csv_parser.cpp` — chunk transitions, worker thread | ||
| - `raw_csv_data.hpp` — RawCSVField, CSVFieldList, RawCSVData | ||
| - `thread_safe_deque.hpp` — producer-consumer queue | ||
| - `csv_row.hpp` — CSVField, CSVRow public API | ||
|
|
||
| ## Common Pitfalls | ||
|
|
||
| 1. **Don't assume one code path:** Mmap and stream paths are different. Always test both. | ||
| 2. **Don't write tiny tests:** Need ≥500K rows to cross 10MB chunk boundary. | ||
| 3. **Don't use uniform values:** Each column needs distinct values to detect corruption. | ||
| 4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`. | ||
| 5. **Don't change one constructor:** Likely affects both mmap and stream paths. | ||
|
|
||
| ## Test Checklist | ||
|
|
||
| - [ ] Tests both mmap and stream paths (use `SECTION`) | ||
| - [ ] Distinct values per column | ||
| - [ ] ≥500K rows to cross chunk boundary | ||
| - [ ] Documents bug it would catch | ||
| - [ ] Lambda + SECTION pattern for code reuse | ||
| - [ ] Test data in `tests/data/fake_data` (real data in `tests/data/real_data`) | ||
| - [ ] Use `FileGuard` for temporary files (ensures cleanup even if test fails) | ||
|
|
||
| **Note:** `tests/data` is a git submodule. Remember to commit changes separately. | ||
|
|
||
| ## Recent Bug Fixes | ||
|
|
||
| | Issue | Bug | Fixed | | ||
| |-------|-----|-------| | ||
| | #278 | CSVFieldList move constructor dangling pointer | Feb 2026 | | ||
| | #280 | Field corruption at chunk boundaries | PR #282 | | ||
| | #281 | Stream-specific exception handling | PR #282 | | ||
| | #283 | Header detection with variable-width rows | Jan 2026 | | ||
| | #285 | Delimiter guessing overwrites `no_header()` | Feb 2026 | | ||
|
|
||
| See inline comments in source files for implementation details. | ||
| - Always test both mmap and stream paths | ||
| - ≥500K rows needed to cross 10MB boundary | ||
| - Use distinct column values to detect field corruption | ||
| - Exceptions from worker thread need `exception_ptr` | ||
| - Changes to one constructor likely affect both paths | ||
|
|
||
| ## Tests | ||
| See `tests/AGENTS.md` for full test strategy, checklist, and conventions. |
Uh oh!
There was an error while loading. Please reload this page.