Skip to content

Commit 51c88e0

Browse files
committed
Add test for MSVC compilation w/o CMake
1 parent 46b7c5b commit 51c88e0

7 files changed

Lines changed: 405 additions & 268 deletions

File tree

.github/workflows/cmake-multi-platform.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,3 +93,4 @@ jobs:
9393
run: ctest --build-config ${{ matrix.build_type }}
9494

9595

96+
Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
name: MSVC Consumer Compatibility
2+
3+
on:
4+
push:
5+
branches: [ "master" ]
6+
paths:
7+
- "include/**"
8+
- "single_header.py"
9+
- "CMakeLists.txt"
10+
- ".github/workflows/msvc-consumer-compat.yml"
11+
pull_request:
12+
branches: [ "master" ]
13+
paths:
14+
- "include/**"
15+
- "single_header.py"
16+
- "CMakeLists.txt"
17+
- ".github/workflows/msvc-consumer-compat.yml"
18+
workflow_dispatch:
19+
20+
jobs:
21+
windows-msvc-no-zc-cplusplus:
22+
name: Windows MSVC compatibility without /Zc:__cplusplus
23+
runs-on: windows-latest
24+
25+
steps:
26+
- name: Checkout repository and submodules
27+
uses: actions/checkout@v4
28+
with:
29+
submodules: recursive
30+
31+
- name: Set up MSVC developer command prompt
32+
uses: ilammy/msvc-dev-cmd@v1
33+
34+
- name: Set up Python
35+
uses: actions/setup-python@v5
36+
with:
37+
python-version: '3.x'
38+
39+
- name: Configure CMake (for single-header generation)
40+
run: >
41+
cmake -S ${{ github.workspace }}
42+
-B ${{ github.workspace }}/build/no-zc-cplusplus
43+
-DCSV_CXX_STANDARD=20
44+
-DCSV_BUILD_PROGRAMS=OFF
45+
-DBUILD_PYTHON=OFF
46+
47+
- name: Generate amalgamated single-header
48+
run: cmake --build ${{ github.workspace }}/build/no-zc-cplusplus --config Release --target generate_single_header
49+
50+
- name: Compile generated single-header without /Zc:__cplusplus
51+
shell: cmd
52+
run: |
53+
REM NOTE: single_include/csv.hpp is an intentional compatibility shim; compile the generated amalgamated header instead.
54+
echo #include "csv.hpp" > ci_no_zc_single_header.cpp
55+
echo int main() { return 0; } >> ci_no_zc_single_header.cpp
56+
cl /nologo /std:c++20 /EHsc /Zc:__cplusplus- /I build\no-zc-cplusplus\single_include_generated /c ci_no_zc_single_header.cpp
57+
58+
- name: Compile unamalgamated header without /Zc:__cplusplus
59+
shell: cmd
60+
run: |
61+
echo #include "csv.hpp" > ci_no_zc_unamalgamated.cpp
62+
echo int main() { return 0; } >> ci_no_zc_unamalgamated.cpp
63+
cl /nologo /std:c++20 /EHsc /Zc:__cplusplus- /I include /c ci_no_zc_unamalgamated.cpp

AGENTS.md

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
# CSV Parser - AI Agent Context
2+
3+
Architectural overview for AI assistants working with this codebase.
4+
5+
> **Maintenance rule:** Whenever this file is changed, `CLAUDE.md` in the same directory must be updated to reflect the changes. `CLAUDE.md` is a bullet-point summary of this file and must stay in sync.
6+
7+
## Critical: single_include/csv.hpp Is A Shim
8+
9+
`single_include/csv.hpp` is intentionally **non-functional** and exists only as a compatibility shim.
10+
11+
- Do **not** compile against `single_include/csv.hpp`
12+
- For single-header validation, generate `build/.../single_include_generated/csv.hpp` via the `generate_single_header` target, then compile that generated file
13+
- For unamalgamated usage, include headers from `include/`
14+
15+
This guard exists to prevent stale-in-repo amalgamated headers and to force use of the canonical generated distribution.
16+
17+
## Critical: Two Independent Code Paths
18+
19+
The `CSVReader` class has **two completely different implementations**:
20+
21+
```cpp
22+
// PATH 1: Memory-mapped I/O (MmapParser)
23+
CSVReader reader("filename.csv");
24+
25+
// PATH 2: Stream-based (StreamParser)
26+
std::ifstream infile("filename.csv", std::ios::binary);
27+
CSVReader reader(infile, format);
28+
```
29+
30+
**Impact:** Bugs can exist in one path but not the other (see issue #281). Any test validating parsing behavior must test BOTH paths using Catch2 `SECTION`.
31+
32+
## Threading: Worker + 10MB Chunks
33+
34+
- Worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`)
35+
- Communicates via `ThreadSafeDeque<CSVRow>`
36+
- Exceptions propagate via `std::exception_ptr`
37+
- Critical: Fields spanning chunk boundaries must not corrupt
38+
39+
**Testing requirement:** Use ≥500K rows to cross 10MB boundary.
40+
41+
## Key Files
42+
43+
| File | Contains |
44+
|------|----------|
45+
| `csv_reader.hpp` | Mmap vs stream constructors |
46+
| `csv_reader.cpp` | Delimiter guessing, header detection |
47+
| `basic_csv_parser.hpp` | Parser base class (IBasicCSVParser, MmapParser, StreamParser) |
48+
| `basic_csv_parser.cpp` | Chunk transitions, worker thread |
49+
| `raw_csv_data.hpp` | Internal parser data structures (RawCSVField, CSVFieldList, RawCSVData) |
50+
| `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication |
51+
| `csv_row.hpp` | Public API types (CSVField, CSVRow) |
52+
| `test_round_trip.cpp` | Exemplar test patterns |
53+
54+
## Data Flow: Parser → Row API
55+
56+
```
57+
Parser Thread Main Thread
58+
↓ ↓
59+
RawCSVData (shared_ptr) ─────────────→ CSVRow
60+
↓ ↓
61+
CSVFieldList → RawCSVField[] CSVField (lazy unescaping)
62+
63+
ThreadSafeDeque<CSVRow>
64+
(producer-consumer queue)
65+
```
66+
67+
**Thread Safety:** Parser populates `RawCSVData`, pushes `CSVRow` to `ThreadSafeDeque`, main thread pops and reads. The `CSVFieldList` uses chunked allocation (~170 fields/chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details.
68+
69+
## Common Pitfalls
70+
71+
1. **Don't assume one code path:** Mmap and stream paths are different. Always test both.
72+
2. **Don't write tiny tests:** Need ≥500K rows to cross 10MB chunk boundary.
73+
3. **Don't use uniform values:** Each column needs distinct values to detect corruption.
74+
4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`.
75+
5. **Don't change one constructor:** Likely affects both mmap and stream paths.
76+
77+
See `tests/AGENTS.md` for test strategy, checklist, and conventions.

CLAUDE.md

Lines changed: 28 additions & 85 deletions
Original file line numberDiff line numberDiff line change
@@ -1,94 +1,37 @@
1-
# CSV Parser - AI Agent Context
1+
# CSV Parser - Claude Summary
22

3-
Architectural overview for AI assistants working with this codebase.
3+
> **`AGENTS.md` is the source of truth.** This file is a bullet-point summary only. Always load and follow `AGENTS.md` — it takes precedence over anything here.
44
5-
## Critical: Two Independent Code Paths
5+
## single_include/csv.hpp
6+
- Non-functional shim — do **not** compile against it
7+
- For single-header use: generate `build/.../single_include_generated/csv.hpp` via `generate_single_header` target
8+
- For unamalgamated use: include from `include/`
69

7-
The `CSVReader` class has **two completely different implementations**:
10+
## Two Independent Code Paths
11+
- `CSVReader("file.csv")` → MmapParser
12+
- `CSVReader(istream, format)` → StreamParser
13+
- Bugs can exist in one and not the other — always test both with Catch2 `SECTION`
814

9-
```cpp
10-
// PATH 1: Memory-mapped I/O (MmapParser)
11-
CSVReader reader("filename.csv");
12-
13-
// PATH 2: Stream-based (StreamParser)
14-
std::ifstream infile("filename.csv", std::ios::binary);
15-
CSVReader reader(infile, format);
16-
```
17-
18-
**Impact:** Bugs can exist in one path but not the other (see issue #281). Any test validating parsing behavior must test BOTH paths using Catch2 `SECTION`.
19-
20-
## Threading: Worker + 10MB Chunks
21-
22-
- Worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`)
23-
- Communicates via `ThreadSafeDeque<CSVRow>`
15+
## Threading
16+
- Worker thread reads 10MB chunks (`ITERATION_CHUNK_SIZE`)
17+
- Communication via `ThreadSafeDeque<CSVRow>`
2418
- Exceptions propagate via `std::exception_ptr`
25-
- Critical: Fields spanning chunk boundaries must not corrupt
26-
27-
**Testing requirement:** Use ≥500K rows to cross 10MB boundary.
28-
29-
## Test Strategy: Use Distinct Column Values
30-
31-
❌ **BAD:** `array{i, i, i, i, i}` - All columns identical
32-
✅ **GOOD:** `array{i*5+0, i*5+1, i*5+2, i*5+3, i*5+4}` - Each column distinct
33-
34-
**Why:** Field corruption is only detectable if columns have different values.
19+
- Tests must use ≥500K rows to cross chunk boundary
3520

3621
## Key Files
37-
38-
| File | Contains |
39-
|------|----------|
40-
| `csv_reader.hpp` | Mmap vs stream constructors |
41-
| `csv_reader.cpp` | Delimiter guessing, header detection |
42-
| `basic_csv_parser.hpp` | Parser base class (IBasicCSVParser, MmapParser, StreamParser) |
43-
| `basic_csv_parser.cpp` | Chunk transitions, worker thread |
44-
| `raw_csv_data.hpp` | Internal parser data structures (RawCSVField, CSVFieldList, RawCSVData) |
45-
| `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication |
46-
| `csv_row.hpp` | Public API types (CSVField, CSVRow) |
47-
| `test_round_trip.cpp` | Exemplar test patterns |
48-
49-
## Data Flow: Parser → Row API
50-
51-
```
52-
Parser Thread Main Thread
53-
↓ ↓
54-
RawCSVData (shared_ptr) ─────────────→ CSVRow
55-
↓ ↓
56-
CSVFieldList → RawCSVField[] CSVField (lazy unescaping)
57-
58-
ThreadSafeDeque<CSVRow>
59-
(producer-consumer queue)
60-
```
61-
62-
**Thread Safety:** Parser populates `RawCSVData`, pushes `CSVRow` to `ThreadSafeDeque`, main thread pops and reads. The `CSVFieldList` uses chunked allocation (~170 fields/chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details.
22+
- `csv_reader.hpp` — mmap vs stream constructors
23+
- `basic_csv_parser.hpp` — MmapParser, StreamParser implementations
24+
- `basic_csv_parser.cpp` — chunk transitions, worker thread
25+
- `raw_csv_data.hpp` — RawCSVField, CSVFieldList, RawCSVData
26+
- `thread_safe_deque.hpp` — producer-consumer queue
27+
- `csv_row.hpp` — CSVField, CSVRow public API
6328

6429
## Common Pitfalls
65-
66-
1. **Don't assume one code path:** Mmap and stream paths are different. Always test both.
67-
2. **Don't write tiny tests:** Need ≥500K rows to cross 10MB chunk boundary.
68-
3. **Don't use uniform values:** Each column needs distinct values to detect corruption.
69-
4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`.
70-
5. **Don't change one constructor:** Likely affects both mmap and stream paths.
71-
72-
## Test Checklist
73-
74-
- [ ] Tests both mmap and stream paths (use `SECTION`)
75-
- [ ] Distinct values per column
76-
- [ ] ≥500K rows to cross chunk boundary
77-
- [ ] Documents bug it would catch
78-
- [ ] Lambda + SECTION pattern for code reuse
79-
- [ ] Test data in `tests/data/fake_data` (real data in `tests/data/real_data`)
80-
- [ ] Use `FileGuard` for temporary files (ensures cleanup even if test fails)
81-
82-
**Note:** `tests/data` is a git submodule. Remember to commit changes separately.
83-
84-
## Recent Bug Fixes
85-
86-
| Issue | Bug | Fixed |
87-
|-------|-----|-------|
88-
| #278 | CSVFieldList move constructor dangling pointer | Feb 2026 |
89-
| #280 | Field corruption at chunk boundaries | PR #282 |
90-
| #281 | Stream-specific exception handling | PR #282 |
91-
| #283 | Header detection with variable-width rows | Jan 2026 |
92-
| #285 | Delimiter guessing overwrites `no_header()` | Feb 2026 |
93-
94-
See inline comments in source files for implementation details.
30+
- Always test both mmap and stream paths
31+
- ≥500K rows needed to cross 10MB boundary
32+
- Use distinct column values to detect field corruption
33+
- Exceptions from worker thread need `exception_ptr`
34+
- Changes to one constructor likely affect both paths
35+
36+
## Tests
37+
See `tests/AGENTS.md` for full test strategy, checklist, and conventions.

README.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,10 @@ Found a bug? Please report it! This project welcomes **genuine bug reports broug
9393
* ✅ Performance regressions in real-world scenarios
9494
* ✅ API issues that affect **practical, real-world use cases**
9595
96+
When reporting integration or compiler issues, please state which library form you are using:
97+
* Single-header
98+
* Unamalgamated headers/library (`include/` with your own build system, CMake, etc.)
99+
96100
Please keep reports grounded in real use cases—no contrived edge cases or philosophical debates about API design, thanks!
97101
98102
**Design Note:** `CSVReader` uses `std::input_iterator_tag` for single-pass streaming of arbitrarily large files. If you need multi-pass iteration or random access, copy rows to a `std::vector` first. This is by design, not a bug.

0 commit comments

Comments
 (0)