Architectural overview for AI assistants working with this codebase.
Maintenance rule: Whenever this file is changed, `CLAUDE.md` in the same directory must be updated to reflect the changes. `CLAUDE.md` is a bullet-point summary of this file and must stay in sync.
`single_include/csv.hpp` is intentionally non-functional and exists only as a compatibility shim.

- Do not compile against `single_include/csv.hpp`.
- For single-header validation, generate `build/.../single_include_generated/csv.hpp` via the `generate_single_header` target, then compile that generated file.
- For unamalgamated usage, include headers from `include/`.

This guard exists to keep stale amalgamated headers out of the repo and to force use of the canonical generated distribution.
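The `CSVReader` class has two completely different implementations:

```cpp
#include <fstream>
#include "csv.hpp"  // from include/, not single_include/
using namespace csv;

// PATH 1: Memory-mapped I/O (MmapParser)
CSVReader mmap_reader("filename.csv");

// PATH 2: Stream-based (StreamParser)
std::ifstream infile("filename.csv", std::ios::binary);
CSVFormat format;  // default-constructed; configure delimiter, header row, etc. as needed
CSVReader stream_reader(infile, format);
```

Impact: Bugs can exist in one path but not the other (see issue #281). Any test that validates parsing behavior must exercise BOTH paths using Catch2 `SECTION`s.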
- The worker thread reads in 10MB chunks (`ITERATION_CHUNK_SIZE`).
- It communicates with the main thread via `ThreadSafeDeque<CSVRow>`.
- Exceptions propagate via `std::exception_ptr`.
- Critical: Fields spanning chunk boundaries must not be corrupted.

Testing requirement: Use ≥500K rows so the test data crosses the 10MB chunk boundary.
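A minimal sketch of a test-data generator that meets both requirements above: enough bytes to cross a 10MB chunk, and a distinct value in every cell so that a shifted, truncated, or duplicated field cannot accidentally match. `make_large_csv`, `expected_cell`, and the `r<row>c<col>` naming scheme are hypothetical. The exact byte value of `ITERATION_CHUNK_SIZE` is not assumed here; checking against 10 MiB (10 × 1024 × 1024) covers either reading of "10MB".

```cpp
#include <cstddef>
#include <string>

// Builds a CSV with a header row plus `rows` data rows of `cols` columns.
// Every data cell is unique ("r<row>c<col>"), so corruption at a chunk
// boundary shows up as a mismatch against expected_cell().
std::string make_large_csv(std::size_t rows, std::size_t cols) {
    std::string out;
    out.reserve(rows * cols * 12);  // rough size guess to limit reallocations
    for (std::size_t c = 0; c < cols; ++c) {
        if (c) out += ',';
        out += "col" + std::to_string(c);
    }
    out += '\n';
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (c) out += ',';
            out += 'r' + std::to_string(r) + 'c' + std::to_string(c);
        }
        out += '\n';
    }
    return out;
}

// Expected value of data cell (r, c), for checking rows read back by either path.
std::string expected_cell(std::size_t r, std::size_t c) {
    return "r" + std::to_string(r) + "c" + std::to_string(c);
}
```

With 500,000 rows and 3 columns this produces 14,666,685 bytes, comfortably past one chunk. The same bytes can then be written to a temporary file for the mmap path and fed through a stream for the stream path, with every parsed cell compared against `expected_cell`.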
| File | Contains |
|---|---|
| `csv_reader.hpp` | Mmap vs stream constructors |
| `csv_reader.cpp` | Delimiter guessing, header detection |
| `basic_csv_parser.hpp` | Parser base class (`IBasicCSVParser`, `MmapParser`, `StreamParser`) |
| `basic_csv_parser.cpp` | Chunk transitions, worker thread |
| `raw_csv_data.hpp` | Internal parser data structures (`RawCSVField`, `CSVFieldList`, `RawCSVData`) |
| `thread_safe_deque.hpp` | Producer-consumer queue for parser→main thread communication |
| `csv_row.hpp` | Public API types (`CSVField`, `CSVRow`) |
| `test_round_trip.cpp` | Exemplar test patterns |
```
Parser Thread                            Main Thread
      ↓                                       ↓
RawCSVData (shared_ptr) ───────────────→   CSVRow
      ↓                                       ↓
CSVFieldList → RawCSVField[]        CSVField (lazy unescaping)
      ↓
ThreadSafeDeque<CSVRow>
(producer-consumer queue)
```
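The diagram marks `CSVField` as doing lazy unescaping. As a rough, hypothetical illustration of that idea (`LazyField` and its members are invented here and are not the library's API), a field can hold a view of the raw bytes plus a flag, and defer collapsing doubled quotes until the value is actually read. The sketch assumes RFC 4180-style escaping (`""` inside a quoted field) and a view that already excludes the enclosing quotes.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical lazily-unescaped field (not csv::CSVField). Holds a view into
// the raw parsed bytes; builds an unescaped copy ("" -> ") only on first get().
class LazyField {
public:
    LazyField(std::string_view raw, bool has_escaped_quotes)
        : raw_(raw), has_escaped_quotes_(has_escaped_quotes) {}

    // The returned view stays valid while this LazyField and the raw buffer live.
    std::string_view get() {
        if (!has_escaped_quotes_) return raw_;  // common case: no copy at all
        if (!cache_) {
            std::string out;
            out.reserve(raw_.size());
            for (std::size_t i = 0; i < raw_.size(); ++i) {
                out += raw_[i];
                // Drop the second quote of an escaped "" pair.
                if (raw_[i] == '"' && i + 1 < raw_.size() && raw_[i + 1] == '"') ++i;
            }
            cache_ = std::move(out);
        }
        return *cache_;
    }

private:
    std::string_view raw_;
    bool has_escaped_quotes_;
    std::optional<std::string> cache_;  // filled on first get(), reused afterwards
};
```

The design point being illustrated is that fields with no escaped quotes, or fields that are never read, never allocate.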
Thread Safety: The parser thread populates `RawCSVData` and pushes `CSVRow` objects onto the `ThreadSafeDeque`; the main thread pops rows and reads them. `CSVFieldList` uses chunked allocation (~170 fields per chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details.
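To make "chunked allocation" concrete, here is a hypothetical `ChunkedList`, not the real `CSVFieldList`. It appends into fixed-capacity blocks, so neighbouring fields sit contiguously in memory, and in this sketch growing the list never relocates earlier blocks. The ~170 figure is treated purely as an assumption about its origin: a 4096-byte block divided by a 24-byte field record (offset, length, flag on a typical 64-bit target) gives 170.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a raw field record: offset + length + flag.
struct ToyField {
    std::size_t start;
    std::size_t length;
    bool has_escaped_quotes;
};

// Append-only list stored as fixed-size blocks. Elements are contiguous within
// a block, and adding a block never moves the blocks allocated before it.
template <typename T, std::size_t BlockSize>
class ChunkedList {
public:
    T& push_back(const T& value) {
        if (blocks_.empty() || used_in_last_ == BlockSize) {
            blocks_.push_back(std::make_unique<std::array<T, BlockSize>>());
            used_in_last_ = 0;
        }
        T& slot = (*blocks_.back())[used_in_last_++];
        slot = value;
        return slot;
    }

    T& operator[](std::size_t i) { return (*blocks_[i / BlockSize])[i % BlockSize]; }

    std::size_t size() const {
        return blocks_.empty() ? 0 : (blocks_.size() - 1) * BlockSize + used_in_last_;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::vector<std::unique_ptr<std::array<T, BlockSize>>> blocks_;
    std::size_t used_in_last_ = 0;
};

// Assumed block size: records per 4 KiB (4096 / 24 == 170 on typical 64-bit targets).
constexpr std::size_t kFieldsPerBlock = 4096 / sizeof(ToyField);
```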
- Don't assume one code path: The mmap and stream paths are different. Always test both.
- Don't write tiny tests: You need ≥500K rows to cross the 10MB chunk boundary.
- Don't use uniform values: Each column needs distinct values to detect corruption.
- Don't ignore async: Because parsing runs on a worker thread, exceptions must be propagated with `std::exception_ptr`.
- Don't change one constructor in isolation: A change likely affects both the mmap and stream paths.
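A minimal, library-free sketch of why the async point matters: a worker thread pushes rows into a mutex/condition-variable queue and, if it throws, captures the exception in a `std::exception_ptr` that the consuming thread rethrows after `join()`. `ToyQueue`, `run_pipeline`, and the `fail_at` trigger are hypothetical and are not the `ThreadSafeDeque` API; only the standard-library mechanisms (`std::current_exception`, `std::rethrow_exception`) are real.

```cpp
#include <condition_variable>
#include <deque>
#include <exception>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue with an explicit "producer finished" flag.
template <typename T>
class ToyQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(m_); q_.push_back(std::move(item)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || done_; });
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());
        q_.pop_front();
        return item;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool done_ = false;
};

// The worker produces n_rows rows and throws at row fail_at (if in range) to
// simulate a parse error. The exception is captured instead of being lost on the thread.
std::vector<std::string> run_pipeline(int n_rows, int fail_at) {
    ToyQueue<std::string> queue;
    std::exception_ptr worker_error;  // written by the worker, read only after join()

    std::thread worker([&] {
        try {
            for (int i = 0; i < n_rows; ++i) {
                if (i == fail_at) throw std::runtime_error("bad row " + std::to_string(i));
                queue.push("row" + std::to_string(i));
            }
        } catch (...) {
            worker_error = std::current_exception();
        }
        queue.close();  // always unblock the consumer, success or failure
    });

    std::vector<std::string> rows;
    while (auto row = queue.pop()) rows.push_back(std::move(*row));
    worker.join();
    if (worker_error) std::rethrow_exception(worker_error);
    return rows;
}
```

Calling `close()` outside the `try` block is the important detail: if the worker closed the queue only on success, a thrown exception would leave the consumer blocked in `pop()` forever, and an exception escaping the thread function would call `std::terminate`.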
See tests/AGENTS.md for test strategy, checklist, and conventions.