* Unify inference logic; add explicit column_names flag; fix stream/mmap parity regressions
* Clean up MSVC compatibility macros
* Simplify IBasicCSVParser construction
* Refactor get_col_names()
  Use parse() and move to utility header
* Get rid of get_file_size()
  This returned the file size of an mmap source for the MmapParser, but used ifstream. Not efficient!
* Delete sh.hpp
* Cleanup constructors for CSVReader & IBasicCSVParser
* Clean up CSVField duplication
* Figured out what was wrong with clang & CSVWriter
  Also add variadic write_row() method
* Fix MSVC pedantry
* Potential fix for pull request finding 'CodeQL / Large object passed by value'
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Potential fix for pull request finding 'CodeQL / Large object passed by value'
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Update writing documentation
* Fix clanker-induced error
  Also fix shadowing issues
* Fix CSV writing issues
* Bump version
* Delete sh.hpp
* Reduce duplication in get_csv_head_stream()
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
**Thread Safety:** The parser runs on a worker thread. It populates `RawCSVData` and pushes `CSVRow` objects onto a `ThreadSafeDeque`, and the main thread pops and reads them. `CSVFieldList` uses chunked allocation (~170 fields per chunk) for cache locality. See `raw_csv_data.hpp` and `thread_safe_deque.hpp` for implementation details.

For detailed file mapping, parser data flow, and component relationships, see `ARCHITECTURE.md` and `include/internal/ARCHITECTURE.md`.
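The producer/consumer handoff in the Thread Safety note can be sketched as a mutex and condition-variable queue. This is a minimal illustration under stated assumptions, not the library's `ThreadSafeDeque`. `RowQueue`, `parse_worker`, and `read_all_rows` are hypothetical names, rows are plain `std::string`s, and an empty input line stands in for a parse error. The sketch also shows why the worker thread needs `std::exception_ptr`: an exception cannot cross a thread boundary by itself, so the worker captures it and the consuming thread rethrows it after `join()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread-safe deque: a mutex-protected
// std::deque plus a condition variable that wakes the consumer.
class RowQueue {
public:
    void push(std::string row) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            rows_.push_back(std::move(row));
        }
        cv_.notify_one();
    }

    // Called by the producer once it is done, whether it succeeded or not.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a row is available or the queue is closed and drained.
    // Returns false only when no more rows will ever arrive.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !rows_.empty() || closed_; });
        if (rows_.empty()) return false;
        out = std::move(rows_.front());
        rows_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> rows_;
    bool closed_ = false;
};

// Producer: pushes each line as a row; an empty line simulates a parse error.
void parse_worker(const std::vector<std::string>& lines, RowQueue& queue,
                  std::exception_ptr& error) {
    try {
        for (const auto& line : lines) {
            if (line.empty()) throw std::runtime_error("malformed row");
            queue.push(line);
        }
    } catch (...) {
        error = std::current_exception();  // capture it for the other thread
    }
    queue.close();  // always unblock the consumer
}

// Consumer (main thread): drains the queue, then rethrows any worker error.
std::vector<std::string> read_all_rows(const std::vector<std::string>& lines) {
    RowQueue queue;
    std::exception_ptr error;
    std::thread worker(parse_worker, std::cref(lines), std::ref(queue), std::ref(error));

    std::vector<std::string> rows;
    std::string row;
    while (queue.pop(row)) rows.push_back(row);

    worker.join();  // join before reading `error` so the worker's write is visible
    if (error) std::rethrow_exception(error);
    return rows;
}
```

With this approach, rows pushed before a failure are still delivered. The error surfaces on the reading thread once the queue drains, rather than terminating the program from inside the worker.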

## Common Pitfalls

@@ -73,18 +47,29 @@ ThreadSafeDeque<CSVRow>
3. **Don't use uniform values:** Each column needs distinct values so that corrupted or misplaced fields can be detected.
4. **Don't ignore async:** Parsing runs on a worker thread, so exceptions must be captured in a `std::exception_ptr` and rethrown on the consuming thread.
5. **Don't change only one constructor:** Constructor changes likely affect both the mmap and stream paths.
6. **`CSVReader` is non-copyable and move-enabled.** Prefer explicit ownership transfer (`std::move`) or `std::unique_ptr<CSVReader>` when sharing or handing off parser ownership across APIs.
7. **Prefer user-friendly API constraints.** Do not narrow template constraints unless required for correctness, safety, or a measured performance win. If an implementation already handles common standard-library containers/ranges correctly, keep those inputs accepted instead of over-constraining APIs for aesthetic purity.
8. **Opportunistic rewrites/refactors are allowed when they are safe and justified.** Keep them separated from build-fix urgency where possible, and avoid bundling unrelated churn with compiler triage unless explicitly requested.
9. **When proposing changes that affect compile-time behavior, explain the tradeoff clearly.** Call out any impact to codegen, performance, portability, and readability before applying the change.
10. **If a build fix appears to require more than ~3 files or ~60 changed lines, pause and confirm scope first.** Provide a short justification before expanding further.
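Pitfall 6 can be illustrated with a stand-in type. The sketch below uses a hypothetical `Reader` class, not `csv::CSVReader`. Like `CSVReader`, it is non-copyable but movable. The sketch shows both hand-off styles the pitfall recommends: `std::move` into a by-value parameter, and passing ownership through a `std::unique_ptr`. The functions `consume`, `consume_ptr`, and `demo` are also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parser object that owns its data.
class Reader {
public:
    explicit Reader(std::vector<std::string> rows) : rows_(std::move(rows)) {}

    Reader(const Reader&) = delete;             // non-copyable
    Reader& operator=(const Reader&) = delete;
    Reader(Reader&&) = default;                 // move-enabled
    Reader& operator=(Reader&&) = default;

    std::size_t size() const { return rows_.size(); }

private:
    std::vector<std::string> rows_;
};

static_assert(!std::is_copy_constructible<Reader>::value, "Reader must not be copyable");
static_assert(std::is_move_constructible<Reader>::value, "Reader must be movable");

// Takes ownership by value: callers must write consume(std::move(r)).
std::size_t consume(Reader reader) { return reader.size(); }

// Takes ownership through a smart pointer, e.g. across an API boundary
// where the object should live on the heap.
std::size_t consume_ptr(std::unique_ptr<Reader> reader) { return reader->size(); }

std::size_t demo() {
    Reader a(std::vector<std::string>{"r1", "r2"});
    // consume(a);  // would not compile: the copy constructor is deleted
    std::size_t n = consume(std::move(a));  // explicit ownership transfer

    auto p = std::make_unique<Reader>(std::vector<std::string>{"r1", "r2", "r3"});
    n += consume_ptr(std::move(p));  // p is null after this call
    return n;
}
```

Deleting the copy operations turns an accidental copy into a compile error. The explicit `std::move` at each call site makes every ownership transfer visible in review.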

See `tests/AGENTS.md` for test strategy, checklist, and conventions.

### Rules for Coding

1. **Use compatibility macros defined in `common.hpp`** for cross-compiler or cross-standard concerns. If a suitable macro doesn't exist, consider creating one.
2. **Compatibility macros defined in `common.hpp` MUST be referenced only after including `common.hpp`.** Otherwise the macro is not yet defined at the point of use. Preprocessor checks on a macro such as `CSV_HAS_CXX20` then silently treat it as undefined, so feature detection and conditional compilation pick the wrong branch, typically without any error.
3. **Prefer compile-time control flow and assertions where possible.** For example, if a branch can safely be written with `if constexpr`, use the `IF_CONSTEXPR` macro (from `common.hpp`). It keeps C++11 compatibility while giving C++17 and later users optimal control flow.
   1. **If this causes compiler warnings, silence the warning. Do not revert to unnecessary runtime control flow.** Prefer the narrowest possible suppression at the exact site (for example, a local `#pragma warning(push/pop)` on MSVC).
4. **Prefer trailing underscore for private members** (for example `source_`, `leftover_`). When you touch code with mixed private-member naming styles, normalize the edited region toward trailing underscores instead of introducing more leading-underscore or unsuffixed names.
5. **Apply the 5/2 anti-duplication rule.**
   1. If equivalent behavior exists in 2 or more code paths and each copy is about 5+ meaningful lines, extract a shared helper.
   2. If duplication is intentionally kept, add a brief comment explaining why (for example performance, API boundary, or template constraints).
   3. For behavior-sensitive duplicated logic, keep at least one regression test that exercises each path (for example mmap and stream via separate Catch2 `SECTION`s).
6. If a class has both a `.hpp` and a `.cpp` file, define its methods in the `.cpp` and prefix each definition with `CSV_INLINE` to ensure proper single-header compilation. The macro is `inline` in the generated single header and empty otherwise. Exceptions:
   - **Templates must stay in the `.hpp`:** the compiler needs the definition at instantiation time. `init_from_stream` is the standing example.
   - **Trivial one-liner accessors** may be unconditionally `inline` in the header when the call overhead is measurable and the body will never change.
   - **Consolidation:** If a `.cpp` would be under ~100 lines *and* the split causes excessive comment duplication between the two files, prefer a single `.hpp` with definitions marked `inline` (free functions and methods alike). Do not use `CSV_INLINE` for consolidated definitions. `CSV_INLINE` expands to nothing in multi-header mode, which would produce ODR violations across TUs. Do not consolidate just for brevity; do it only when duplication is the dominant cost.
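Rules 1 to 3 can be sketched together. The block below defines a compatibility macro in the spirit of `IF_CONSTEXPR` and uses it in a template. The names `SKETCH_CPLUSPLUS` and `SKETCH_IF_CONSTEXPR`, and their definitions, are assumptions for illustration, not the contents of `common.hpp`. Under C++17 the macro expands to `if constexpr`. Under C++11/14 it falls back to a plain `if` on a constant expression, which MSVC may flag with warning C4127 ("conditional expression is constant"). That warning is suppressed locally with `#pragma warning(push/pop)` instead of rewriting the branch as runtime logic. Because the macros are defined before any use, the sketch also follows the ordering rule 2 requires.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical compatibility macros (an assumption, not copied from common.hpp).
// MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is set, so
// _MSVC_LANG is checked there instead.
#if defined(_MSVC_LANG)
#define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#define SKETCH_CPLUSPLUS __cplusplus
#endif

#if SKETCH_CPLUSPLUS >= 201703L
#define SKETCH_IF_CONSTEXPR if constexpr
#else
#define SKETCH_IF_CONSTEXPR if
#endif

// Both branches must compile for every T, because in C++11/14 the macro is a
// plain `if` and neither branch is discarded.
template <typename T>
std::string describe(T value) {
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4127)  // C4127: conditional expression is constant
#endif
    SKETCH_IF_CONSTEXPR (std::is_integral<T>::value) {
        return "integral:" + std::to_string(static_cast<long long>(value));
    } else {
        return "floating:" + std::to_string(static_cast<double>(value));
    }
#ifdef _MSC_VER
#pragma warning(pop)
#endif
}
```

For example, `describe(42)` returns `"integral:42"`. The suppression is scoped to the single function, so the warning stays active everywhere else.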

### Rules for Comments

1. **Always update or remove incorrect comments.**
2. **Don't reference internal functions in public API comments.** Public API docs should describe user-visible behavior and contracts; internal helper/function details belong in internal docs.

- If a build fix appears to require more than ~3 files or ~60 changed lines, pause and confirm scope first.
- Apply the 5/2 anti-duplication rule: if equivalent behavior exists in 2+ code paths and each copy is ~5+ meaningful lines, extract a shared helper; if duplication remains, document why and keep regression coverage for each path.
- File mapping, parser data flow, and component relationships are maintained in `ARCHITECTURE.md` and `include/internal/ARCHITECTURE.md`.

## Common Pitfalls

- Always test both mmap and stream paths.
@@ -48,6 +43,8 @@
- **Opportunistic rewrites are allowed when safe and justified:** avoid mixing unrelated churn into urgent compiler triage unless requested.
- **Explain compile-time tradeoffs explicitly:** when a change affects compile-time behavior, call out its impact on codegen, performance, portability, and readability.
- **Scope guard for build fixes:** if a fix grows beyond roughly 3 files or 60 changed lines, pause and confirm scope with justification.
- **Apply the 5/2 anti-duplication rule:** if equivalent behavior exists in 2+ code paths and each copy is ~5+ meaningful lines, extract a shared helper. If duplication remains, document why, and keep at least one regression test that exercises each path.
- **Non-trivial methods go in `.cpp` with `CSV_INLINE`:** `CSV_INLINE` is `inline` in the generated single header and empty otherwise; omitting it causes ODR violations in the single header. Exceptions: templated methods must stay in `.hpp` (`init_from_stream` is the standing example), and trivial one-liner accessors may stay `inline` in the header when call overhead matters. Consolidate into a single `.hpp` only when the `.cpp` would be under ~100 lines *and* the split causes excessive comment duplication. Consolidated definitions (free functions and methods alike) must use `inline`, not `CSV_INLINE`, to avoid ODR violations across TUs.
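The 5/2 rule and the "test both paths" pitfall can be sketched together. Two hypothetical entry points delegate to one shared helper instead of duplicating the row-splitting logic. One reads from an in-memory buffer and stands in for the mmap path. The other reads from a `std::istream` and stands in for the stream path. `split_rows`, `rows_from_buffer`, and `rows_from_stream` are illustrative names, not library functions, and the splitter ignores quoting.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Shared helper: the single place that knows how to split text into rows.
// Both entry points call it, so a fix here applies to both paths.
std::vector<std::string> split_rows(const std::string& text) {
    std::vector<std::string> rows;
    std::string current;
    for (char c : text) {
        if (c == '\n') {
            rows.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) rows.push_back(current);  // last row without a newline
    return rows;
}

// "mmap-like" path: the whole input is already in memory.
std::vector<std::string> rows_from_buffer(const char* data, std::size_t size) {
    return split_rows(std::string(data, size));
}

// Stream path: read everything, then reuse the same helper.
std::vector<std::string> rows_from_stream(std::istream& in) {
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    return split_rows(text);
}
```

As the rule requires, a regression test would feed identical input through both entry points and compare the results. That way a future change that diverges the two paths is caught.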

## Tests

See `tests/AGENTS.md` for full test strategy, checklist, and conventions.

*For a more in-depth guide, check out the [Doxygen page on CSV writing](https://vincentlaucsb.github.io/csv-parser/csv_writing_guide.html).*
Writing CSVs is powered by the generic `DelimWriter`, with helpful factory functions like `make_csv_writer()` and `make_tsv_writer()` that cut down on boilerplate.
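The design described above uses one generic writer parameterized by its delimiter, plus thin factory functions. It can be illustrated with a minimal self-contained sketch. `MiniDelimWriter`, `make_mini_csv_writer()`, and `make_mini_tsv_writer()` are hypothetical names; this is not the library's `DelimWriter`. The quoting rule here follows RFC 4180 rather than any documented library behavior: a field that contains the delimiter, a double quote, or a line break is quoted, and embedded quotes are doubled.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generic writer: the delimiter is a template parameter.
template <char Delim>
class MiniDelimWriter {
public:
    explicit MiniDelimWriter(std::ostream& out) : out_(out) {}

    // Writes any row-like container whose elements convert to std::string.
    template <typename Container>
    MiniDelimWriter& operator<<(const Container& row) {
        bool first = true;
        for (const auto& field : row) {
            if (!first) out_ << Delim;
            first = false;
            out_ << quote(std::string(field));
        }
        out_ << '\n';
        return *this;
    }

private:
    // RFC 4180-style quoting: wrap fields containing the delimiter, a quote,
    // or a line break in quotes, doubling any embedded quotes.
    static std::string quote(const std::string& field) {
        if (field.find_first_of(std::string{Delim, '"', '\n', '\r'}) == std::string::npos)
            return field;
        std::string result = "\"";
        for (char c : field) {
            if (c == '"') result += '"';
            result += c;
        }
        return result + '"';
    }

    std::ostream& out_;
};

// Factory functions fix the delimiter, so callers skip the template syntax.
inline MiniDelimWriter<','> make_mini_csv_writer(std::ostream& out) {
    return MiniDelimWriter<','>(out);
}
inline MiniDelimWriter<'\t'> make_mini_tsv_writer(std::ostream& out) {
    return MiniDelimWriter<'\t'>(out);
}
```

Because the delimiter is a compile-time constant, the CSV and TSV writers share all of their logic. A comma only triggers quoting in the CSV instantiation.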

docs/source/csv_writing.md: 30 additions & 1 deletion
@@ -1,3 +1,5 @@
@page csv_writing_guide CSV Writing Guide

# CSV Writing Guide

This page summarizes write-side APIs and practical usage patterns for emitting
@@ -18,13 +20,21 @@ Any row-like container of string-convertible values can be streamed directly.
\snippet tests/test_write_csv.cpp CSV Writer Example

### Writing Tuples and Custom Types

`DelimWriter` can also serialize tuples and custom types that provide a string
conversion.

\snippet tests/test_write_csv.cpp CSV Writer Tuple Example
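One way to turn a tuple, including a custom type with a string conversion, into a delimited row is sketched below. It assumes C++17 (`std::apply` and fold expressions). The hypothetical `Point` type's `to_string` overload stands in for the "string conversion"; the mechanism `DelimWriter` actually uses for custom types may differ. `to_field` and `tuple_to_row` are also hypothetical, and quoting is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical custom type that provides a string conversion.
struct Point {
    int x, y;
};
std::string to_string(const Point& p) {
    return "(" + std::to_string(p.x) + " " + std::to_string(p.y) + ")";
}

// Convert one field to text: strings pass through; other types use an
// unqualified to_string, which finds std::to_string for arithmetic types
// and user overloads (like Point's) via argument-dependent lookup.
inline std::string to_field(const std::string& s) { return s; }
inline std::string to_field(const char* s) { return s; }
template <typename T>
std::string to_field(const T& value) {
    using std::to_string;
    return to_string(value);
}

// Join every tuple element with a delimiter using a C++17 fold expression.
template <typename... Ts>
std::string tuple_to_row(const std::tuple<Ts...>& row, char delim = ',') {
    std::ostringstream out;
    std::apply([&](const auto&... fields) {
        std::size_t i = 0;
        ((out << (i++ ? std::string(1, delim) : std::string()) << to_field(fields)), ...);
    }, row);
    return out.str();
}
```

For example, `tuple_to_row(std::make_tuple(std::string("id"), 7, Point{1, 2}))` yields `id,7,(1 2)`.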

## Using `write_row()`

The `write_row()` method writes a row of arbitrary fields with mixed types, without requiring you to construct a container first.

Through SFINAE, `write_row()` also accepts the same inputs as `operator<<`.

\snippet tests/test_write_csv.cpp CSV write_row Variadic Example
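A variadic row writer that accepts mixed field types without a container can be sketched as follows. The free function `write_row_sketch` and the helper `stringify` are hypothetical. Each argument is converted to text through its own `operator<<` into a `std::ostringstream`, which is an assumption made for illustration. The sketch is not the library's implementation: it omits quoting and the SFINAE dispatch that lets `write_row()` mirror `operator<<`. It requires C++17 fold expressions.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Convert any streamable value to a field string.
template <typename T>
std::string stringify(const T& value) {
    std::ostringstream ss;
    ss << value;
    return ss.str();
}

// Hypothetical variadic row writer: each argument becomes one field.
template <typename... Fields>
void write_row_sketch(std::ostream& out, const Fields&... fields) {
    static_assert(sizeof...(Fields) > 0, "a row needs at least one field");
    bool first = true;
    // Fold over the comma operator, which visits the fields left to right.
    ((out << (first ? "" : ",") << stringify(fields), first = false), ...);
    out << '\n';
}
```

For example, `write_row_sketch(std::cout, "name", 42, 2.5)` writes the line `name,42,2.5`.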

## Data Reordering Workflow

For read-transform-write pipelines, `csv::CSVRow` supports conversion to
@@ -39,3 +49,22 @@ Typical flow:
4. Emit with `CSVWriter`

\snippet tests/test_write_csv.cpp CSV Reordering Example
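The reorder step can also be sketched without the library as a column-name lookup. The requested order is resolved to positions once, from the header, and then applied to every row. `column_positions` and `reorder_row` are hypothetical helpers. In the workflow above, each input row would come from a parsed `csv::CSVRow` and the result would be emitted with `CSVWriter`. Here both are plain `std::vector<std::string>`s.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Resolve the requested column order to positions once, using the header.
std::vector<std::size_t> column_positions(const std::vector<std::string>& header,
                                          const std::vector<std::string>& order) {
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < header.size(); ++i) index.emplace(header[i], i);

    std::vector<std::size_t> positions;
    for (const auto& name : order) {
        auto it = index.find(name);
        if (it == index.end()) throw std::out_of_range("unknown column: " + name);
        positions.push_back(it->second);
    }
    return positions;
}

// Apply the precomputed positions to one row.
std::vector<std::string> reorder_row(const std::vector<std::string>& row,
                                     const std::vector<std::size_t>& positions) {
    std::vector<std::string> out;
    out.reserve(positions.size());
    for (std::size_t p : positions) out.push_back(row.at(p));  // at() guards short rows
    return out;
}
```

Resolving names once keeps the per-row cost to simple index lookups. An unknown column name fails early, before any output is written.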

### C++20 Ranges Version

With C++20, you can use `std::ranges::views` to reorder fields in a single expression:

\snippet tests/test_write_csv.cpp CSV Ranges Reordering Example

## DataFrame with Sparse Overlay

When working with DataFrames, you can update specific cells without reconstructing entire rows. The overlay mechanism stores only the changed cells and applies them when the data is written:

\snippet tests/test_write_csv.cpp DataFrame Sparse Overlay Write Example
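The overlay idea can be sketched as a base table plus a map of edited cells keyed by (row, column). Only edited cells are stored, and the writer checks the map before falling back to the base value. `SparseOverlayTable` is a hypothetical class for illustration, not the library's DataFrame API, and quoting is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class SparseOverlayTable {
public:
    explicit SparseOverlayTable(std::vector<std::vector<std::string>> rows)
        : rows_(std::move(rows)) {}

    // Record an edit without touching the base rows.
    void set(std::size_t row, std::size_t col, std::string value) {
        overlay_[{row, col}] = std::move(value);
    }

    // Current value of a cell: an overlaid value wins over the base data.
    const std::string& get(std::size_t row, std::size_t col) const {
        auto it = overlay_.find({row, col});
        return it != overlay_.end() ? it->second : rows_.at(row).at(col);
    }

    std::size_t edited_cells() const { return overlay_.size(); }

    // Write every row, substituting overlaid cells on the fly.
    void write_csv(std::ostream& out) const {
        for (std::size_t r = 0; r < rows_.size(); ++r) {
            for (std::size_t c = 0; c < rows_[r].size(); ++c) {
                if (c > 0) out << ',';
                out << get(r, c);
            }
            out << '\n';
        }
    }

private:
    std::vector<std::vector<std::string>> rows_;
    // std::map is used because std::pair has no std::hash specialization.
    std::map<std::pair<std::size_t, std::size_t>, std::string> overlay_;
};
```

Memory for edits grows with the number of changed cells rather than the table size. Because the base rows are never mutated, the original data stays available for comparison or rollback.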

## End-to-End Round-Trip Integrity Example

The following test intentionally writes first and then reads the output back to verify it. It still validates the same data-integrity guarantee that read-transform-write user workflows rely on.

\snippet tests/test_round_trip.cpp Round Trip Distinct Field Values Example
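The idea behind a distinct-values round-trip check can be sketched independently of the library. Every cell gets a value that encodes its own row and column. The table is then serialized, parsed back, and compared cell by cell, so a shifted, swapped, or dropped field shows up as a mismatch. The helpers below are hypothetical. The naive comma splitter in `read_table` only works because the generated values never contain delimiters, quotes, or newlines.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Every cell gets a unique value such as "r2c1", so any misplaced field
// produces a mismatch instead of an accidental match.
Table make_distinct_table(std::size_t rows, std::size_t cols) {
    Table t(rows, std::vector<std::string>(cols));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            t[r][c] = "r" + std::to_string(r) + "c" + std::to_string(c);
    return t;
}

std::string write_table(const Table& t) {
    std::string out;
    for (const auto& row : t) {
        for (std::size_t c = 0; c < row.size(); ++c) {
            if (c > 0) out += ',';
            out += row[c];
        }
        out += '\n';
    }
    return out;
}

// Naive reader: safe here only because values contain no ',', '"', or '\n'.
Table read_table(const std::string& text) {
    Table t;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::vector<std::string> row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) row.push_back(field);
        t.push_back(row);
    }
    return t;
}

bool round_trip_ok(std::size_t rows, std::size_t cols) {
    const Table original = make_distinct_table(rows, cols);
    return read_table(write_table(original)) == original;
}
```

With uniform values (for example, every cell set to `"x"`), swapping two fields would go unnoticed. Distinct values make that kind of corruption detectable.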