Skip to content

Commit cdc978f

Browse files
Apr 18 2026 Patch (#306)
* Fix issue 195 * Remove useless comments
1 parent 672cdb2 commit cdc978f

19 files changed

Lines changed: 121 additions & 269 deletions

AGENTS.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -73,9 +73,7 @@ ThreadSafeDeque<CSVRow>
7373
3. **Don't use uniform values:** Each column needs distinct values to detect corruption.
7474
4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`.
7575
5. **Don't change one constructor:** Likely affects both mmap and stream paths.
76-
6. **Don't delete or simplify comments** unless they are trivially obvious (e.g. `// increment i`) or factually incorrect. Comments in this codebase frequently encode concurrency invariants, non-obvious design decisions, and hard-won bug context that cannot be recovered from the code alone.
7776
7. **Compatibility macros defined in `common.hpp` MUST be referenced only after including `common.hpp`.** Any macro (such as `CSV_HAS_CXX20`) that is defined in `common.hpp` must not be used or checked before `#include "common.hpp"` appears in the file. This ensures feature detection and conditional compilation work as intended across all supported compilers and build modes.
78-
7. **Don't reference internal functions in public API comments.** Public API docs should describe user-visible behavior and contracts; internal helper/function details belong in internal docs.
7977
8. **`CSVReader` is non-copyable and move-enabled.** Prefer explicit ownership transfer (`std::move`) or `std::unique_ptr<CSVReader>` when sharing/handing off parser ownership across APIs.
8078
9. **Prefer trailing underscore for private members** (for example `source_`, `leftover_`). When you touch code with mixed private-member naming styles, normalize the edited region toward trailing underscores instead of introducing more leading-underscore or unsuffixed names.
8179
10. **Prefer user-friendly API constraints.** Do not narrow template constraints unless required for correctness, safety, or a measured performance win. If an implementation already handles common standard-library containers/ranges correctly, keep those inputs accepted instead of over-constraining APIs for aesthetic purity.
@@ -86,3 +84,9 @@ ThreadSafeDeque<CSVRow>
8684
15. **If a build fix appears to require more than ~3 files or ~60 changed lines, pause and confirm scope first.** Provide a short justification before expanding further.
8785
8886
See `tests/AGENTS.md` for test strategy, checklist, and conventions.
87+
88+
### Rules for Comments
89+
1. **Always update or remove incorrect comments.**
90+
2. **Don't reference internal functions in public API comments.** Public API docs should describe user-visible behavior and contracts; internal helper/function details belong in internal docs.
91+
3. **Avoid meaningless @param and @return descriptions.** Do not add comments that could trivially be inferred by the function's name or other existing comments. When editing a function, remove any @param/@return descriptions that merely restate the function name or signature.
92+
4. **Don't delete or simplify comments** unless allowed by other rules in this section. Comments in this codebase frequently encode concurrency invariants, non-obvious design decisions, and hard-won bug context that cannot be recovered from the code alone.

ARCHITECTURE.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,9 @@ Operational/testing guidance:
1515
Notes:
1616
- Internal architecture content lives under include/internal to stay close to implementation.
1717
- Queue synchronization details are maintained only in THREADSAFE_DEQUE_DESIGN.md to avoid duplication.
18+
- Always update or remove incorrect comments.
1819
- Public API comments should remain user-facing and avoid references to internal helper/function details.
20+
- When editing a function, remove `@param` and `@return` descriptions that merely restate the function name or signature.
1921
- Private member naming should prefer trailing underscores; when editing mixed-style code, normalize the touched region toward that convention.
2022
- Compatibility macros defined in `common.hpp` must only be referenced after including `common.hpp`. See AGENTS.md and CLAUDE.md for details.
2123
- API constraints should be user-friendly: do not over-constrain templates unless needed for correctness, safety, or a measured performance win.

CLAUDE.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,9 +35,11 @@
3535
- Use distinct column values to detect field corruption
3636
- Exceptions from worker thread need `exception_ptr`
3737
- Changes to one constructor likely affect both paths
38+
- **Always update or remove incorrect comments**
3839
- **Do not delete or simplify comments** unless trivially obvious or factually wrong — comments encode concurrency invariants and bug history
3940
- **Compatibility macros defined in `common.hpp` MUST be referenced only after including `common.hpp`.** Any macro (such as `CSV_HAS_CXX20`) that is defined in `common.hpp` must not be used or checked before `#include "common.hpp"` appears in the file. This ensures feature detection and conditional compilation work as intended across all supported compilers and build modes.
4041
- **Do not reference internal functions in public API comments** — public API docs should remain user-facing; internal details belong in internal docs
42+
- **Remove meaningless `@param` and `@return` docs when editing a function** — if they merely restate the name or signature, delete them instead of preserving noise
4143
- **`CSVReader` is non-copyable and move-enabled** — prefer explicit ownership transfer (`std::move`) or `std::unique_ptr<CSVReader>` when handing off parser ownership
4244
- **Prefer trailing underscore for private members** — when touching mixed-style code, normalize the edited region toward names like `source_` and `leftover_`
4345
- **Prefer user-friendly API constraints** — do not narrow template constraints unless required for correctness, safety, or a measured performance win; if common containers/ranges already work, keep them accepted

include/internal/basic_csv_parser.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -140,7 +140,7 @@ namespace csv {
140140
field_length_ = 0;
141141
}
142142

143-
/** @return The number of characters parsed that belong to complete rows */
143+
/** Parse the current chunk and return the number of bytes belonging to complete rows. */
144144
CSV_INLINE size_t IBasicCSVParser::parse()
145145
{
146146
using internals::ParseFlags;

include/internal/basic_csv_parser.hpp

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -178,10 +178,7 @@ namespace csv {
178178
/** Whether or not source needs to be read in chunks */
179179
CONSTEXPR bool no_chunk() const { return this->source_size_ < CSV_CHUNK_SIZE_DEFAULT; }
180180

181-
/** Parse the current chunk of data *
182-
*
183-
* @returns How many character were read that are part of complete rows
184-
*/
181+
/** Parse the current chunk of data and return the completed-row prefix length. */
185182
size_t parse();
186183

187184
/** Create a new RawCSVDataPtr for a new chunk of data */

include/internal/csv_format.cpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -43,6 +43,7 @@ namespace csv {
4343
if (row < 0) this->variable_column_policy = VariableColumnPolicy::KEEP;
4444

4545
this->header = row;
46+
this->header_explicitly_set_ = true;
4647
this->col_names = {};
4748
return *this;
4849
}

include/internal/csv_format.hpp

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -53,14 +53,12 @@ namespace csv {
5353
/** Sets a list of potential delimiters
5454
*
5555
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
56-
* @param[in] delim An array of possible delimiters to try parsing the CSV with
5756
*/
5857
CSVFormat& delimiter(const std::vector<char> & delim);
5958

6059
/** Sets the whitespace characters to be trimmed
6160
*
6261
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
63-
* @param[in] ws An array of whitespace characters that should be trimmed
6462
*/
6563
CSVFormat& trim(const std::vector<char> & ws);
6664

@@ -156,8 +154,11 @@ namespace csv {
156154
CSV_INLINE static CSVFormat guess_csv() {
157155
CSVFormat format;
158156
format.delimiter({ ',', '|', '\t', ';', '^' })
159-
.quote('"')
160-
.header_row(0);
157+
.quote('"');
158+
// Assign header directly rather than via header_row() so that
159+
// header_explicitly_set_ remains false — the guesser must be free
160+
// to detect the real header row at construction time.
161+
format.header = 0;
161162

162163
return format;
163164
}
@@ -182,6 +183,9 @@ namespace csv {
182183
/**< Row number with columns (ignored if col_names is non-empty) */
183184
int header = 0;
184185

186+
/**< True if the user explicitly called header_row() or no_header() */
187+
bool header_explicitly_set_ = false;
188+
185189
/**< Whether or not to use quoting */
186190
bool no_quote = false;
187191

include/internal/csv_reader.cpp

Lines changed: 8 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -19,12 +19,7 @@ namespace csv {
1919
return ret.str();
2020
}
2121

22-
/** Return a CSV's column names
23-
*
24-
* @param[in] filename Path to CSV file
25-
* @param[in] format Format of the CSV file
26-
*
27-
*/
22+
/** Return the selected header row from a parsed head buffer. */
2823
CSV_INLINE std::vector<std::string> _get_col_names(csv::string_view head, CSVFormat format) {
2924
// Parse the CSV
3025
auto trim_chars = format.get_trim_chars();
@@ -129,12 +124,7 @@ namespace csv {
129124
}
130125
}
131126

132-
/** Return a CSV's column names
133-
*
134-
* @param[in] filename Path to CSV file
135-
* @param[in] format Format of the CSV file
136-
*
137-
*/
127+
/** Return a CSV's column names. */
138128
CSV_INLINE std::vector<std::string> get_col_names(csv::string_view filename, CSVFormat format) {
139129
auto head = internals::get_csv_head(filename);
140130

@@ -158,9 +148,6 @@ namespace csv {
158148
* **Details:** Reads the first block of a CSV file synchronously to get information
159149
* such as column names and delimiting character.
160150
*
161-
* @param[in] filename Path to CSV file
162-
* @param[in] format Format of the CSV file
163-
*
164151
* \snippet tests/test_read_csv.cpp CSVField Example
165152
*
166153
*/
@@ -184,10 +171,11 @@ namespace csv {
184171
if (format.guess_delim()) {
185172
auto guess_result = internals::_guess_format(head, format.possible_delimiters);
186173
format.delimiter(guess_result.delim);
187-
// Only override header if user hasn't explicitly called no_header()
188-
// Note: column_names() also sets header=-1, but it populates col_names,
189-
// so we can distinguish: no_header() means header=-1 && col_names.empty()
190-
if (format.header != -1 || !format.col_names.empty()) {
174+
// Only override header if the user hasn't explicitly set one via
175+
// header_row() or no_header(). column_names() sets col_names but
176+
// does not set header_explicitly_set_, so the guesser can still
177+
// determine where the data rows begin in that case.
178+
if (!format.header_explicitly_set_) {
191179
format.header = guess_result.header_row;
192180
}
193181

@@ -246,9 +234,7 @@ namespace csv {
246234
}
247235
}
248236

249-
/**
250-
* @param[in] names Column names
251-
*/
237+
/** Install the active column names for this reader. */
252238
CSV_INLINE void CSVReader::set_col_names(const std::vector<std::string>& names)
253239
{
254240
this->col_names->set_policy(this->_format.get_column_name_policy());
@@ -262,8 +248,6 @@ namespace csv {
262248
* @note This method is meant to be run on its own thread. Only one `read_csv()` thread
263249
* should be active at a time.
264250
*
265-
* @param[in] bytes Number of bytes to read.
266-
*
267251
* @see CSVReader::read_csv_worker
268252
* @see CSVReader::read_row()
269253
*/
@@ -309,7 +293,6 @@ namespace csv {
309293
* - Reads chunks of data that are csv::internals::CSV_CHUNK_SIZE_DEFAULT bytes large at a time
310294
* - For performance details, read the documentation for CSVRow and CSVField.
311295
*
312-
* @param[out] row The variable where the parsed row will be stored
313296
* @see CSVRow, CSVField
314297
*
315298
* **Example:**

include/internal/csv_reader.hpp

Lines changed: 6 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -47,10 +47,6 @@ namespace csv {
4747
const CSVFormat format = CSVFormat::guess_csv());
4848

4949
/** @brief Guess the delimiter and header row of a CSV file
50-
*
51-
* @param[in] filename Path to CSV file
52-
* @param[in] delims Candidate delimiters to test
53-
* @return CSVGuessResult containing the detected delimiter and header row index
5450
*
5551
* **Heuristic:** For each candidate delimiter, calculate a score based on
5652
* the most common row length (mode). The delimiter with the highest score wins.
@@ -107,28 +103,23 @@ namespace csv {
107103
* streaming large CSV files that may exceed available RAM (e.g., 50+ GB files).
108104
*
109105
* @par CRITICAL DESIGN CONSTRAINT:
110-
* - The underlying RawCSVData chunks are automatically freed as the iterator advances
111-
* - Only the CURRENT row's data is kept alive (via CSVRow's RawCSVDataPtr member)
112-
* - Previous chunks are freed to allow unlimited file sizes with bounded memory
106+
* - Storage for previously consumed positions may be released as the iterator advances
107+
* - Only the current position is guaranteed to remain valid without copying
108+
* - This bounded-memory behavior is what allows very large CSV files to stream safely
113109
*
114110
* @par WHY FORWARD ITERATOR IS NOT POSSIBLE:
115111
* - ForwardIterator requires multi-pass guarantees (can hold multiple valid positions)
116-
* - Supporting this would require caching ALL RawCSVDataPtr chunks in the iterator
112+
* - Supporting this would require retaining all parsed row storage in memory
117113
* - This defeats the streaming purpose: a 50 GB CSV would require 50+ GB of RAM
118114
* - The entire library design depends on automatic chunk cleanup for memory efficiency
119115
*
120116
* @par IMPLICATIONS FOR ALGORITHM USE:
121117
* - Algorithms requiring ForwardIterator (std::max_element, std::sort, etc.) may
122-
* appear to work in tests with small files, but will cause heap-use-after-free
123-
* when the algorithm tries to access iterators pointing to freed chunks
118+
* appear to work in tests with small files, but are not safe on this iterator
119+
* once earlier positions have been released during streaming
124120
* - CORRECT approach: Copy rows to std::vector first, then use algorithms
125121
* - Example: auto rows = std::vector<CSVRow>(reader.begin(), reader.end());
126122
* auto max_row = std::max_element(rows.begin(), rows.end(), ...);
127-
*
128-
* @par FOR AI AGENTS AND MAINTAINERS:
129-
* If you encounter heap-use-after-free with ForwardIterator algorithms, the solution
130-
* is NOT to cache all RawCSVDataPtr chunks - this violates the fundamental streaming
131-
* architecture. Instead, document the limitation or provide vector-based workarounds.
132123
*/
133124
class iterator {
134125
public:

include/internal/csv_reader_iterator.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,8 @@ namespace csv {
1414
return CSVReader::iterator(this, std::move(row));
1515
}
1616

17-
/** A placeholder for the imaginary past the end row in a CSV.
18-
* Attempting to deference this will lead to bad things.
17+
/** A placeholder for the imaginary past-the-end row in a CSV.
18+
* Attempting to dereference this iterator is undefined.
1919
*/
2020
CSV_INLINE CSV_CONST CSVReader::iterator CSVReader::end() const noexcept {
2121
return CSVReader::iterator();

0 commit comments

Comments
 (0)