Skip to content

Commit 539b6df

Browse files
Added StringViewStream to enable less copying when parsing string views (#303)
* Added StringViewStream * Added opt-in footgun * Added opt-in safety switch for StreamParser path * Fix single header issue * Updated comments on CSVReader ownership * Reverted parse(). Added parse_unsafe() for non-owning SVs. * Potential fix for pull request finding 'CodeQL / Large object passed by value' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
1 parent 6eb125d commit 539b6df

12 files changed

Lines changed: 232 additions & 18 deletions

AGENTS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -74,5 +74,7 @@ ThreadSafeDeque<CSVRow>
7474
4. **Don't ignore async:** Worker thread means exceptions must use `exception_ptr`.
7575
5. **Don't change one constructor:** Likely affects both mmap and stream paths.
7676
6. **Don't delete or simplify comments** unless they are trivially obvious (e.g. `// increment i`) or factually incorrect. Comments in this codebase frequently encode concurrency invariants, non-obvious design decisions, and hard-won bug context that cannot be recovered from the code alone.
77+
7. **Don't reference internal functions in public API comments.** Public API docs should describe user-visible behavior and contracts; internal helper/function details belong in internal docs.
78+
8. **`CSVReader` is non-copyable and non-movable.** The preferred idiom for sharing or transferring a reader is `std::unique_ptr<CSVReader>`. Document this at any API surface that might tempt callers to copy or move.
7779
7880
See `tests/AGENTS.md` for test strategy, checklist, and conventions.

ARCHITECTURE.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,4 +15,5 @@ Operational/testing guidance:
1515
Notes:
1616
- Internal architecture content lives under include/internal to stay close to implementation.
1717
- Queue synchronization details are maintained only in THREADSAFE_DEQUE_DESIGN.md to avoid duplication.
18+
- Public API comments should remain user-facing and avoid references to internal helper/function details.
1819

CLAUDE.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,8 @@
3636
- Exceptions from worker thread need `exception_ptr`
3737
- Changes to one constructor likely affect both paths
3838
- **Do not delete or simplify comments** unless trivially obvious or factually wrong — comments encode concurrency invariants and bug history
39+
- **Do not reference internal functions in public API comments** — public API docs should remain user-facing; internal details belong in internal docs
40+
- **`CSVReader` is non-copyable and non-movable** — the preferred sharing/transfer idiom is `std::unique_ptr<CSVReader>`
3941

4042
## Tests
4143
See `tests/AGENTS.md` for full test strategy, checklist, and conventions.

include/csv.hpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
CSV for C++, version 3.2.0
2+
CSV for C++, version 3.3.0
33
https://github.com/vincentlaucsb/csv-parser
44
55
MIT License

include/internal/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,8 +25,9 @@ target_sources(csv
2525
raw_csv_data.cpp
2626
row_deque.hpp
2727
single_thread_deque.hpp
28+
string_view_stream.hpp
2829
thread_safe_deque.hpp
29-
)
30+
)
3031

3132
set_target_properties(csv PROPERTIES LINKER_LANGUAGE CXX)
3233

include/internal/csv_reader.cpp

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -166,12 +166,15 @@ namespace csv {
166166
*/
167167
CSV_INLINE CSVReader::CSVReader(csv::string_view filename, CSVFormat format) : _format(format) {
168168
#if defined(__EMSCRIPTEN__)
169-
this->owned_file_stream = std::unique_ptr<std::ifstream>(new std::ifstream(std::string(filename), std::ios::binary));
170-
if (!this->owned_file_stream->is_open()) {
169+
this->owned_stream = std::unique_ptr<std::istream>(
170+
new std::ifstream(std::string(filename), std::ios::binary)
171+
);
172+
173+
if (!(*this->owned_stream)) {
171174
throw std::runtime_error("Cannot open file " + std::string(filename));
172175
}
173176

174-
this->init_from_stream(*this->owned_file_stream, format);
177+
this->init_from_stream(*this->owned_stream, format);
175178
#else
176179
auto head = internals::get_csv_head(filename);
177180
using Parser = internals::MmapParser;

include/internal/csv_reader.hpp

Lines changed: 28 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,14 @@ namespace csv {
7575
* All rows are compared to the column names for length consistency
7676
* - By default, rows that are too short or too long are dropped
7777
* - Custom behavior can be defined by overriding bad_row_handler in a subclass
78+
*
79+
* **Ownership and sharing:** CSVReader is neither copyable nor movable because it
80+
* manages a live parsing state (worker thread, internal queue, and an optional stream
81+
* reference). To share or transfer a reader, wrap it in a `std::unique_ptr<CSVReader>`:
82+
* @code{.cpp}
83+
* auto reader = std::make_unique<csv::CSVReader>("data.csv");
84+
* process(std::move(reader)); // transfer ownership
85+
* @endcode
7886
*/
7987
class CSVReader {
8088
public:
@@ -186,6 +194,20 @@ namespace csv {
186194
CSVReader(TStream &source, CSVFormat format = CSVFormat::guess_csv()) : _format(format) {
187195
this->init_from_stream(source, format);
188196
}
197+
198+
/** @brief Construct CSVReader from an owned std::istream
199+
*
200+
* This is an opt-in safety switch for stream lifetime management.
201+
* CSVReader takes ownership and guarantees the stream outlives parsing.
202+
*/
203+
CSVReader(std::unique_ptr<std::istream> source,
204+
const CSVFormat& format = CSVFormat::guess_csv()) : _format(format), owned_stream(std::move(source)) {
205+
if (!this->owned_stream) {
206+
throw std::invalid_argument("CSVReader requires a non-null stream");
207+
}
208+
209+
this->init_from_stream(*this->owned_stream, format);
210+
}
189211
///@}
190212

191213
CSVReader(const CSVReader&) = delete; ///< Not copyable
@@ -261,10 +283,12 @@ namespace csv {
261283
/** Queue of parsed CSV rows */
262284
std::unique_ptr<RowCollection> records{new RowCollection(100)};
263285

264-
#if defined(__EMSCRIPTEN__)
265-
/** Owned file stream used by filename constructor fallback to stream parsing. */
266-
std::unique_ptr<std::ifstream> owned_file_stream = nullptr;
267-
#endif
286+
/**
287+
* Optional owned stream used by two paths:
288+
* 1) Emscripten filename-constructor fallback to stream parsing
289+
* 2) Opt-in ownership constructor taking std::unique_ptr<std::istream>
290+
*/
291+
std::unique_ptr<std::istream> owned_stream = nullptr;
268292

269293
size_t n_cols = 0; /**< The number of columns in this CSV */
270294
size_t _n_rows = 0; /**< How many rows (minus header) have been read so far */

include/internal/csv_row.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88

99
namespace csv {
1010
namespace internals {
11-
csv::string_view get_trimmed(csv::string_view sv, const WhitespaceMap& ws_flags) noexcept
11+
CSV_INLINE csv::string_view get_trimmed(csv::string_view sv, const WhitespaceMap& ws_flags) noexcept
1212
{
1313
// Lazy trim only when requested
1414
size_t start = 0;

include/internal/csv_utility.cpp

Lines changed: 37 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,40 @@
11
#include <sstream>
22
#include <vector>
3+
#include <memory>
34

45
#include "csv_utility.hpp"
6+
#include "string_view_stream.hpp"
57

68
namespace csv {
7-
/** Shorthand function for parsing an in-memory CSV string
9+
/** Shorthand function for parsing an in-memory CSV string.
10+
*
11+
* Copies the input into an owned stringstream, so the caller's backing
12+
* memory may be freed immediately after this call returns.
813
*
914
* @return A collection of CSVRow objects
1015
*
1116
* @par Example
1217
* @snippet tests/test_read_csv.cpp Parse Example
1318
*/
1419
CSV_INLINE CSVReader parse(csv::string_view in, CSVFormat format) {
15-
std::stringstream stream(std::string(in.data(), in.length()));
16-
return CSVReader(stream, format);
20+
std::unique_ptr<std::istream> ss(new std::stringstream(std::string(in)));
21+
return CSVReader(std::move(ss), format);
22+
}
23+
24+
/** Parse CSV from an in-memory view with zero copy.
25+
*
26+
* Creates a non-owning stream adapter over the provided string_view.
27+
* The caller is responsible for keeping backing memory valid and immutable
28+
* while CSVReader may request additional rows.
29+
*
30+
* Already materialized CSVRows remain safe because parsed chunk data is
31+
* owned by RawCSVData.
32+
*
33+
* @return A collection of CSVRow objects
34+
*/
35+
CSV_INLINE CSVReader parse_unsafe(csv::string_view in, CSVFormat format) {
36+
std::unique_ptr<std::istream> stream(new internals::StringViewStream(in));
37+
return CSVReader(std::move(stream), format);
1738
}
1839

1940
/** Parses a CSV string with no headers
@@ -28,19 +49,28 @@ namespace csv {
2849
}
2950

3051
/** Parse a RFC 4180 CSV string, returning a collection
31-
* of CSVRow objects
52+
* of CSVRow objects.
53+
*
54+
* String literals have static storage duration, so the zero-copy path is
55+
* safe here.
3256
*
3357
* @par Example
3458
* @snippet tests/test_read_csv.cpp Escaped Comma
3559
*
3660
*/
3761
CSV_INLINE CSVReader operator ""_csv(const char* in, size_t n) {
38-
return parse(csv::string_view(in, n));
62+
return parse_unsafe(csv::string_view(in, n));
3963
}
4064

41-
/** A shorthand for csv::parse_no_header() */
65+
/** A shorthand for csv::parse_no_header().
66+
*
67+
* String literals have static storage duration, so the zero-copy path is
68+
* safe here.
69+
*/
4270
CSV_INLINE CSVReader operator ""_csv_no_header(const char* in, size_t n) {
43-
return parse_no_header(csv::string_view(in, n));
71+
CSVFormat format;
72+
format.header_row(-1);
73+
return parse_unsafe(csv::string_view(in, n), format);
4474
}
4575

4676
/**

include/internal/csv_utility.hpp

Lines changed: 21 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
#include "csv_format.hpp"
44
#include "csv_reader.hpp"
55
#include "data_type.hpp"
6+
#include "string_view_stream.hpp"
67

78
#include <string>
89
#include <type_traits>
@@ -19,12 +20,31 @@ namespace csv {
1920
};
2021

2122
/** @name Shorthand Parsing Functions
22-
* @brief Convienience functions for parsing small strings
23+
* @brief Convenience functions for parsing small strings
2324
*/
2425
///@{
2526
CSVReader operator ""_csv(const char*, size_t);
2627
CSVReader operator ""_csv_no_header(const char*, size_t);
28+
29+
/** Parse CSV from a string view, copying the input into an owned buffer.
30+
*
31+
* Safe for any string_view regardless of the caller's ownership of the
32+
* underlying memory.
33+
*/
2734
CSVReader parse(csv::string_view in, CSVFormat format = CSVFormat());
35+
36+
/** Parse CSV from an in-memory view with zero copy.
37+
*
38+
* WARNING: Non-owning path. The caller must ensure `in`'s backing memory
39+
* remains valid and immutable while the reader may request additional rows
40+
* from the source stream.
41+
*
42+
* Already materialized CSVRows remain safe because parsed chunk data is
43+
* owned by RawCSVData, so make sure you grab all the CSVRows you need
44+
* before the underlying string is destroyed.
45+
*/
46+
CSVReader parse_unsafe(csv::string_view in, CSVFormat format = CSVFormat());
47+
2848
CSVReader parse_no_header(csv::string_view in);
2949
///@}
3050

0 commit comments

Comments
 (0)