vectorized d1s #102
Three plausible shapes, each with a different tradeoff:
(a) Polymorphic index: keep one function, accept int | Sequence[int]. Return Tally for scalar, list[Tally] for sequence. Cleanest API surface, but the return type flips
(b) Add indices kwarg: keep index as-is, add a new indices=None parameter. If supplied, the vectorized path runs and returns list[Tally] (or arrays). Lower review risk than
(c) Two functions (what I did): clearest discoverability, can return ndarray from the series path without violating any existing contract, but more API surface.
This PR does (c).
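To make the tradeoff between shapes (a) and (c) concrete, here is a minimal sketch. Everything in it is hypothetical: the `Tally` class is a stand-in, `FACTORS` is made-up data, and the function names are illustrative rather than the signatures used in this PR.

```python
from typing import List, Sequence, Union

import numpy as np


class Tally:
    """Hypothetical stand-in for a derived tally; holds only a mean value."""

    def __init__(self, mean: float):
        self.mean = mean


FACTORS = [1.0, 0.5, 0.25]  # made-up correction factor per time index


def correct_polymorphic(index: Union[int, Sequence[int]]) -> Union[Tally, List[Tally]]:
    """Shape (a): one entry point whose return type depends on the argument."""
    if isinstance(index, int):
        return Tally(FACTORS[index])
    return [Tally(FACTORS[i]) for i in index]


def correct_one(index: int) -> Tally:
    """Shape (c), scalar path: always returns a single Tally."""
    return Tally(FACTORS[index])


def correct_series(indices: Sequence[int]) -> np.ndarray:
    """Shape (c), series path: always returns an ndarray, one entry per index."""
    return np.array([FACTORS[i] for i in indices])


single = correct_polymorphic(1)      # a Tally
many = correct_polymorphic([0, 2])   # a list of Tally: the return type has flipped
series = correct_series([0, 2])      # an ndarray, regardless of input length
```

With two functions, each has one fixed return type, so callers and type checkers never need to branch on what came back.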
891f85c to b9816fd
Benchmark: vectorized
Rework the sum_nuclides=True path of apply_time_correction so the TCF-weighted sum over the parent-nuclide axis is evaluated as a contraction (np.einsum) rather than a broadcast-multiply-and-reduce per index. The shared 5-D tally views are reshaped once.

For a summed (derived) tally the public sum/sum_sq accessors return None regardless of the stored arrays, so the derived tally's sum/sum_sq are left unset rather than recomputed each call: this matches develop's observable behavior, skips two full-array multiplies per index, and avoids storing arrays shaped inconsistently with the popped ParentNuclideFilter (which break Tally.sparse).

For a mesh tally (27k bins x 108 radionuclides x ~200 times) this is ~9x faster than the per-index implementation, with mean/std_dev agreeing to ~1e-15 relative.

The factor matrix is shaped (n_indices, n_radionuclides) so each index's row is contiguous, keeping a scalar call bit-for-bit identical to the matching slice of a multi-index call.

Update the docstring/comments and extend the multi-index unit test.
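The difference between the per-index broadcast-multiply-and-reduce and the single contraction can be sketched in NumPy as follows. The array names, the 3-D layout (bins, nuclides, scores), and the random data are assumptions made for illustration; the real code operates on reshaped 5-D tally views that are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_nuclides, n_scores, n_indices = 40, 6, 2, 5

# Hypothetical per-parent-nuclide tally means, laid out as (bins, nuclides, scores).
mean = rng.random((n_bins, n_nuclides, n_scores))

# Factor matrix shaped (n_indices, n_radionuclides): one contiguous row per time index.
factors = rng.random((n_indices, n_nuclides))

# Per-index path: broadcast-multiply the whole array, then reduce over the nuclide axis.
per_index = np.stack(
    [(mean * factors[i][None, :, None]).sum(axis=1) for i in range(n_indices)]
)

# Contraction path: a single einsum over the nuclide axis for all indices at once.
contracted = np.einsum("in,bns->ibs", factors, mean)

# Both results have shape (n_indices, n_bins, n_scores) and agree to rounding error.
max_abs_diff = np.abs(per_index - contracted).max()
```

The contraction never materializes the scaled (bins, nuclides, scores) intermediate for each index, which is where the per-index path spends its time and memory.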
b9816fd to b788d20
Description
Adds apply_time_correction_series to openmc.deplete.d1s, a vectorized variant of apply_time_correction that evaluates many time indices in a single matrix multiplication.

Calling apply_time_correction in a loop over N time indices deep-copies the tally and re-multiplies its sum/sum_sq/mean/std_dev arrays N times. The new function builds an (N, n_radionuclides) factor matrix and folds the radionuclide-axis sum into a single matmul, so all indices are evaluated in one pass.

Returns NumPy arrays rather than a list of derived Tally objects: constructing N derived tallies (each with its own copy of _sum/_sum_sq/_mean/_std_dev) defeats the memory advantage on fine-mesh tallies. Users who need a Tally per index can build one from the returned arrays.

Motivation: shutdown dose-rate analysis routinely needs a full dose-vs-time curve, which means evaluating the same time_correction_factors dict at every index in the cooling schedule. For a 90-timestep schedule on a ~10⁶-voxel mesh the loop spends most of its time in repeated copy + elementwise multiply; the matmul-based path is ~5–15× faster on typical workloads (matmul hits BLAS, the per-iteration copy(tally) is gone, and _sum/_sum_sq are no longer materialized for results that are typically read-only).

Fixes # (issue): N/A
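As a rough illustration of the factor-matrix idea described above, the sketch below compares a copy-and-multiply loop with a single matmul. It is not the implementation in this PR: the helper names `corrected_loop` and `corrected_series` are hypothetical, the tally is reduced to a plain (n_bins, n_radionuclides) array of means, and the factors dict is assumed to map each nuclide name to an array with one factor per time index.

```python
import copy

import numpy as np


def corrected_loop(mean, tcf, nuclides, indices):
    """Per-index path: copy the array, scale each nuclide column, sum over nuclides."""
    out = []
    for i in indices:
        scaled = copy.deepcopy(mean)  # mimics the per-iteration copy of the tally
        for j, nuc in enumerate(nuclides):
            scaled[:, j] *= tcf[nuc][i]
        out.append(scaled.sum(axis=1))
    return np.array(out)  # shape (N, n_bins)


def corrected_series(mean, tcf, nuclides, indices):
    """Vectorized path: build an (N, n_radionuclides) factor matrix, then one matmul."""
    factors = np.array([[tcf[nuc][i] for nuc in nuclides] for i in indices])
    return factors @ mean.T  # shape (N, n_bins)


# Made-up example data (assumption): 3 nuclides, 4 time indices, 10 mesh bins.
rng = np.random.default_rng(1)
nuclides = ["Co60", "Mn56", "Na24"]
tcf = {nuc: rng.random(4) for nuc in nuclides}
mean = rng.random((10, len(nuclides)))
indices = [0, 2, 3]

loop_result = corrected_loop(mean, tcf, nuclides, indices)
series_result = corrected_series(mean, tcf, nuclides, indices)
```

Row k of `series_result` corresponds to `indices[k]`, so a dose-vs-time curve for one bin is a column of the returned array.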
Checklist