Speed Up Nearest Zone Calculation in Disaggregate Accessibilities#1031
Open
dhensle wants to merge 6 commits into
Open
Speed Up Nearest Zone Calculation in Disaggregate Accessibilities#1031dhensle wants to merge 6 commits into
dhensle wants to merge 6 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR targets the performance bottleneck in disaggregate accessibility zone lookups (Issue #1030) by optimizing (1) nearest-zone identification when using skims and (2) zone-id-to-skim-index mapping.
Changes:
- Introduces a vectorized nearest-zone skim lookup helper and switches
find_nearest_accessibility_zoneto use it. - Adds a numpy-array fast path to
OffsetMapperto speed up zone id → skim index mapping when an offset series is used.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
activitysim/core/skim_dictionary.py |
Adds a numpy-based fast path for OffsetMapper mapping to reduce pandas .map() overhead. |
activitysim/abm/tables/disaggregate_accessibility.py |
Replaces per-origin skim nearest-zone lookups with a vectorized approach intended to reduce Python-level overhead. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+76
to
+94
| # Build numpy lookup array for fast O(1) mapping | ||
| # This replaces slow pandas Series.map() with direct numpy indexing | ||
| index_vals = offset_series.index.values | ||
| if len(index_vals) > 0: | ||
| min_zone = int(index_vals.min()) | ||
| max_zone = int(index_vals.max()) | ||
| # Only build array if zone IDs are non-negative and range is reasonable | ||
| # (avoid huge arrays for sparse zone IDs) | ||
| if min_zone >= 0 and (max_zone - min_zone + 1) <= len(index_vals) * 10: | ||
| self._offset_array = np.full( | ||
| max_zone + 1, NOT_IN_SKIM_ZONE_ID, dtype=np.int32 | ||
| ) | ||
| self._offset_array[ | ||
| index_vals.astype(int) | ||
| ] = offset_series.values.astype(np.int32) | ||
| else: | ||
| self._offset_array = None | ||
| else: | ||
| self._offset_array = None |
Comment on lines
+149
to
+162
| if self._offset_array is not None: | ||
| # Fast path: use numpy array indexing (O(1) per element) | ||
| zone_ids = np.asanyarray(zone_ids).astype(int) | ||
| # Clip to valid range to avoid index errors, then mark out-of-range as NOT_IN_SKIM | ||
| max_valid = len(self._offset_array) - 1 | ||
| valid_mask = (zone_ids >= 0) & (zone_ids <= max_valid) | ||
| # Use clip to safely index, then apply mask | ||
| clipped_ids = np.clip(zone_ids, 0, max_valid) | ||
| offsets = np.where( | ||
| valid_mask, | ||
| self._offset_array[clipped_ids], | ||
| NOT_IN_SKIM_ZONE_ID, | ||
| ) | ||
| return offsets |
Comment on lines
+35
to
+58
| origin_zones = np.asarray(origin_zones) | ||
| dest_zones = np.asarray(dest_zones) | ||
| n_origins = len(origin_zones) | ||
| n_dests = len(dest_zones) | ||
|
|
||
| # handle empty input case | ||
| if n_origins == 0 or n_dests == 0: | ||
| return [] | ||
|
|
||
| # Create all origin-destination pairs in one go | ||
| # all_orig: [o1, o1, o1, ..., o2, o2, o2, ..., oN, oN, oN, ...] | ||
| # all_dest: [d1, d2, d3, ..., d1, d2, d3, ..., d1, d2, d3, ...] | ||
| all_orig = np.repeat(origin_zones, n_dests) | ||
| all_dest = np.tile(dest_zones, n_origins) | ||
|
|
||
| # Single skim lookup for all pairs | ||
| all_dists = skim_dict.lookup(all_orig, all_dest, "DIST") | ||
|
|
||
| # Reshape to (n_origins, n_dests) and find argmin per origin | ||
| dist_matrix = np.asarray(all_dists).reshape(n_origins, n_dests) | ||
| nearest_indices = np.argmin(dist_matrix, axis=1) | ||
|
|
||
| # Return list of (origin, nearest_dest) tuples | ||
| return list(zip(origin_zones, dest_zones[nearest_indices])) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix for #1030
This pull request introduces significant performance improvements to zone lookup operations in the accessibility calculations by vectorizing nearest zone searches and optimizing the mapping of zone IDs to skim indices. The changes focus on reducing redundant operations and leveraging efficient numpy-based lookups, which should result in faster computations, especially for large datasets.
Performance improvements in zone lookup and mapping:
find_nearest_zones_via_skims, enabling a single batched skim lookup for all origin-destination pairs instead of one lookup per origin zone. This reduces computational overhead indisaggregate_accessibility.py. [1] [2]Optimizations in
OffsetMapperfor skim index mapping:_offset_array) toOffsetMapperfor O(1) zone ID to index mapping when the offset is specified as a pandas Series, replacing the slower pandas.map()approach. This is constructed only when zone IDs are non-negative and the range is reasonable. [1] [2]mapmethod to use this numpy array when available, including safe handling of out-of-range indices, which improves performance for large zone lists.