[Feature] Feat/loss module helpers #3901

Merged
vmoens merged 7 commits into pytorch:main from theap06:feat/loss-module-helpers on Jun 24, 2026

theap06 (Contributor) commented Jun 22, 2026

Summary

This PR adds reusable LossModule helpers for boilerplate that appears across several loss modules, and migrates a small set of callsites to use them.

New helpers

  • LossModule.register_coeff_buffer(...)
    • registers scalar coefficient buffers with consistent tensor conversion, device/dtype handling, optional-None behavior, and scalar validation.
    • replaces repeated torch.tensor(...) / register_buffer(...) / None handling for coefficient-like fields.
  • A default LossModule._forward_value_estimator_keys(...)
    • forwards common value-estimator keys when they are present on tensor_keys.
    • refreshes in_keys via _set_in_keys() when available.
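
As a rough illustration of the behavior listed for `register_coeff_buffer`, here is a minimal sketch. It is not the code merged in this PR: `SketchLoss` is a hypothetical stand-in for a `LossModule` subclass, the "exactly one element" scalar check and the exception types are assumptions, and only the signature `(name, value, *, device, dtype)` comes from the commit message.

```python
import torch
from torch import nn


class SketchLoss(nn.Module):
    """Hypothetical stand-in for a LossModule subclass; not TorchRL code."""

    def register_coeff_buffer(self, name, value, *, device=None, dtype=None):
        # Optional coefficient: store a plain None attribute instead of a buffer.
        if value is None:
            setattr(self, name, None)
            return
        # bool is a subclass of int, so reject it explicitly (Python bools and
        # bool tensors) rather than silently converting it to 0/1.
        is_bool_tensor = isinstance(value, torch.Tensor) and value.dtype == torch.bool
        if isinstance(value, bool) or is_bool_tensor:
            raise TypeError(f"{name} must be a real-valued scalar, got a bool")
        # as_tensor keeps an existing tensor's dtype unless one is requested.
        tensor = torch.as_tensor(value, dtype=dtype, device=device)
        # Assumed scalar check: the coefficient must hold exactly one element.
        if tensor.numel() != 1:
            raise ValueError(f"{name} must be a scalar, got shape {tuple(tensor.shape)}")
        self.register_buffer(name, tensor)


loss = SketchLoss()
loss.register_coeff_buffer("entropy_coeff", 0.01)  # float -> buffer
loss.register_coeff_buffer("critic_coeff", None)   # optional -> attribute is None
loss.register_coeff_buffer("clip_value", torch.tensor(0.2, dtype=torch.float64))
```

Registering the coefficient as a buffer means it follows the module through `.to(device)` and appears in `state_dict()`, which is presumably why the losses used buffers for these fields in the first place.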

Migrated callsites

  • A2CLoss
    • entropy_coeff, critic_coeff, and clip_value now use register_coeff_buffer.
  • PPOLoss
    • scalar entropy_coeff, optional critic_coeff, and clip_value now use register_coeff_buffer.
    • per-head entropy coefficient mappings keep their existing mapping behavior while using the helper for the compatibility scalar buffer.
  • IQLLoss
    • relies on the default _forward_value_estimator_keys implementation instead of carrying an identical local override.

Losses with non-standard value-estimator key remapping, estimator-specific keys, or other custom behavior keep their explicit overrides.
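
The default key forwarding can be pictured with the plain-Python sketch below. It assumes the six common keys named in the commit notes (advantage, value_target, value, reward, done, terminated) and an estimator exposing `set_keys(**kwargs)`; `TensorKeys`, `RecordingEstimator`, and `SketchLoss` are hypothetical stand-ins, not TorchRL classes, and the merged implementation may differ in detail.

```python
from dataclasses import dataclass

# Keys assumed to be accepted by every value estimator (per the commit notes).
COMMON_KEYS = ("advantage", "value_target", "value", "reward", "done", "terminated")


@dataclass
class TensorKeys:
    """Hypothetical key container that defines only some of the common keys."""
    advantage: str = "advantage"
    value: str = "state_value"
    reward: str = "reward"
    done: str = "done"


class RecordingEstimator:
    """Hypothetical estimator that records the keys it receives."""

    def __init__(self):
        self.keys = {}

    def set_keys(self, **kwargs):
        self.keys.update(kwargs)


class SketchLoss:
    def __init__(self, tensor_keys, estimator):
        self.tensor_keys = tensor_keys
        self._value_estimator = estimator
        self.in_keys_refreshed = False

    def _set_in_keys(self):
        self.in_keys_refreshed = True

    def _forward_value_estimator_keys(self, **kwargs):
        # Forward only the common keys that this loss actually defines.
        forwarded = {
            key: getattr(self.tensor_keys, key)
            for key in COMMON_KEYS
            if hasattr(self.tensor_keys, key)
        }
        if self._value_estimator is not None:
            self._value_estimator.set_keys(**forwarded)
        # Refresh in_keys only when the loss defines the hook.
        refresh = getattr(self, "_set_in_keys", None)
        if callable(refresh):
            refresh()


estimator = RecordingEstimator()
loss = SketchLoss(TensorKeys(), estimator)
loss._forward_value_estimator_keys()
```

A loss that maps `value` to a differently named estimator key, or forwards an estimator-specific key, cannot be expressed by this name-for-name loop, which matches the note that such losses keep their own override.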

Tests

  • Added regression coverage for register_coeff_buffer, including None, dtype preservation, non-scalar rejection, and bool rejection.
  • Added regression coverage for the default _forward_value_estimator_keys behavior and set_keys propagation.

theap06 and others added 2 commits June 21, 2026 00:10
Add LossModule._prepare_value_estimator_kwargs() to eliminate the 9-11
line boilerplate that every make_value_estimator override was repeating:
defaulting value_type, delegating the ValueEstimatorBase instance/class
path to the base handler, and building the hp dict from
default_value_kwargs merged with self.gamma and caller overrides.
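
The preamble described above might look roughly like the following sketch. The return contract (a `(value_type, hp)` tuple, or `None` when the caller should delegate to the base handler) is an assumption made for illustration, and `ValueType`, `_DEFAULTS`, and `SketchLoss` are hypothetical stand-ins for the real enum, `default_value_kwargs`, and loss classes.

```python
from enum import Enum


class ValueType(Enum):
    """Hypothetical stand-in for the value-estimator enum."""
    TD0 = "td0"
    GAE = "gae"


# Hypothetical per-estimator defaults standing in for default_value_kwargs.
_DEFAULTS = {
    ValueType.TD0: {"gamma": 0.99},
    ValueType.GAE: {"gamma": 0.99, "lmbda": 0.95},
}


class SketchLoss:
    default_value_estimator = ValueType.TD0

    def __init__(self, gamma=None):
        if gamma is not None:
            self.gamma = gamma

    def _prepare_value_estimator_kwargs(self, value_type=None, **hyperparams):
        # 1. Default the estimator type.
        if value_type is None:
            value_type = self.default_value_estimator
        # 2. Anything that is not an enum member (an estimator instance or
        #    class) is left to the base handler; signalled here with None.
        if not isinstance(value_type, ValueType):
            return None
        # 3. Precedence: defaults < self.gamma < caller overrides.
        hp = dict(_DEFAULTS[value_type])
        if hasattr(self, "gamma"):
            hp["gamma"] = self.gamma
        hp.update(hyperparams)
        return value_type, hp


loss = SketchLoss(gamma=0.9)
default_result = loss._prepare_value_estimator_kwargs()
gae_result = loss._prepare_value_estimator_kwargs(ValueType.GAE, lmbda=0.8)
```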

Refactored losses: A2CLoss, PPOLoss, SACLoss, DiscreteSACLoss, DDPGLoss,
TD3Loss (~55 lines removed). The remaining 11 losses follow the identical
pattern and can be migrated in a follow-up.

Adds 30 regression tests in TestPrepareValueEstimatorKwargs covering the
helper in isolation and enum dispatch for all supported value types across
each refactored loss.

Also adds .envrc and notebooks/ to .gitignore.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…_value_estimator_keys

Follow-up to the make_value_estimator preamble extraction. Adds two more
LossModule helpers that remove copy-paste from new loss modules:

- register_coeff_buffer(name, value, *, device, dtype): converts a scalar (or
  tensor) coefficient to a tensor and registers it as a buffer, with None
  setting the attribute to None instead (the common optional-coefficient idiom).
  Replaces the repeated isinstance/torch.tensor/register_buffer block.
- A default _forward_value_estimator_keys that forwards the six universally
  accepted value-estimator keys (advantage, value_target, value, reward, done,
  terminated) present on the loss's tensor_keys, then calls _set_in_keys when
  defined. Losses that remap key names (value -> state_action_value /
  global_value) or forward estimator-specific keys (sample_log_prob) keep their
  own override.

Migrated: A2CLoss (coeff buffers), SACLoss and IQLLoss (drop bespoke
_forward_value_estimator_keys). The remaining losses follow identical patterns
and can be migrated in a follow-up; losses whose _AcceptedKeys include extra
value keys (e.g. REDQ's sample_log_prob) need their forwarding verified first.

Adds TestRegisterCoeffBuffer and TestDefaultForwardValueEstimatorKeys to
test/objectives/test_loss_module.py (12 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pytorch-bot Bot commented Jun 22, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3901

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label Jun 22, 2026

github-actions Bot (Contributor) commented Jun 23, 2026

Benchmark Results: PR 9161e0cb vs main 120ffde9

Benchmark run: https://github.com/pytorch/rl/actions/runs/28117672898

Higher ops/sec is better. Tables are sorted by largest absolute change.
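
For readers checking the tables by hand, the Change column appears consistent with a relative change against main, and the summary counts with a 5% threshold on that value. The sketch below assumes the formula (PR - main) / main × 100; the bot's exact computation is not stated in this thread, and the displayed ops/sec are rounded, so recomputed values can differ from the table in the last digit.

```python
def percent_change(main_ops: float, pr_ops: float) -> float:
    """Relative change of the PR against main, in percent (assumed formula)."""
    return (pr_ops - main_ops) / main_ops * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Bucket a change the way the summary lines appear to (over 5% either way)."""
    if change > threshold:
        return "improvement"
    if change < -threshold:
        return "regression"
    return "within threshold"


# Two CPU rows from the table: (main ops/sec, PR ops/sec).
sac_backward = percent_change(54.06, 84.84)  # table reports +56.93%
gae_vec = percent_change(33.98, 29.24)       # table reports -13.94%

print(f"{sac_backward:+.1f}% -> {classify(sac_backward)}")
print(f"{gae_vec:+.1f}% -> {classify(gae_vec)}")
```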

CPU

Compared 192 benchmarks. Regressions over 5%: 3. Improvements over 5%: 22.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 430.59 2,050 +376.07%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 54.06 84.84 +56.93%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,573 3,628 +40.99%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2,612 3,571 +36.71%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,484 3,353 +35.02%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,504 3,364 +34.36%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,527 3,189 +26.20%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1,896 2,298 +21.18%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,796 3,354 +19.94%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,849 2,204 +19.21%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,035 3,612 +19.01%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 23.99 28.35 +18.18%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2,629 3,087 +17.42%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 687.79 785.31 +14.18%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 727.20 829.64 +14.09%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 33.98 29.24 -13.94%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] 22.52 19.90 -11.65%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 248.31 275.40 +10.91%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 491.46 537.49 +9.37%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 228.71 248.19 +8.52%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 258.85 278.48 +7.59%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 21,502 23,128 +7.56%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 51.16 54.79 +7.08%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,295 2,133 -7.07%
benchmarks/test_envs_benchmark.py::test_simple 1.7003 1.7977 +5.73%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 24,360 23,161 -4.92%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 637.95 666.44 +4.47%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-True] 19,874 20,750 +4.41%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] 629.46 655.72 +4.17%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 46.60 48.52 +4.11%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,073 19,850 +4.07%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 517.69 498.15 -3.77%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] 395.39 409.97 +3.69%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1,051 1,090 +3.67%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] 359,382 372,516 +3.65%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 30,642 31,760 +3.65%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 20,334 21,066 +3.60%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] 106.52 110.35 +3.60%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 3.0190 3.1259 +3.54%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.3598 1.3123 -3.50%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 739.92 765.43 +3.45%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] 82.21 84.89 +3.26%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 5,241 5,072 -3.23%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 25.33 24.51 -3.21%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.6026 0.5833 -3.21%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-False] 43,703 42,420 -2.94%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 173.08 168.12 -2.87%
benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] 35.68 36.71 +2.87%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] 63.14 61.36 -2.82%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-backward] 56.39 54.83 -2.78%
benchmarks/test_envs_benchmark.py::test_transformed 0.8868 0.9101 +2.62%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-True] 17,695 18,150 +2.57%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 681.95 699.45 +2.57%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-backward] 122.67 125.78 +2.54%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] 37,649 38,577 +2.47%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 7.6489 7.8350 +2.43%
benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] 7,670 7,854 +2.41%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 26,794 27,438 +2.40%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 36,942 37,810 +2.35%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 21,643 22,151 +2.35%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] 25.62 26.21 +2.31%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-None] 175.10 178.96 +2.21%
benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] 24.09 24.62 +2.19%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 274.00 279.90 +2.16%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] 36,959 37,745 +2.13%
benchmarks/test_envs_benchmark.py::test_serial 0.5704 0.5823 +2.09%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 18,214 18,593 +2.08%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,396 4,486 +2.05%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 76,088 77,608 +2.00%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,441 3,509 +1.99%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 29.31 29.89 +1.98%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] 19,161 19,537 +1.97%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 281.85 287.38 +1.96%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[200-img_shape1-large_batch] 13.10 13.36 +1.94%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[True-None] 227.64 223.36 -1.88%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 56,128 57,176 +1.87%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] 43,994 44,809 +1.85%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] 689.46 702.21 +1.85%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] 29,532 30,060 +1.79%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] 348.98 342.87 -1.75%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 244.88 240.60 -1.75%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] 328.83 334.52 +1.73%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 56.93 57.91 +1.72%
benchmarks/test_objectives_benchmarks.py::test_values[generalized_advantage_estimate-True-True] 95.45 97.08 +1.71%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] 421.59 428.72 +1.69%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 14.85 15.09 +1.65%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 0.5266 0.5181 -1.61%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 195.80 198.93 +1.60%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 0.2228 0.2264 +1.59%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 30,045 30,522 +1.59%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 193.75 196.80 +1.58%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 495.60 487.79 -1.57%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[torch.save] 7,226 7,112 -1.57%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-None] 335.08 329.85 -1.56%
benchmarks/test_collectors_benchmark.py::test_single_with_rb 8.5892 8.7218 +1.54%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 32,250 32,748 +1.54%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[False-backward] 89.63 90.96 +1.49%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] 475.64 482.67 +1.48%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] 131.85 129.91 -1.47%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-False] 53,908 54,652 +1.38%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] 88.96 87.75 -1.37%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[False-None] 120.09 121.72 +1.35%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-None] 120.74 122.31 +1.30%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] 77.81 76.80 -1.29%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] 62,712 63,508 +1.27%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 0.5842 0.5912 +1.19%
benchmarks/test_collectors_benchmark.py::test_sync 16.50 16.70 +1.17%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 34,252 33,859 -1.15%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-None] 49.09 49.65 +1.14%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-lstm] 3.1154 3.1506 +1.13%
benchmarks/test_collectors_benchmark.py::test_single 8.8213 8.9170 +1.09%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-lstm] 0.8560 0.8470 -1.06%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] 273.15 275.95 +1.02%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[reduce-overhead-None] 223.20 225.45 +1.01%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 24.68 24.43 -1.01%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] 116.73 117.83 +0.95%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 49,417 49,882 +0.94%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 558.95 564.19 +0.94%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] 308.83 311.68 +0.92%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 0.2103 0.2123 +0.92%
Showing 120 of 192 comparisons, sorted by absolute change.

GPU

Compared 202 benchmarks. Regressions over 5%: 13. Improvements over 5%: 7.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 50.38 490.50 +873.51%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 196.49 37.58 -80.88%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[reduce-overhead-None] 57.77 89.41 +54.77%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,656 2,447 -33.07%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 937.57 702.69 -25.05%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,416 2,654 -22.33%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 915.36 738.56 -19.32%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 106.51 86.19 -19.08%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 868.83 704.66 -18.90%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,862 2,210 +18.69%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 415.91 493.48 +18.65%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 848.71 719.89 -15.18%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3,309 2,994 -9.52%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[reduce-overhead-None] 794.90 861.97 +8.44%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 787.46 845.13 +7.32%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] 410.04 380.17 -7.29%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] 413.14 389.14 -5.81%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,863 1,966 +5.51%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 399.72 378.60 -5.28%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 478.37 454.17 -5.06%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 733.09 697.69 -4.83%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,604 2,728 +4.80%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1,293 1,233 -4.61%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 245.32 234.30 -4.49%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,489 2,600 +4.45%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.2995 1.3560 +4.34%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 4,246 4,064 -4.29%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 551.73 528.38 -4.23%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 1,397 1,338 -4.22%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 173.33 166.11 -4.17%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 17,743 18,479 +4.15%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 973.28 1,012 +3.99%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-backward] 155.57 149.49 -3.91%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,555 3,422 -3.72%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cuda] 2,345 2,258 -3.71%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 685.68 660.33 -3.70%
benchmarks/test_collectors_benchmark.py::test_sync 10.16 10.53 +3.67%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[False-None] 647.14 624.10 -3.56%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 48,388 50,098 +3.53%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] 22.17 21.41 -3.43%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.6052 0.5851 -3.31%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1,359 1,314 -3.31%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 137.97 133.64 -3.14%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 27,707 28,574 +3.13%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[reduce-overhead-None] 1,881 1,940 +3.12%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-None] 758.49 781.33 +3.01%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 22.17 21.52 -2.94%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 75,418 77,632 +2.94%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] 19,224 19,781 +2.90%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[reduce-overhead-None] 43.31 44.49 +2.72%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[torch.save] 6,977 7,162 +2.66%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-False] 41,876 42,970 +2.61%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 726.83 745.43 +2.56%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 23.22 22.64 -2.48%
benchmarks/test_envs_benchmark.py::test_serial 0.4227 0.4328 +2.39%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 53.55 52.30 -2.33%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 32,190 32,936 +2.32%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,164 4,259 +2.28%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 33,880 34,645 +2.26%
benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] 12,098 11,828 -2.23%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 80.89 79.10 -2.22%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 29,514 30,162 +2.19%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] 630.86 617.07 -2.19%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] 11,946 12,207 +2.18%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-cuda_storage_cpu_sampler] 87.77 89.63 +2.12%
benchmarks/test_envs_benchmark.py::test_simple 1.2439 1.2177 -2.10%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 27,169 27,738 +2.10%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 23.37 22.88 -2.10%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] 37,417 38,149 +1.96%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,600 3,671 +1.95%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 55,995 57,087 +1.95%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 485.95 495.42 +1.95%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 34,723 34,087 -1.83%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-backward] 348.38 354.71 +1.82%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] 370.22 376.82 +1.78%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[200-img_shape1-large_batch] 8.7318 8.5801 -1.74%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,115 2,079 -1.73%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-False] 34,798 34,198 -1.72%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 222.68 218.85 -1.72%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 41,809 42,521 +1.70%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 160.18 162.89 +1.69%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 28,495 28,964 +1.64%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 0.5112 0.5196 +1.64%
benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] 295.26 300.04 +1.62%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] 507.92 516.12 +1.61%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[200-img_shape1-large_batch] 8.6198 8.4824 -1.59%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] 4,984 4,904 -1.59%
benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] 12.77 12.56 -1.59%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[100-img_shape0-atari] 17.48 17.21 -1.55%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,359 19,652 +1.51%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-None] 684.14 694.37 +1.50%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[200-img_shape1-large_batch] 13.49 13.29 -1.46%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,969 1,997 +1.45%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-memmap_cpu_storage_cud... 994.08 979.76 -1.44%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 7.0274 6.9262 -1.44%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 63,566 64,438 +1.37%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[100-img_shape0-atari] 17.05 16.82 -1.35%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] 38,190 38,703 +1.34%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 0.5910 0.5831 -1.34%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] 346.21 341.63 -1.32%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 0.5191 0.5125 -1.28%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 49,426 48,800 -1.27%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-cuda_storage_cuda_samp... 1,538 1,519 -1.24%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 31,898 32,291 +1.23%
benchmarks/test_collectors_benchmark.py::test_async_pixels 10.71 10.84 +1.20%
benchmarks/test_envs_benchmark.py::test_parallel 0.5386 0.5322 -1.19%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 23,784 23,502 -1.19%
benchmarks/test_collectors_benchmark.py::test_async 11.03 10.90 -1.18%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] 743.09 751.75 +1.17%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[reduce-overhead-None] 105.66 106.89 +1.16%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-None] 234.57 231.86 -1.16%
benchmarks/test_collectors_benchmark.py::test_sync_preempt 10.55 10.43 -1.13%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-backward] 41.22 40.77 -1.11%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 316.42 319.87 +1.09%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 23,663 23,409 -1.07%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] 50.94 50.42 -1.01%
benchmarks/test_objectives_benchmarks.py::test_values[vec_td_lambda_return_estimate-True-False] 878.58 869.94 -0.98%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 30.66 30.36 -0.95%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 15.27 15.12 -0.94%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] 833.26 840.83 +0.91%
Showing 120 of 202 comparisons, sorted by absolute change.

vmoens (Collaborator) left a comment

LGTM thanks!

vmoens merged commit 17f0787 into pytorch:main on Jun 24, 2026
67 of 84 checks passed

Labels

CLA Signed, Feature, Objectives

2 participants