[Feature] OfflineToOnlineTrainer + sota script for offline→online RL #3904

Open
theap06 wants to merge 3 commits into pytorch:main from theap06:feat/offline-to-online-trainer

Conversation

@theap06 theap06 commented Jun 23, 2026

Contributor

Summary

Adds an offline-to-online SAC trainer and a runnable SOTA example for the offline-pretrain to online-finetune workflow.

This builds on the OfflineToOnlineReplayBuffer and dataset-loading helpers from #3900.

What's added

  • OfflineToOnlineTrainer: a SACTrainer subclass that uses OfflineToOnlineReplayBuffer for mixed offline/online optimization batches.
  • Trainer hooks:
    • OfflineToOnlineReplayBufferHook stores collected experience in the online buffer and samples mixed optimization batches.
    • OfflineToOnlineAnnealHook decays the offline sampling fraction over collected frames.
  • Hydra config support through OfflineToOnlineTrainerConfig, including parity with the trainer constructor and registration in the config store.
  • Checkpoint support for the online buffer and current/base offline sampling fractions.
  • sota-implementations/offline_to_online/train.py: standalone SAC offline-to-online training script that selects the offline dataset through a registered dataset string with a prefix such as d4rl: or minari:.
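
To make the mixed-batch and annealing behavior concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the function names `annealed_offline_fraction` and `sample_mixed_batch` are hypothetical, the linear schedule is an assumption (the description only says the offline fraction decays over collected frames), and lists stand in for the replay buffers.

```python
import random


def annealed_offline_fraction(base_fraction, frames_collected, anneal_frames):
    """Linearly decay base_fraction to 0 over anneal_frames (assumed schedule)."""
    if anneal_frames <= 0:
        return 0.0
    # Clamp progress to [0, 1] so the fraction never goes negative.
    progress = min(max(frames_collected / anneal_frames, 0.0), 1.0)
    return base_fraction * (1.0 - progress)


def sample_mixed_batch(offline_data, online_data, batch_size, offline_fraction, rng):
    """Draw one optimization batch from the two sources (hypothetical helper)."""
    n_offline = round(batch_size * offline_fraction)
    if not online_data:
        # Nothing collected yet: fall back to a purely offline batch.
        n_offline = batch_size
    n_online = batch_size - n_offline
    batch = []
    if n_offline > 0:
        batch += rng.choices(offline_data, k=n_offline)
    if n_online > 0:
        batch += rng.choices(online_data, k=n_online)
    return batch


rng = random.Random(0)
fraction = annealed_offline_fraction(0.5, frames_collected=500, anneal_frames=1000)
batch = sample_mixed_batch(["off"] * 10, ["on"] * 10, 8, fraction, rng)
```

With a base fraction of 0.5 and half of the annealing window elapsed, the sketch draws a quarter of the batch from the offline source and the rest from the online source.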

Docs and tests

  • Adds reference entries for OfflineToOnlineTrainer and OfflineToOnlineTrainerConfig.
  • Extends test/test_offline_to_online.py with hook, trainer wiring, config, and checkpoint coverage.

@pytorch-bot

pytorch-bot Bot commented Jun 23, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3904

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 55 Pending

As of commit 377437b with merge base f7ba109:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 23, 2026
@github-actions github-actions Bot added the Data, sota-implementations/, ReplayBuffers, Trainers, and Feature labels Jun 23, 2026
@theap06 theap06 marked this pull request as draft June 23, 2026 04:49
@theap06 theap06 marked this pull request as ready for review June 23, 2026 05:24
@theap06 theap06 force-pushed the feat/offline-to-online-trainer branch from b9ba04f to 2893f5b Compare June 23, 2026 05:58
@theap06 theap06 changed the title from "[Feature] OfflineToOnlineTrainer + sota-implementation for offline→online RL" to "[Feature] OfflineToOnlineTrainer + sota script for offline→online RL" Jun 23, 2026
@github-actions github-actions Bot added the Documentation label Jun 23, 2026
theap06 and others added 2 commits June 23, 2026 13:36
Follow-up to the OfflineToOnlineReplayBuffer PR: a SAC trainer that drives the
offline-pretrain -> online-finetune transition, plus a standalone
sota-implementations script.

- OfflineToOnlineTrainer (subclasses SACTrainer): routes collected experience
  to the online buffer (pre_epoch), samples a mixed offline/online batch
  (process_optim_batch), and anneals the offline fraction to zero over
  anneal_frames (post_steps). Backed by two reusable hooks:
  OfflineToOnlineReplayBufferHook (projects online transitions onto the offline
  dataset schema so the mixed-batch concat stays valid) and
  OfflineToOnlineAnnealHook.
- sota-implementations/offline_to_online/train.py: a self-contained SAC
  offline->online script (offline dataset via d4rl:/minari: string).
- Tests: hook + flow tests and a gated functional train() run on Pendulum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
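
The schema projection mentioned in the commit message can be illustrated with a small sketch. This is an assumption-laden stand-in rather than the hook's code: plain dicts replace the tensor containers used by the library, and `project_to_schema` and the key names are hypothetical. The idea is that online transitions keep only the keys the offline dataset has, in the same order, so both sources can be concatenated into one batch.

```python
def project_to_schema(transition, schema_keys):
    """Keep only the offline dataset's keys, in schema order (hypothetical helper)."""
    missing = [key for key in schema_keys if key not in transition]
    if missing:
        # An online transition that lacks an offline key cannot be concatenated.
        raise KeyError(f"online transition lacks offline keys: {missing}")
    return {key: transition[key] for key in schema_keys}


# Assumed offline schema; a real dataset defines its own keys.
offline_keys = ("observation", "action", "reward", "done")

online_transition = {
    "observation": [0.1, 0.2],
    "action": [0.0],
    "reward": 1.0,
    "done": False,
    "collector_step": 7,  # extra online-only entry, dropped by the projection
}

projected = project_to_schema(online_transition, offline_keys)
```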
@vmoens vmoens force-pushed the feat/offline-to-online-trainer branch from 07dbcac to c50a082 Compare June 23, 2026 20:37
@github-actions

Contributor

Benchmark Results: PR c50a0825 vs main f7ba1092

Benchmark run: https://github.com/pytorch/rl/actions/runs/28055413278

Higher ops/sec is better. Tables are sorted by largest absolute change.
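
For readers checking the tables, the Change column is consistent with a relative difference against main, and the summary counts use a 5% threshold. The sketch below recomputes a few rows under that assumption; `percent_change` and `classify` are hypothetical names, not part of the benchmark tooling. Because the tables show rounded ops/sec values, a recomputed change can differ slightly from the printed one.

```python
def percent_change(main_ops, pr_ops):
    """Relative change of the PR against main, in percent."""
    return (pr_ops - main_ops) / main_ops * 100.0


def classify(change_pct, threshold=5.0):
    """Label a change using the 5% threshold from the summary lines."""
    if change_pct < -threshold:
        return "regression"
    if change_pct > threshold:
        return "improvement"
    return "neutral"


# (main ops/sec, PR ops/sec) pairs copied from the CPU table.
rows = {
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]": (195.60, 40.50),
    "test_sac_speed[False-backward]": (54.56, 90.14),
    "test_sac_speed[reduce-overhead-None]": (460.26, 481.59),
}

report = {
    name: (round(percent_change(main, pr), 2), classify(percent_change(main, pr)))
    for name, (main, pr) in rows.items()
}
```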

CPU

Compared 216 benchmarks. Regressions over 5%: 9. Improvements over 5%: 24.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 60.11 532.27 +785.43%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 195.60 40.50 -79.29%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 54.56 90.14 +65.21%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 53.49 31.82 -40.51%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,464 3,400 +38.00%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,474 2,655 -23.57%
benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] 55.12 66.83 +21.23%
benchmarks/test_objectives_benchmarks.py::test_values[vec_td1_return_estimate-False-False] 54.80 65.82 +20.12%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 24.38 29.17 +19.64%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,682 2,963 -19.52%
benchmarks/test_objectives_benchmarks.py::test_values[vec_td_lambda_return_estimate-True-False] 54.74 64.81 +18.40%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 29.02 34.11 +17.57%
benchmarks/test_collectors_benchmark.py::test_sync_preempt 14.21 16.68 +17.36%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2,908 3,410 +17.24%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,830 3,173 +12.12%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] 23.01 20.24 -12.05%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,554 3,126 -12.03%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3,149 2,771 -12.01%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 104.31 114.40 +9.67%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 4,874 5,343 +9.63%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,829 3,096 +9.40%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 258.07 282.31 +9.40%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,349 2,152 -8.40%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] 266.86 286.94 +7.52%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 553.95 515.08 -7.02%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 236.10 252.26 +6.85%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] 11,548 12,326 +6.73%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] 4,098 4,358 +6.35%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 394.45 417.49 +5.84%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] 160.73 169.79 +5.64%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 8.2613 8.7248 +5.61%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,019 2,131 +5.55%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,940 2,044 +5.34%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 183.97 193.16 +4.99%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 143.46 136.38 -4.94%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] 460.26 481.59 +4.63%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 292.31 279.64 -4.34%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-False] 33,488 34,877 +4.15%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] 365,056 379,945 +4.08%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 497.04 516.78 +3.97%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] 608.35 632.10 +3.90%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] 6.9256 7.1892 +3.81%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-lstm] 0.9567 0.9213 -3.70%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] 117.49 121.52 +3.43%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 964.75 996.67 +3.31%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 573.69 555.49 -3.17%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 41,884 43,208 +3.16%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[reduce-overhead-None] 225.03 232.10 +3.14%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 23,846 24,584 +3.10%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 34,562 35,622 +3.07%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.6071 0.5890 -2.98%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.3606 1.3204 -2.95%
benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] 24.22 24.93 +2.94%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] 7,283 7,072 -2.89%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 288.69 296.96 +2.87%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 57.02 58.65 +2.86%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 177.81 172.82 -2.81%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 882.27 906.95 +2.80%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-4] 70.53 72.48 +2.78%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 29,964 30,794 +2.77%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 75,555 77,643 +2.76%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 20,201 19,655 -2.70%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] 76.24 78.24 +2.62%
benchmarks/test_envs_benchmark.py::test_transformed 0.8898 0.9130 +2.60%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 23,624 23,027 -2.52%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 56,830 58,248 +2.50%
benchmarks/test_envs_benchmark.py::test_parallel 0.9740 0.9511 -2.35%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 711.13 695.50 -2.20%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 21,736 22,211 +2.18%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1,067 1,090 +2.17%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] 61.59 62.89 +2.11%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-16] 17.83 18.20 +2.07%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 279.39 285.07 +2.03%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-backward] 28.51 27.94 -2.01%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,192 2,148 -2.00%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-None] 260.19 265.38 +1.99%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,775 1,808 +1.91%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 25.01 25.47 +1.83%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 32,564 31,976 -1.80%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 38,731 38,038 -1.79%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] 44,819 45,603 +1.75%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-None] 93.97 95.62 +1.75%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-None] 37.61 38.27 +1.74%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 20,543 20,890 +1.69%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] 475.99 484.04 +1.69%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] 87.62 89.07 +1.66%
benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] 36.02 36.61 +1.66%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 665.54 676.55 +1.65%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-lstm] 0.8674 0.8532 -1.64%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 27,918 27,465 -1.62%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 50,149 50,925 +1.55%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-4] 48.21 48.96 +1.55%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] 4.3456 4.2785 -1.54%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 15.06 15.28 +1.50%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] 29,947 30,392 +1.49%
benchmarks/test_envs_benchmark.py::test_serial 0.5763 0.5848 +1.48%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 189.60 192.34 +1.45%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,521 3,572 +1.44%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] 25.88 26.25 +1.41%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 160.77 163.02 +1.40%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-64] 4.5070 4.5691 +1.38%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[reduce-overhead-None] 84.11 85.25 +1.36%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 29,562 29,959 +1.34%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-4] 148.44 150.40 +1.32%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 244.91 248.14 +1.32%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-None] 160.00 162.01 +1.26%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 2,226 2,254 +1.25%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] 63,563 64,352 +1.24%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] 38,018 38,487 +1.23%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[50-img_shape0-small] 877.75 867.36 -1.18%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] 555.47 562.01 +1.18%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 63,568 64,301 +1.15%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 35,443 35,042 -1.13%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 832.14 841.44 +1.12%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[10000000-cpu] 52.02 51.45 -1.10%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 0.6843 0.6916 +1.08%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] 117.29 118.55 +1.07%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] 703.77 711.24 +1.06%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 28,599 28,899 +1.05%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] 31,595 31,923 +1.04%
Showing 120 of 216 comparisons, sorted by absolute change.

GPU

Compared 226 benchmarks. Regressions over 5%: 11. Improvements over 5%: 11.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 819.45 2,690 +228.32%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,928 857.96 -55.51%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,622 2,850 -21.33%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,199 1,788 -18.68%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,088 3,653 +18.32%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 43.83 50.28 +14.72%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 3,784 4,336 +14.58%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,647 3,001 +13.37%
benchmarks/test_collectors_benchmark.py::test_single_with_rb_pixels 5.3509 4.7659 -10.93%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,014 2,687 -10.85%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 753.79 834.07 +10.65%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3,026 2,747 -9.22%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,138 1,950 -8.81%
benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] 305.15 279.69 -8.34%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,354 3,610 +7.62%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 471.02 503.07 +6.80%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 8.7684 8.2042 -6.43%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,661 2,828 +6.28%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] 21.49 20.21 -5.96%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 22.71 21.36 -5.94%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 369.02 389.04 +5.43%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 898.14 945.07 +5.22%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 351.24 334.31 -4.82%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 22,866 23,947 +4.73%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cuda] 2,226 2,125 -4.55%
benchmarks/test_envs_benchmark.py::test_parallel 0.5264 0.5491 +4.32%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] 634.50 661.51 +4.26%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[100-img_shape0-atari] 16.68 17.35 +4.02%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 745.43 716.33 -3.90%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 6.6073 6.3562 -3.80%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,035 1,959 -3.77%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] 301.55 312.62 +3.67%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 764.71 737.30 -3.59%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 48.51 46.78 -3.56%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] 5.4623 5.2712 -3.50%
benchmarks/test_envs_benchmark.py::test_simple 1.2168 1.1750 -3.43%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 513.92 531.53 +3.43%
benchmarks/test_envs_benchmark.py::test_transformed 0.7066 0.6832 -3.31%
benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] 11,693 11,321 -3.17%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] 630.95 611.05 -3.15%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-True] 17,771 18,321 +3.09%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 49,400 50,894 +3.02%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] 49.78 48.29 -3.00%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-True] 20,234 20,828 +2.94%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] 43.19 44.45 +2.92%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 139.35 135.50 -2.77%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 35,271 34,301 -2.75%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] 25.35 26.05 +2.75%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] 438.18 426.18 -2.74%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 20,625 21,186 +2.72%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 29.36 30.15 +2.69%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 158.24 162.45 +2.66%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 14.80 15.17 +2.51%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 29,996 30,744 +2.49%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 21,639 22,169 +2.45%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[reduce-overhead-None] 108.77 106.12 -2.43%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 77.26 79.13 +2.42%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 706.27 723.36 +2.42%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 446.49 456.95 +2.34%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] 10.81 11.06 +2.31%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 38,151 39,031 +2.30%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1,015 1,038 +2.26%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 29,331 29,988 +2.24%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-None] 228.78 223.67 -2.23%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] 167.02 163.35 -2.19%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 990.26 1,012 +2.18%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 32,938 33,656 +2.18%
benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] 12.25 11.98 -2.17%
benchmarks/test_objectives_benchmarks.py::test_values[generalized_advantage_estimate-True-True] 48.18 47.15 -2.13%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 0.5881 0.6005 +2.11%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] 37,976 37,174 -2.11%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,712 20,121 +2.08%
benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] 20.18 19.76 -2.07%
benchmarks/test_objectives_benchmarks.py::test_values[vec_td1_return_estimate-False-False] 849.93 832.33 -2.07%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 53.61 52.51 -2.06%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-lstm] 77.09 75.51 -2.05%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 0.5233 0.5129 -1.98%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] 372.36 365.43 -1.86%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[200-img_shape1-large_batch] 8.4570 8.6140 +1.86%
benchmarks/test_envs_benchmark.py::test_serial 0.4112 0.4187 +1.83%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 34,227 34,843 +1.80%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 186.70 190.03 +1.78%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 0.2114 0.2076 -1.76%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] 811.09 825.40 +1.76%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] 376,524 369,915 -1.76%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 234.46 238.57 +1.75%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] 4,723 4,642 -1.72%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] 37,585 38,228 +1.71%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 18,724 19,042 +1.70%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] 104.20 105.94 +1.67%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 27,271 27,727 +1.67%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] 4,798 4,718 -1.66%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] 740.59 728.59 -1.62%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] 47.75 47.00 -1.57%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 419.17 412.64 -1.56%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] 502.50 510.27 +1.54%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] 345.28 340.28 -1.45%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] 396.73 391.01 -1.44%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[torch.save] 7,162 7,061 -1.40%
benchmarks/test_collectors_benchmark.py::test_sync 10.33 10.47 +1.33%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] 285.96 282.16 -1.33%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.5993 0.6071 +1.29%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 28,375 28,739 +1.28%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,387 4,443 +1.27%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1,310 1,294 -1.26%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 159.51 161.52 +1.26%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 798.55 808.53 +1.25%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[False-None] 640.85 632.84 -1.25%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 22.82 22.54 -1.23%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,925 1,901 -1.23%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] 271.41 268.09 -1.22%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 58,535 57,839 -1.19%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] 475.18 469.66 -1.16%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 163.29 165.19 +1.16%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 23,091 23,348 +1.11%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 52.80 52.23 -1.07%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-64] 12.64 12.78 +1.04%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 0.5919 0.5978 +1.00%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-480-640-16] 4.9672 4.9176 -1.00%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 31,983 32,288 +0.95%
Showing 120 of 226 comparisons, sorted by absolute change.

@github-actions github-actions Bot added the CI label Jun 23, 2026

Labels

CI, CLA Signed, Data, Documentation, Feature, ReplayBuffers, sota-implementations/, Trainers


2 participants