[Feature] OfflineToOnlineTrainer + sota script for offline→online RL #3904
Open
theap06 wants to merge 3 commits into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3904
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEVs
There are 1 currently active SEVs. If your PR is affected, please view them below:
⏳ No Failures, 55 Pending
As of commit 377437b with merge base f7ba109.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Follow-up to the OfflineToOnlineReplayBuffer PR: a SAC trainer that drives the offline-pretrain -> online-finetune transition, plus a standalone sota-implementations script.

- OfflineToOnlineTrainer (subclasses SACTrainer): routes collected experience to the online buffer (pre_epoch), samples a mixed offline/online batch (process_optim_batch), and anneals the offline fraction to zero over anneal_frames (post_steps). Backed by two reusable hooks: OfflineToOnlineReplayBufferHook (projects online transitions onto the offline dataset schema so the mixed-batch concat stays valid) and OfflineToOnlineAnnealHook.
- sota-implementations/offline_to_online/train.py: a self-contained SAC offline->online script (offline dataset via d4rl:/minari: string).
- Tests: hook + flow tests and a gated functional train() run on Pendulum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
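The schema projection and mixed-batch sampling described in the commit message can be illustrated with a minimal, library-free sketch. It does not use the TorchRL hooks: `OFFLINE_SCHEMA`, `project_to_schema`, and `sample_mixed` are hypothetical names introduced here, and transitions are plain dicts rather than TensorDicts.

```python
import random

# Hypothetical offline dataset schema; real datasets define their own keys.
OFFLINE_SCHEMA = ("observation", "action", "reward", "next_observation", "done")


def project_to_schema(transition, schema=OFFLINE_SCHEMA):
    """Keep only the keys the offline dataset has, so both sources line up."""
    missing = [key for key in schema if key not in transition]
    if missing:
        raise KeyError(f"online transition lacks offline keys: {missing}")
    return {key: transition[key] for key in schema}


def sample_mixed(offline, online, batch_size, offline_fraction, rng):
    """Draw a batch with roughly `offline_fraction` of it from offline data."""
    n_offline = round(batch_size * offline_fraction)
    n_online = batch_size - n_offline
    # Sampling with replacement keeps the sketch short.
    return rng.choices(offline, k=n_offline) + rng.choices(online, k=n_online)


offline_data = [
    {"observation": i, "action": 0, "reward": 0.0,
     "next_observation": i + 1, "done": False}
    for i in range(10)
]
# An online transition carrying an extra key the offline data does not have.
raw_online = {"observation": 100, "action": 1, "reward": 1.0,
              "next_observation": 101, "done": False, "collector_step": 3}
online_data = [project_to_schema(raw_online)]

batch = sample_mixed(offline_data, online_data, batch_size=8,
                     offline_fraction=0.75, rng=random.Random(0))
```

After projection every element of `batch` has the same keys, which is the property the commit message says keeps the mixed-batch concat valid.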
Benchmark Results: PR
| Benchmark | main ops | PR ops | Change |
|---|---|---|---|
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 60.11 | 532.27 | +785.43% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 195.60 | 40.50 | -79.29% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] | 54.56 | 90.14 | +65.21% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 53.49 | 31.82 | -40.51% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2,464 | 3,400 | +38.00% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3,474 | 2,655 | -23.57% |
| benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] | 55.12 | 66.83 | +21.23% |
| benchmarks/test_objectives_benchmarks.py::test_values[vec_td1_return_estimate-False-False] | 54.80 | 65.82 | +20.12% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] | 24.38 | 29.17 | +19.64% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3,682 | 2,963 | -19.52% |
| benchmarks/test_objectives_benchmarks.py::test_values[vec_td_lambda_return_estimate-True-False] | 54.74 | 64.81 | +18.40% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 29.02 | 34.11 | +17.57% |
| benchmarks/test_collectors_benchmark.py::test_sync_preempt | 14.21 | 16.68 | +17.36% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 2,908 | 3,410 | +17.24% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 2,830 | 3,173 | +12.12% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] | 23.01 | 20.24 | -12.05% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3,554 | 3,126 | -12.03% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3,149 | 2,771 | -12.01% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] | 104.31 | 114.40 | +9.67% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] | 4,874 | 5,343 | +9.63% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2,829 | 3,096 | +9.40% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] | 258.07 | 282.31 | +9.40% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2,349 | 2,152 | -8.40% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] | 266.86 | 286.94 | +7.52% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 553.95 | 515.08 | -7.02% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] | 236.10 | 252.26 | +6.85% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] | 11,548 | 12,326 | +6.73% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] | 4,098 | 4,358 | +6.35% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] | 394.45 | 417.49 | +5.84% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] | 160.73 | 169.79 | +5.64% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] | 8.2613 | 8.7248 | +5.61% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2,019 | 2,131 | +5.55% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1,940 | 2,044 | +5.34% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] | 183.97 | 193.16 | +4.99% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] | 143.46 | 136.38 | -4.94% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] | 460.26 | 481.59 | +4.63% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] | 292.31 | 279.64 | -4.34% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-False] | 33,488 | 34,877 | +4.15% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] | 365,056 | 379,945 | +4.08% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 497.04 | 516.78 | +3.97% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] | 608.35 | 632.10 | +3.90% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] | 6.9256 | 7.1892 | +3.81% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-lstm] | 0.9567 | 0.9213 | -3.70% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] | 117.49 | 121.52 | +3.43% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] | 964.75 | 996.67 | +3.31% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] | 573.69 | 555.49 | -3.17% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] | 41,884 | 43,208 | +3.16% |
| benchmarks/test_objectives_benchmarks.py::test_redq_speed[reduce-overhead-None] | 225.03 | 232.10 | +3.14% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] | 23,846 | 24,584 | +3.10% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] | 34,562 | 35,622 | +3.07% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 0.6071 | 0.5890 | -2.98% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] | 1.3606 | 1.3204 | -2.95% |
| benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] | 24.22 | 24.93 | +2.94% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] | 7,283 | 7,072 | -2.89% |
| benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] | 288.69 | 296.96 | +2.87% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] | 57.02 | 58.65 | +2.86% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] | 177.81 | 172.82 | -2.81% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 882.27 | 906.95 | +2.80% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-4] | 70.53 | 72.48 | +2.78% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] | 29,964 | 30,794 | +2.77% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] | 75,555 | 77,643 | +2.76% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] | 20,201 | 19,655 | -2.70% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] | 76.24 | 78.24 | +2.62% |
| benchmarks/test_envs_benchmark.py::test_transformed | 0.8898 | 0.9130 | +2.60% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] | 23,624 | 23,027 | -2.52% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] | 56,830 | 58,248 | +2.50% |
| benchmarks/test_envs_benchmark.py::test_parallel | 0.9740 | 0.9511 | -2.35% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] | 711.13 | 695.50 | -2.20% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] | 21,736 | 22,211 | +2.18% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1,067 | 1,090 | +2.17% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] | 61.59 | 62.89 | +2.11% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-16] | 17.83 | 18.20 | +2.07% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] | 279.39 | 285.07 | +2.03% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-backward] | 28.51 | 27.94 | -2.01% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2,192 | 2,148 | -2.00% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-None] | 260.19 | 265.38 | +1.99% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] | 1,775 | 1,808 | +1.91% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 25.01 | 25.47 | +1.83% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] | 32,564 | 31,976 | -1.80% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] | 38,731 | 38,038 | -1.79% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] | 44,819 | 45,603 | +1.75% |
| benchmarks/test_objectives_benchmarks.py::test_redq_speed[False-None] | 93.97 | 95.62 | +1.75% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-None] | 37.61 | 38.27 | +1.74% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] | 20,543 | 20,890 | +1.69% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] | 475.99 | 484.04 | +1.69% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] | 87.62 | 89.07 | +1.66% |
| benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] | 36.02 | 36.61 | +1.66% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 665.54 | 676.55 | +1.65% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-lstm] | 0.8674 | 0.8532 | -1.64% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] | 27,918 | 27,465 | -1.62% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] | 50,149 | 50,925 | +1.55% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-4] | 48.21 | 48.96 | +1.55% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] | 4.3456 | 4.2785 | -1.54% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] | 15.06 | 15.28 | +1.50% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] | 29,947 | 30,392 | +1.49% |
| benchmarks/test_envs_benchmark.py::test_serial | 0.5763 | 0.5848 | +1.48% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 189.60 | 192.34 | +1.45% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] | 3,521 | 3,572 | +1.44% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] | 25.88 | 26.25 | +1.41% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 160.77 | 163.02 | +1.40% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-64] | 4.5070 | 4.5691 | +1.38% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[reduce-overhead-None] | 84.11 | 85.25 | +1.36% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] | 29,562 | 29,959 | +1.34% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-4] | 148.44 | 150.40 | +1.32% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] | 244.91 | 248.14 | +1.32% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-None] | 160.00 | 162.01 | +1.26% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 2,226 | 2,254 | +1.25% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] | 63,563 | 64,352 | +1.24% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] | 38,018 | 38,487 | +1.23% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[50-img_shape0-small] | 877.75 | 867.36 | -1.18% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] | 555.47 | 562.01 | +1.18% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] | 63,568 | 64,301 | +1.15% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] | 35,443 | 35,042 | -1.13% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 832.14 | 841.44 | +1.12% |
| benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[10000000-cpu] | 52.02 | 51.45 | -1.10% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 0.6843 | 0.6916 | +1.08% |
| benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] | 117.29 | 118.55 | +1.07% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] | 703.77 | 711.24 | +1.06% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] | 28,599 | 28,899 | +1.05% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] | 31,595 | 31,923 | +1.04% |

Showing 120 of 216 comparisons, sorted by absolute change.
GPU
Compared 226 benchmarks. Regressions over 5%: 11. Improvements over 5%: 11.
| Benchmark | main ops | PR ops | Change |
|---|---|---|---|
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 819.45 | 2,690 | +228.32% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1,928 | 857.96 | -55.51% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3,622 | 2,850 | -21.33% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2,199 | 1,788 | -18.68% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3,088 | 3,653 | +18.32% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 43.83 | 50.28 | +14.72% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] | 3,784 | 4,336 | +14.58% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2,647 | 3,001 | +13.37% |
| benchmarks/test_collectors_benchmark.py::test_single_with_rb_pixels | 5.3509 | 4.7659 | -10.93% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3,014 | 2,687 | -10.85% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 753.79 | 834.07 | +10.65% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3,026 | 2,747 | -9.22% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2,138 | 1,950 | -8.81% |
| benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] | 305.15 | 279.69 | -8.34% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] | 3,354 | 3,610 | +7.62% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 471.02 | 503.07 | +6.80% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] | 8.7684 | 8.2042 | -6.43% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 2,661 | 2,828 | +6.28% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] | 21.49 | 20.21 | -5.96% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] | 22.71 | 21.36 | -5.94% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] | 369.02 | 389.04 | +5.43% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] | 898.14 | 945.07 | +5.22% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] | 351.24 | 334.31 | -4.82% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] | 22,866 | 23,947 | +4.73% |
| benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cuda] | 2,226 | 2,125 | -4.55% |
| benchmarks/test_envs_benchmark.py::test_parallel | 0.5264 | 0.5491 | +4.32% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] | 634.50 | 661.51 | +4.26% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[100-img_shape0-atari] | 16.68 | 17.35 | +4.02% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 745.43 | 716.33 | -3.90% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] | 6.6073 | 6.3562 | -3.80% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2,035 | 1,959 | -3.77% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] | 301.55 | 312.62 | +3.67% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] | 764.71 | 737.30 | -3.59% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 48.51 | 46.78 | -3.56% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] | 5.4623 | 5.2712 | -3.50% |
| benchmarks/test_envs_benchmark.py::test_simple | 1.2168 | 1.1750 | -3.43% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 513.92 | 531.53 | +3.43% |
| benchmarks/test_envs_benchmark.py::test_transformed | 0.7066 | 0.6832 | -3.31% |
| benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] | 11,693 | 11,321 | -3.17% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] | 630.95 | 611.05 | -3.15% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-True] | 17,771 | 18,321 | +3.09% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] | 49,400 | 50,894 | +3.02% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] | 49.78 | 48.29 | -3.00% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-True] | 20,234 | 20,828 | +2.94% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] | 43.19 | 44.45 | +2.92% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] | 139.35 | 135.50 | -2.77% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] | 35,271 | 34,301 | -2.75% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] | 25.35 | 26.05 | +2.75% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] | 438.18 | 426.18 | -2.74% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] | 20,625 | 21,186 | +2.72% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] | 29.36 | 30.15 | +2.69% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 158.24 | 162.45 | +2.66% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] | 14.80 | 15.17 | +2.51% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] | 29,996 | 30,744 | +2.49% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] | 21,639 | 22,169 | +2.45% |
| benchmarks/test_objectives_benchmarks.py::test_iql_speed[reduce-overhead-None] | 108.77 | 106.12 | -2.43% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] | 77.26 | 79.13 | +2.42% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 706.27 | 723.36 | +2.42% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] | 446.49 | 456.95 | +2.34% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] | 10.81 | 11.06 | +2.31% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] | 38,151 | 39,031 | +2.30% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1,015 | 1,038 | +2.26% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] | 29,331 | 29,988 | +2.24% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-None] | 228.78 | 223.67 | -2.23% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] | 167.02 | 163.35 | -2.19% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 990.26 | 1,012 | +2.18% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] | 32,938 | 33,656 | +2.18% |
| benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] | 12.25 | 11.98 | -2.17% |
| benchmarks/test_objectives_benchmarks.py::test_values[generalized_advantage_estimate-True-True] | 48.18 | 47.15 | -2.13% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 0.5881 | 0.6005 | +2.11% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] | 37,976 | 37,174 | -2.11% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] | 19,712 | 20,121 | +2.08% |
| benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] | 20.18 | 19.76 | -2.07% |
| benchmarks/test_objectives_benchmarks.py::test_values[vec_td1_return_estimate-False-False] | 849.93 | 832.33 | -2.07% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 53.61 | 52.51 | -2.06% |
| benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-lstm] | 77.09 | 75.51 | -2.05% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 0.5233 | 0.5129 | -1.98% |
| benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] | 372.36 | 365.43 | -1.86% |
| benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[200-img_shape1-large_batch] | 8.4570 | 8.6140 | +1.86% |
| benchmarks/test_envs_benchmark.py::test_serial | 0.4112 | 0.4187 | +1.83% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] | 34,227 | 34,843 | +1.80% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] | 186.70 | 190.03 | +1.78% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 0.2114 | 0.2076 | -1.76% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] | 811.09 | 825.40 | +1.76% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] | 376,524 | 369,915 | -1.76% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] | 234.46 | 238.57 | +1.75% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] | 4,723 | 4,642 | -1.72% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] | 37,585 | 38,228 | +1.71% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] | 18,724 | 19,042 | +1.70% |
| benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] | 104.20 | 105.94 | +1.67% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] | 27,271 | 27,727 | +1.67% |
| benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] | 4,798 | 4,718 | -1.66% |
| benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] | 740.59 | 728.59 | -1.62% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] | 47.75 | 47.00 | -1.57% |
| benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] | 419.17 | 412.64 | -1.56% |
| benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] | 502.50 | 510.27 | +1.54% |
| benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] | 345.28 | 340.28 | -1.45% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] | 396.73 | 391.01 | -1.44% |
| benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[torch.save] | 7,162 | 7,061 | -1.40% |
| benchmarks/test_collectors_benchmark.py::test_sync | 10.33 | 10.47 | +1.33% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] | 285.96 | 282.16 | -1.33% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 0.5993 | 0.6071 | +1.29% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] | 28,375 | 28,739 | +1.28% |
| benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] | 4,387 | 4,443 | +1.27% |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1,310 | 1,294 | -1.26% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 159.51 | 161.52 | +1.26% |
| benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] | 798.55 | 808.53 | +1.25% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[False-None] | 640.85 | 632.84 | -1.25% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 22.82 | 22.54 | -1.23% |
| benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] | 1,925 | 1,901 | -1.23% |
| benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] | 271.41 | 268.09 | -1.22% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] | 58,535 | 57,839 | -1.19% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] | 475.18 | 469.66 | -1.16% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 163.29 | 165.19 | +1.16% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] | 23,091 | 23,348 | +1.11% |
| benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 52.80 | 52.23 | -1.07% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-64] | 12.64 | 12.78 | +1.04% |
| benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 0.5919 | 0.5978 | +1.00% |
| benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-480-640-16] | 4.9672 | 4.9176 | -1.00% |
| benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] | 31,983 | 32,288 | +0.95% |

Showing 120 of 226 comparisons, sorted by absolute change.
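The "Change" column and the 5% thresholds in the GPU summary can be approximated with a small sketch. It assumes that Change is `(PR - main) / main` and that "ops" is a throughput, so a negative change counts as a regression; both are inferences from the numbers in the table, not documented behaviour of the benchmark bot. `percent_change` and `classify` are hypothetical helper names.

```python
def percent_change(main_ops, pr_ops):
    """Relative change of the PR measurement against main, in percent."""
    return (pr_ops - main_ops) / main_ops * 100.0


def classify(change_pct, threshold=5.0):
    """Label a change using the 5% cut-off quoted in the summary line."""
    if change_pct > threshold:
        return "improvement"
    if change_pct < -threshold:
        return "regression"
    return "neutral"


# Three rows copied from the GPU table (benchmark names shortened).
rows = [
    ("test_rb_sample[...LazyMemmapStorage-RandomSampler-10000]", 819.45, 2690.0),
    ("test_rb_sample[...Prioritized...-LazyMemmapStorage-None-10000]", 1928.0, 857.96),
    ("test_ppo_speed[True-backward]", 351.24, 334.31),
]

for name, main_ops, pr_ops in rows:
    change = percent_change(main_ops, pr_ops)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Recomputed values can differ from the table in the second decimal place because the displayed ops figures are rounded.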
Summary
Adds an offline-to-online SAC trainer and a runnable SOTA example for the offline-pretrain to online-finetune workflow.
This builds on the `OfflineToOnlineReplayBuffer` and dataset-loading helpers from #3900.

What's added
- `OfflineToOnlineTrainer`: a `SACTrainer` subclass that uses `OfflineToOnlineReplayBuffer` for mixed offline/online optimization batches.
- `OfflineToOnlineReplayBufferHook` stores collected experience in the online buffer and samples mixed optimization batches.
- `OfflineToOnlineAnnealHook` decays the offline sampling fraction over collected frames.
- `OfflineToOnlineTrainerConfig`, including parity with the trainer constructor and registration in the config store.
- `sota-implementations/offline_to_online/train.py`: standalone SAC offline-to-online training script using registered dataset strings such as `d4rl:` and `minari:`.

Docs and tests
- `OfflineToOnlineTrainer` and `OfflineToOnlineTrainerConfig`.
- `test/test_offline_to_online.py` with hook, trainer wiring, config, and checkpoint coverage.
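The decay of the offline sampling fraction over collected frames can be sketched as follows. This is not the `OfflineToOnlineAnnealHook` implementation: the linear shape, the `initial_fraction` default, and the helper names `offline_fraction` and `split_batch` are assumptions made for illustration. Only the `anneal_frames` name and the "anneal to zero" behaviour come from the PR text.

```python
def offline_fraction(frames_collected, anneal_frames, initial_fraction=0.5):
    """Linearly decay the offline share from `initial_fraction` to 0.

    Assumption: a linear schedule; the PR only states that the fraction
    reaches zero after `anneal_frames` collected frames.
    """
    if anneal_frames <= 0:
        return 0.0
    # Clamp progress to [0, 1] so the fraction stays at 0 after annealing.
    progress = min(max(frames_collected / anneal_frames, 0.0), 1.0)
    return initial_fraction * (1.0 - progress)


def split_batch(batch_size, fraction):
    """Number of (offline, online) samples for one optimization batch."""
    n_offline = round(batch_size * fraction)
    return n_offline, batch_size - n_offline


for frames in (0, 2_500, 5_000, 10_000, 20_000):
    frac = offline_fraction(frames, anneal_frames=10_000)
    print(frames, frac, split_batch(256, frac))
```

Once `frames_collected` reaches `anneal_frames`, every batch is drawn from the online buffer only, which matches the "anneals the offline fraction to zero" description.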