[Doc] Tutorial on exporting TorchRL models #2557

Merged
merged 3 commits into from
Nov 13, 2024

@vmoens (Contributor) commented Nov 13, 2024

[ghstack-poisoned]

pytorch-bot bot commented Nov 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2557

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 10 Unrelated Failures

As of commit c3904b7 with merge base 50a35f6:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Nov 13, 2024
ghstack-source-id: c64fd48e2a12b009f0d93d671478a7097e8eb105
Pull Request resolved: #2557
@facebook-github-bot added the `CLA Signed` label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Nov 13, 2024
@vmoens added the `documentation` label (improvements or additions to documentation) Nov 13, 2024

github-actions bot commented Nov 13, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}35$. Worsened: $\large\color{#d91a1a}11$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.4322s 0.4298s 2.3264 Ops/s 2.2207 Ops/s $\color{#35bf28}+4.76\%$
test_transformed 0.6181s 0.6107s 1.6374 Ops/s 1.5518 Ops/s $\textbf{\color{#35bf28}+5.51\%}$
test_serial 1.3503s 1.3475s 0.7421 Ops/s 0.7265 Ops/s $\color{#35bf28}+2.16\%$
test_parallel 1.2970s 1.2797s 0.7814 Ops/s 0.7669 Ops/s $\color{#35bf28}+1.90\%$
test_step_mdp_speed[True-True-True-True-True] 0.2439ms 27.2842μs 36.6512 KOps/s 36.6307 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[True-True-True-True-False] 74.6660μs 15.5528μs 64.2970 KOps/s 63.9797 KOps/s $\color{#35bf28}+0.50\%$
test_step_mdp_speed[True-True-True-False-True] 48.6310μs 15.2641μs 65.5133 KOps/s 64.9653 KOps/s $\color{#35bf28}+0.84\%$
test_step_mdp_speed[True-True-True-False-False] 54.8920μs 8.9489μs 111.7462 KOps/s 112.7752 KOps/s $\color{#d91a1a}-0.91\%$
test_step_mdp_speed[True-True-False-True-True] 95.8280μs 29.0999μs 34.3644 KOps/s 34.7810 KOps/s $\color{#d91a1a}-1.20\%$
test_step_mdp_speed[True-True-False-True-False] 55.9950μs 17.3888μs 57.5081 KOps/s 57.9525 KOps/s $\color{#d91a1a}-0.77\%$
test_step_mdp_speed[True-True-False-False-True] 66.9450μs 17.2601μs 57.9370 KOps/s 58.4735 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-True-False-False-False] 51.8270μs 10.6124μs 94.2297 KOps/s 95.0369 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[True-False-True-True-True] 99.6060μs 30.7109μs 32.5617 KOps/s 32.2292 KOps/s $\color{#35bf28}+1.03\%$
test_step_mdp_speed[True-False-True-True-False] 76.9230μs 19.2016μs 52.0789 KOps/s 52.6814 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[True-False-True-False-True] 93.9690μs 16.7953μs 59.5405 KOps/s 58.7352 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[True-False-True-False-False] 0.1429ms 10.9030μs 91.7181 KOps/s 94.9234 KOps/s $\color{#d91a1a}-3.38\%$
test_step_mdp_speed[True-False-False-True-True] 70.0610μs 32.0324μs 31.2184 KOps/s 31.2289 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-False-False-True-False] 47.4790μs 20.5942μs 48.5574 KOps/s 48.4667 KOps/s $\color{#35bf28}+0.19\%$
test_step_mdp_speed[True-False-False-False-True] 63.6690μs 18.5023μs 54.0475 KOps/s 53.9851 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[True-False-False-False-False] 42.6100μs 12.2377μs 81.7150 KOps/s 81.7349 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[False-True-True-True-True] 70.2710μs 30.7850μs 32.4833 KOps/s 32.5601 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-True-True-True-False] 51.3660μs 19.2036μs 52.0736 KOps/s 52.3770 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[False-True-True-False-True] 52.3580μs 19.5933μs 51.0378 KOps/s 51.0450 KOps/s $\color{#d91a1a}-0.01\%$
test_step_mdp_speed[False-True-True-False-False] 50.9550μs 11.9465μs 83.7068 KOps/s 84.6720 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[False-True-False-True-True] 0.2647ms 32.6412μs 30.6361 KOps/s 30.9039 KOps/s $\color{#d91a1a}-0.87\%$
test_step_mdp_speed[False-True-False-True-False] 0.1049ms 20.7509μs 48.1906 KOps/s 47.8226 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[False-True-False-False-True] 2.8103ms 21.2348μs 47.0925 KOps/s 47.2558 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-True-False-False-False] 43.3810μs 13.6265μs 73.3862 KOps/s 74.4844 KOps/s $\color{#d91a1a}-1.47\%$
test_step_mdp_speed[False-False-True-True-True] 78.5860μs 33.7785μs 29.6046 KOps/s 29.3924 KOps/s $\color{#35bf28}+0.72\%$
test_step_mdp_speed[False-False-True-True-False] 51.9270μs 22.3395μs 44.7638 KOps/s 45.1273 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[False-False-True-False-True] 55.2730μs 21.0468μs 47.5131 KOps/s 47.5507 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-False-True-False-False] 49.2520μs 13.5671μs 73.7079 KOps/s 74.9518 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[False-False-False-True-True] 76.0720μs 35.2255μs 28.3885 KOps/s 28.2723 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[False-False-False-True-False] 58.2190μs 23.7519μs 42.1019 KOps/s 42.7274 KOps/s $\color{#d91a1a}-1.46\%$
test_step_mdp_speed[False-False-False-False-True] 65.2620μs 22.4345μs 44.5742 KOps/s 44.3373 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[False-False-False-False-False] 42.8800μs 15.0062μs 66.6393 KOps/s 67.3375 KOps/s $\color{#d91a1a}-1.04\%$
test_values[generalized_advantage_estimate-True-True] 12.5658ms 9.8450ms 101.5743 Ops/s 104.3163 Ops/s $\color{#d91a1a}-2.63\%$
test_values[vec_generalized_advantage_estimate-True-True] 43.3541ms 37.5024ms 26.6650 Ops/s 28.1739 Ops/s $\textbf{\color{#d91a1a}-5.36\%}$
test_values[td0_return_estimate-False-False] 1.3767ms 0.2091ms 4.7815 KOps/s 5.4804 KOps/s $\textbf{\color{#d91a1a}-12.75\%}$
test_values[td1_return_estimate-False-False] 24.6594ms 24.2950ms 41.1607 Ops/s 40.2314 Ops/s $\color{#35bf28}+2.31\%$
test_values[vec_td1_return_estimate-False-False] 43.3613ms 37.6970ms 26.5273 Ops/s 28.1931 Ops/s $\textbf{\color{#d91a1a}-5.91\%}$
test_values[td_lambda_return_estimate-True-False] 37.8375ms 35.2320ms 28.3833 Ops/s 27.8282 Ops/s $\color{#35bf28}+1.99\%$
test_values[vec_td_lambda_return_estimate-True-False] 42.1614ms 38.1141ms 26.2370 Ops/s 27.9431 Ops/s $\textbf{\color{#d91a1a}-6.11\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.5448ms 8.3588ms 119.6341 Ops/s 119.2326 Ops/s $\color{#35bf28}+0.34\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.4888ms 1.9342ms 516.9996 Ops/s 549.1322 Ops/s $\textbf{\color{#d91a1a}-5.85\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4814ms 0.3536ms 2.8278 KOps/s 2.8094 KOps/s $\color{#35bf28}+0.65\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 51.1034ms 48.3248ms 20.6933 Ops/s 20.7004 Ops/s $\color{#d91a1a}-0.03\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.3040ms 3.0729ms 325.4208 Ops/s 323.5067 Ops/s $\color{#35bf28}+0.59\%$
test_dqn_speed[False-None] 5.9193ms 1.3578ms 736.4843 Ops/s 730.8713 Ops/s $\color{#35bf28}+0.77\%$
test_dqn_speed[False-backward] 1.9009ms 1.7962ms 556.7257 Ops/s 544.8395 Ops/s $\color{#35bf28}+2.18\%$
test_dqn_speed[True-None] 1.3131ms 0.4656ms 2.1480 KOps/s 2.0957 KOps/s $\color{#35bf28}+2.49\%$
test_dqn_speed[True-backward] 1.2198ms 0.9395ms 1.0645 KOps/s 1.1192 KOps/s $\color{#d91a1a}-4.89\%$
test_dqn_speed[reduce-overhead-None] 0.6783ms 0.4643ms 2.1536 KOps/s 2.1029 KOps/s $\color{#35bf28}+2.41\%$
test_dqn_speed[reduce-overhead-backward] 0.9157ms 0.8731ms 1.1454 KOps/s 1.1099 KOps/s $\color{#35bf28}+3.20\%$
test_ddpg_speed[False-None] 3.6513ms 2.7744ms 360.4420 Ops/s 352.8925 Ops/s $\color{#35bf28}+2.14\%$
test_ddpg_speed[False-backward] 4.3289ms 3.8957ms 256.6946 Ops/s 252.2360 Ops/s $\color{#35bf28}+1.77\%$
test_ddpg_speed[True-None] 1.3135ms 1.0165ms 983.8131 Ops/s 988.5752 Ops/s $\color{#d91a1a}-0.48\%$
test_ddpg_speed[True-backward] 2.1044ms 1.9820ms 504.5411 Ops/s 497.8959 Ops/s $\color{#35bf28}+1.33\%$
test_ddpg_speed[reduce-overhead-None] 1.2626ms 1.0328ms 968.2570 Ops/s 963.7899 Ops/s $\color{#35bf28}+0.46\%$
test_ddpg_speed[reduce-overhead-backward] 2.0756ms 1.9668ms 508.4480 Ops/s 485.7959 Ops/s $\color{#35bf28}+4.66\%$
test_sac_speed[False-None] 10.6791ms 8.5434ms 117.0488 Ops/s 117.1308 Ops/s $\color{#d91a1a}-0.07\%$
test_sac_speed[False-backward] 12.9241ms 11.4389ms 87.4212 Ops/s 84.6453 Ops/s $\color{#35bf28}+3.28\%$
test_sac_speed[True-None] 2.3522ms 1.9803ms 504.9818 Ops/s 487.1853 Ops/s $\color{#35bf28}+3.65\%$
test_sac_speed[True-backward] 4.1489ms 3.5990ms 277.8535 Ops/s 266.7245 Ops/s $\color{#35bf28}+4.17\%$
test_sac_speed[reduce-overhead-None] 2.3790ms 1.8397ms 543.5683 Ops/s 535.1546 Ops/s $\color{#35bf28}+1.57\%$
test_sac_speed[reduce-overhead-backward] 4.4903ms 3.7623ms 265.7936 Ops/s 285.2286 Ops/s $\textbf{\color{#d91a1a}-6.81\%}$
test_redq_speed[False-None] 17.7290ms 13.3254ms 75.0445 Ops/s 71.7289 Ops/s $\color{#35bf28}+4.62\%$
test_redq_speed[False-backward] 24.5728ms 22.7358ms 43.9835 Ops/s 44.8756 Ops/s $\color{#d91a1a}-1.99\%$
test_redq_speed[True-None] 6.8629ms 5.3865ms 185.6479 Ops/s 193.6125 Ops/s $\color{#d91a1a}-4.11\%$
test_redq_speed[True-backward] 13.0343ms 12.6655ms 78.9545 Ops/s 81.0863 Ops/s $\color{#d91a1a}-2.63\%$
test_redq_speed[reduce-overhead-None] 6.2286ms 5.5110ms 181.4556 Ops/s 195.2418 Ops/s $\textbf{\color{#d91a1a}-7.06\%}$
test_redq_speed[reduce-overhead-backward] 16.1579ms 12.8955ms 77.5464 Ops/s 75.4183 Ops/s $\color{#35bf28}+2.82\%$
test_redq_deprec_speed[False-None] 17.7464ms 13.5014ms 74.0665 Ops/s 69.6154 Ops/s $\textbf{\color{#35bf28}+6.39\%}$
test_redq_deprec_speed[False-backward] 20.9934ms 19.7978ms 50.5106 Ops/s 49.8146 Ops/s $\color{#35bf28}+1.40\%$
test_redq_deprec_speed[True-None] 4.1239ms 3.7421ms 267.2327 Ops/s 227.6630 Ops/s $\textbf{\color{#35bf28}+17.38\%}$
test_redq_deprec_speed[True-backward] 9.8355ms 8.8489ms 113.0078 Ops/s 107.4896 Ops/s $\textbf{\color{#35bf28}+5.13\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.8669ms 3.8274ms 261.2772 Ops/s 224.6298 Ops/s $\textbf{\color{#35bf28}+16.31\%}$
test_redq_deprec_speed[reduce-overhead-backward] 11.5178ms 9.0033ms 111.0700 Ops/s 113.5907 Ops/s $\color{#d91a1a}-2.22\%$
test_td3_speed[False-None] 8.6220ms 7.8705ms 127.0571 Ops/s 114.6842 Ops/s $\textbf{\color{#35bf28}+10.79\%}$
test_td3_speed[False-backward] 10.9196ms 10.5204ms 95.0537 Ops/s 90.0451 Ops/s $\textbf{\color{#35bf28}+5.56\%}$
test_td3_speed[True-None] 1.9359ms 1.7462ms 572.6588 Ops/s 571.7063 Ops/s $\color{#35bf28}+0.17\%$
test_td3_speed[True-backward] 3.8713ms 3.5610ms 280.8183 Ops/s 268.6494 Ops/s $\color{#35bf28}+4.53\%$
test_td3_speed[reduce-overhead-None] 1.8872ms 1.7418ms 574.1124 Ops/s 543.5254 Ops/s $\textbf{\color{#35bf28}+5.63\%}$
test_td3_speed[reduce-overhead-backward] 3.4428ms 3.3332ms 300.0107 Ops/s 293.7225 Ops/s $\color{#35bf28}+2.14\%$
test_cql_speed[False-None] 38.0596ms 36.0910ms 27.7077 Ops/s 27.6337 Ops/s $\color{#35bf28}+0.27\%$
test_cql_speed[False-backward] 51.5767ms 47.3085ms 21.1378 Ops/s 21.2691 Ops/s $\color{#d91a1a}-0.62\%$
test_cql_speed[True-None] 17.3499ms 15.8940ms 62.9168 Ops/s 60.9152 Ops/s $\color{#35bf28}+3.29\%$
test_cql_speed[True-backward] 23.7400ms 22.4484ms 44.5465 Ops/s 42.8661 Ops/s $\color{#35bf28}+3.92\%$
test_cql_speed[reduce-overhead-None] 17.2069ms 15.7756ms 63.3889 Ops/s 61.7794 Ops/s $\color{#35bf28}+2.61\%$
test_cql_speed[reduce-overhead-backward] 24.0738ms 22.8234ms 43.8146 Ops/s 44.2223 Ops/s $\color{#d91a1a}-0.92\%$
test_a2c_speed[False-None] 8.5813ms 7.3300ms 136.4251 Ops/s 136.3158 Ops/s $\color{#35bf28}+0.08\%$
test_a2c_speed[False-backward] 16.9571ms 14.8203ms 67.4751 Ops/s 68.6594 Ops/s $\color{#d91a1a}-1.72\%$
test_a2c_speed[True-None] 3.6562ms 3.3148ms 301.6789 Ops/s 294.2129 Ops/s $\color{#35bf28}+2.54\%$
test_a2c_speed[True-backward] 10.1901ms 9.7111ms 102.9744 Ops/s 98.7528 Ops/s $\color{#35bf28}+4.28\%$
test_a2c_speed[reduce-overhead-None] 3.8281ms 3.3246ms 300.7919 Ops/s 273.5356 Ops/s $\textbf{\color{#35bf28}+9.96\%}$
test_a2c_speed[reduce-overhead-backward] 10.2226ms 9.8551ms 101.4703 Ops/s 96.1015 Ops/s $\textbf{\color{#35bf28}+5.59\%}$
test_ppo_speed[False-None] 8.8903ms 7.6390ms 130.9073 Ops/s 127.3540 Ops/s $\color{#35bf28}+2.79\%$
test_ppo_speed[False-backward] 15.9171ms 15.2320ms 65.6511 Ops/s 62.9865 Ops/s $\color{#35bf28}+4.23\%$
test_ppo_speed[True-None] 4.3180ms 3.7766ms 264.7850 Ops/s 238.1387 Ops/s $\textbf{\color{#35bf28}+11.19\%}$
test_ppo_speed[True-backward] 10.3144ms 9.9908ms 100.0925 Ops/s 98.8284 Ops/s $\color{#35bf28}+1.28\%$
test_ppo_speed[reduce-overhead-None] 4.6861ms 3.7438ms 267.1118 Ops/s 256.9025 Ops/s $\color{#35bf28}+3.97\%$
test_ppo_speed[reduce-overhead-backward] 10.4818ms 9.5497ms 104.7149 Ops/s 101.1324 Ops/s $\color{#35bf28}+3.54\%$
test_reinforce_speed[False-None] 8.7850ms 6.4866ms 154.1636 Ops/s 151.8576 Ops/s $\color{#35bf28}+1.52\%$
test_reinforce_speed[False-backward] 10.2290ms 9.8074ms 101.9643 Ops/s 98.4882 Ops/s $\color{#35bf28}+3.53\%$
test_reinforce_speed[True-None] 3.2026ms 2.6585ms 376.1532 Ops/s 359.8668 Ops/s $\color{#35bf28}+4.53\%$
test_reinforce_speed[True-backward] 9.4092ms 8.8203ms 113.3745 Ops/s 111.7532 Ops/s $\color{#35bf28}+1.45\%$
test_reinforce_speed[reduce-overhead-None] 2.9957ms 2.7474ms 363.9764 Ops/s 365.2544 Ops/s $\color{#d91a1a}-0.35\%$
test_reinforce_speed[reduce-overhead-backward] 10.1820ms 9.0866ms 110.0521 Ops/s 111.1523 Ops/s $\color{#d91a1a}-0.99\%$
test_iql_speed[False-None] 34.3640ms 32.5250ms 30.7456 Ops/s 30.7446 Ops/s $+0.00\%$
test_iql_speed[False-backward] 47.7762ms 46.3346ms 21.5822 Ops/s 21.9027 Ops/s $\color{#d91a1a}-1.46\%$
test_iql_speed[True-None] 11.3972ms 10.5743ms 94.5689 Ops/s 88.5688 Ops/s $\textbf{\color{#35bf28}+6.77\%}$
test_iql_speed[True-backward] 23.4254ms 22.1956ms 45.0540 Ops/s 44.2660 Ops/s $\color{#35bf28}+1.78\%$
test_iql_speed[reduce-overhead-None] 12.6851ms 11.0929ms 90.1477 Ops/s 87.7297 Ops/s $\color{#35bf28}+2.76\%$
test_iql_speed[reduce-overhead-backward] 23.9545ms 22.7317ms 43.9913 Ops/s 42.5748 Ops/s $\color{#35bf28}+3.33\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.4565ms 5.2149ms 191.7575 Ops/s 184.1268 Ops/s $\color{#35bf28}+4.14\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9425ms 0.5363ms 1.8647 KOps/s 1.7016 KOps/s $\textbf{\color{#35bf28}+9.58\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8394ms 0.4848ms 2.0626 KOps/s 1.9282 KOps/s $\textbf{\color{#35bf28}+6.97\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.9860ms 4.5861ms 218.0498 Ops/s 186.4949 Ops/s $\textbf{\color{#35bf28}+16.92\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.7542ms 0.4939ms 2.0249 KOps/s 1.8929 KOps/s $\textbf{\color{#35bf28}+6.97\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8242ms 0.4690ms 2.1322 KOps/s 1.9698 KOps/s $\textbf{\color{#35bf28}+8.24\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.4967ms 1.6289ms 613.9273 Ops/s 592.9289 Ops/s $\color{#35bf28}+3.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.4666ms 1.5748ms 635.0054 Ops/s 613.3251 Ops/s $\color{#35bf28}+3.53\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.7496ms 4.8224ms 207.3656 Ops/s 186.0067 Ops/s $\textbf{\color{#35bf28}+11.48\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0624ms 0.6402ms 1.5620 KOps/s 1.4691 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8028ms 0.6107ms 1.6374 KOps/s 1.5204 KOps/s $\textbf{\color{#35bf28}+7.70\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.0634ms 4.6385ms 215.5863 Ops/s 186.1260 Ops/s $\textbf{\color{#35bf28}+15.83\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0649ms 0.5096ms 1.9622 KOps/s 1.8520 KOps/s $\textbf{\color{#35bf28}+5.95\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6939ms 0.4923ms 2.0314 KOps/s 1.9329 KOps/s $\textbf{\color{#35bf28}+5.09\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.9653ms 4.5507ms 219.7466 Ops/s 193.9434 Ops/s $\textbf{\color{#35bf28}+13.30\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.9279ms 0.4984ms 2.0063 KOps/s 1.8955 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7063ms 0.4687ms 2.1337 KOps/s 2.0065 KOps/s $\textbf{\color{#35bf28}+6.34\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.7780ms 4.7129ms 212.1831 Ops/s 185.1126 Ops/s $\textbf{\color{#35bf28}+14.62\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.6328ms 0.6431ms 1.5550 KOps/s 1.4403 KOps/s $\textbf{\color{#35bf28}+7.97\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8066ms 0.6127ms 1.6320 KOps/s 1.5365 KOps/s $\textbf{\color{#35bf28}+6.22\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.3724s 11.5419ms 86.6410 Ops/s 233.2164 Ops/s $\textbf{\color{#d91a1a}-62.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.9404ms 2.3722ms 421.5549 Ops/s 425.8497 Ops/s $\color{#d91a1a}-1.01\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 4.9285ms 1.3438ms 744.1834 Ops/s 810.1777 Ops/s $\textbf{\color{#d91a1a}-8.15\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 5.7524ms 4.2673ms 234.3389 Ops/s 35.0657 Ops/s $\textbf{\color{#35bf28}+568.28\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.2193ms 2.4962ms 400.6024 Ops/s 472.9620 Ops/s $\textbf{\color{#d91a1a}-15.30\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.5067ms 1.2486ms 800.8818 Ops/s 808.3228 Ops/s $\color{#d91a1a}-0.92\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.3578s 11.4554ms 87.2947 Ops/s 195.5181 Ops/s $\textbf{\color{#d91a1a}-55.35\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.3130ms 2.4516ms 407.9039 Ops/s 382.9164 Ops/s $\textbf{\color{#35bf28}+6.53\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.4971ms 1.4472ms 691.0085 Ops/s 628.5445 Ops/s $\textbf{\color{#35bf28}+9.94\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 10.9853ms 10.8494ms 92.1708 Ops/s 81.9626 Ops/s $\textbf{\color{#35bf28}+12.45\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 15.9647ms 14.5507ms 68.7252 Ops/s 66.8272 Ops/s $\color{#35bf28}+2.84\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 21.1387ms 19.6964ms 50.7706 Ops/s 47.5941 Ops/s $\textbf{\color{#35bf28}+6.67\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 14.8829ms 14.5910ms 68.5355 Ops/s 65.3464 Ops/s $\color{#35bf28}+4.88\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 20.1779ms 19.7104ms 50.7347 Ops/s 47.7244 Ops/s $\textbf{\color{#35bf28}+6.31\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 17.6655ms 15.6878ms 63.7437 Ops/s 61.9624 Ops/s $\color{#35bf28}+2.87\%$
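
The Change column above appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The following is a minimal sketch of that arithmetic, not the benchmark bot's actual code: the function name `percent_change` is hypothetical, and the formula is an assumption inferred from the table values. Recomputing from the rounded figures shown here can differ from the reported percentage in the last digit.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Assumed formula (inferred from the table, not confirmed by the source):
    (ops_pr - ops_head) / ops_head * 100.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
# Both values in each pair use the same unit (Ops/s or KOps/s).
rows = {
    "test_simple": (2.3264, 2.2207),
    "test_values[td0_return_estimate-False-False]": (4.7815, 5.4804),
    "test_redq_deprec_speed[True-None]": (267.2327, 227.6630),
}

for name, (ops_pr, ops_head) in rows.items():
    # A positive value means the PR is faster than HEAD, a negative one slower.
    print(f"{name}: {percent_change(ops_pr, ops_head):+.2f}%")
```

Because throughput is the quantity being compared, a positive change means more operations per second on the PR branch, which matches the green entries in the table.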


github-actions bot commented Nov 13, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}6$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.7440s 0.7431s 1.3457 Ops/s 1.3152 Ops/s $\color{#35bf28}+2.32\%$
test_transformed 1.0895s 1.0142s 0.9860 Ops/s 1.0158 Ops/s $\color{#d91a1a}-2.93\%$
test_serial 2.2444s 2.1674s 0.4614 Ops/s 0.4701 Ops/s $\color{#d91a1a}-1.85\%$
test_parallel 2.0285s 1.9649s 0.5089 Ops/s 0.5136 Ops/s $\color{#d91a1a}-0.91\%$
test_step_mdp_speed[True-True-True-True-True] 0.4358ms 35.0252μs 28.5508 KOps/s 27.7198 KOps/s $\color{#35bf28}+3.00\%$
test_step_mdp_speed[True-True-True-True-False] 47.7210μs 20.5519μs 48.6574 KOps/s 47.8974 KOps/s $\color{#35bf28}+1.59\%$
test_step_mdp_speed[True-True-True-False-True] 0.3911ms 20.4524μs 48.8939 KOps/s 49.4430 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[True-True-True-False-False] 37.0210μs 11.5248μs 86.7692 KOps/s 84.2127 KOps/s $\color{#35bf28}+3.04\%$
test_step_mdp_speed[True-True-False-True-True] 0.4239ms 38.4857μs 25.9837 KOps/s 25.6276 KOps/s $\color{#35bf28}+1.39\%$
test_step_mdp_speed[True-True-False-True-False] 0.3993ms 22.2517μs 44.9405 KOps/s 44.3438 KOps/s $\color{#35bf28}+1.35\%$
test_step_mdp_speed[True-True-False-False-True] 0.4010ms 22.3328μs 44.7771 KOps/s 45.2518 KOps/s $\color{#d91a1a}-1.05\%$
test_step_mdp_speed[True-True-False-False-False] 42.9100μs 13.6980μs 73.0033 KOps/s 72.5467 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[True-False-True-True-True] 0.4200ms 40.2499μs 24.8448 KOps/s 24.5235 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[True-False-True-True-False] 0.4013ms 24.3870μs 41.0054 KOps/s 39.9667 KOps/s $\color{#35bf28}+2.60\%$
test_step_mdp_speed[True-False-True-False-True] 61.5810μs 22.2802μs 44.8829 KOps/s 44.5527 KOps/s $\color{#35bf28}+0.74\%$
test_step_mdp_speed[True-False-True-False-False] 0.3919ms 13.6770μs 73.1154 KOps/s 71.8882 KOps/s $\color{#35bf28}+1.71\%$
test_step_mdp_speed[True-False-False-True-True] 0.4223ms 41.5548μs 24.0646 KOps/s 23.5714 KOps/s $\color{#35bf28}+2.09\%$
test_step_mdp_speed[True-False-False-True-False] 60.1710μs 27.0132μs 37.0189 KOps/s 37.4861 KOps/s $\color{#d91a1a}-1.25\%$
test_step_mdp_speed[True-False-False-False-True] 0.4066ms 24.1823μs 41.3525 KOps/s 41.9693 KOps/s $\color{#d91a1a}-1.47\%$
test_step_mdp_speed[True-False-False-False-False] 0.3925ms 15.5366μs 64.3640 KOps/s 63.6256 KOps/s $\color{#35bf28}+1.16\%$
test_step_mdp_speed[False-True-True-True-True] 79.9310μs 40.7540μs 24.5375 KOps/s 24.7619 KOps/s $\color{#d91a1a}-0.91\%$
test_step_mdp_speed[False-True-True-True-False] 0.4048ms 24.6436μs 40.5785 KOps/s 40.4585 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[False-True-True-False-True] 0.4096ms 26.2626μs 38.0769 KOps/s 38.4298 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-True-True-False-False] 0.4035ms 15.1560μs 65.9803 KOps/s 64.3845 KOps/s $\color{#35bf28}+2.48\%$
test_step_mdp_speed[False-True-False-True-True] 74.4810μs 42.6351μs 23.4549 KOps/s 23.8364 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[False-True-False-True-False] 0.4046ms 26.9202μs 37.1469 KOps/s 38.1010 KOps/s $\color{#d91a1a}-2.50\%$
test_step_mdp_speed[False-True-False-False-True] 3.2475ms 27.9559μs 35.7707 KOps/s 36.9160 KOps/s $\color{#d91a1a}-3.10\%$
test_step_mdp_speed[False-True-False-False-False] 84.0010μs 17.0929μs 58.5038 KOps/s 56.7728 KOps/s $\color{#35bf28}+3.05\%$
test_step_mdp_speed[False-False-True-True-True] 0.4410ms 43.9178μs 22.7698 KOps/s 22.4325 KOps/s $\color{#35bf28}+1.50\%$
test_step_mdp_speed[False-False-True-True-False] 0.4362ms 28.9640μs 34.5256 KOps/s 34.8220 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[False-False-True-False-True] 90.0320μs 27.6987μs 36.1027 KOps/s 36.6736 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[False-False-True-False-False] 46.8010μs 17.1724μs 58.2331 KOps/s 57.2455 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[False-False-False-True-True] 0.1215ms 45.6480μs 21.9068 KOps/s 22.2629 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[False-False-False-True-False] 57.7710μs 30.3598μs 32.9383 KOps/s 32.8248 KOps/s $\color{#35bf28}+0.35\%$
test_step_mdp_speed[False-False-False-False-True] 75.7720μs 29.3903μs 34.0249 KOps/s 34.6368 KOps/s $\color{#d91a1a}-1.77\%$
test_step_mdp_speed[False-False-False-False-False] 59.2210μs 18.8685μs 52.9983 KOps/s 52.3356 KOps/s $\color{#35bf28}+1.27\%$
test_values[generalized_advantage_estimate-True-True] 25.0735ms 24.5408ms 40.7485 Ops/s 40.9602 Ops/s $\color{#d91a1a}-0.52\%$
test_values[vec_generalized_advantage_estimate-True-True] 97.5287ms 2.8344ms 352.8111 Ops/s 348.9580 Ops/s $\color{#35bf28}+1.10\%$
test_values[td0_return_estimate-False-False] 85.6810μs 65.3125μs 15.3110 KOps/s 15.5358 KOps/s $\color{#d91a1a}-1.45\%$
test_values[td1_return_estimate-False-False] 54.5432ms 54.3062ms 18.4141 Ops/s 18.3523 Ops/s $\color{#35bf28}+0.34\%$
test_values[vec_td1_return_estimate-False-False] 1.3735ms 1.0705ms 934.1197 Ops/s 933.0392 Ops/s $\color{#35bf28}+0.12\%$
test_values[td_lambda_return_estimate-True-False] 86.9730ms 86.6729ms 11.5376 Ops/s 11.5032 Ops/s $\color{#35bf28}+0.30\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3968ms 1.0695ms 935.0321 Ops/s 929.8276 Ops/s $\color{#35bf28}+0.56\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.6027ms 24.3828ms 41.0126 Ops/s 40.9871 Ops/s $\color{#35bf28}+0.06\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0250ms 0.7521ms 1.3297 KOps/s 1.3589 KOps/s $\color{#d91a1a}-2.15\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7595ms 0.6562ms 1.5240 KOps/s 1.5278 KOps/s $\color{#d91a1a}-0.25\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5304ms 1.4671ms 681.6246 Ops/s 683.7780 Ops/s $\color{#d91a1a}-0.31\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7115ms 0.6693ms 1.4942 KOps/s 1.4922 KOps/s $\color{#35bf28}+0.13\%$
test_dqn_speed[False-None] 6.7896ms 1.3304ms 751.6679 Ops/s 764.3913 Ops/s $\color{#d91a1a}-1.66\%$
test_dqn_speed[False-backward] 1.9383ms 1.8301ms 546.4116 Ops/s 552.1512 Ops/s $\color{#d91a1a}-1.04\%$
test_dqn_speed[True-None] 0.9552ms 0.5665ms 1.7653 KOps/s 1.7214 KOps/s $\color{#35bf28}+2.55\%$
test_dqn_speed[True-backward] 1.0648ms 1.0249ms 975.7232 Ops/s 968.8170 Ops/s $\color{#35bf28}+0.71\%$
test_dqn_speed[reduce-overhead-None] 0.7032ms 0.5757ms 1.7369 KOps/s 1.7703 KOps/s $\color{#d91a1a}-1.89\%$
test_dqn_speed[reduce-overhead-backward] 1.0603ms 1.0272ms 973.5095 Ops/s 987.7478 Ops/s $\color{#d91a1a}-1.44\%$
test_ddpg_speed[False-None] 3.9412ms 2.7651ms 361.6448 Ops/s 374.3386 Ops/s $\color{#d91a1a}-3.39\%$
test_ddpg_speed[False-backward] 4.0347ms 3.9371ms 253.9941 Ops/s 255.8224 Ops/s $\color{#d91a1a}-0.71\%$
test_ddpg_speed[True-None] 1.6208ms 1.2550ms 796.8408 Ops/s 807.8024 Ops/s $\color{#d91a1a}-1.36\%$
test_ddpg_speed[True-backward] 2.6504ms 2.2426ms 445.9195 Ops/s 368.8215 Ops/s $\textbf{\color{#35bf28}+20.90\%}$
test_ddpg_speed[reduce-overhead-None] 1.6734ms 1.2577ms 795.1086 Ops/s 807.0327 Ops/s $\color{#d91a1a}-1.48\%$
test_ddpg_speed[reduce-overhead-backward] 2.3104ms 2.2333ms 447.7656 Ops/s 455.1805 Ops/s $\color{#d91a1a}-1.63\%$
test_sac_speed[False-None] 8.6343ms 7.5805ms 131.9172 Ops/s 133.1806 Ops/s $\color{#d91a1a}-0.95\%$
test_sac_speed[False-backward] 11.2001ms 10.8100ms 92.5068 Ops/s 92.8397 Ops/s $\color{#d91a1a}-0.36\%$
test_sac_speed[True-None] 2.3905ms 2.0047ms 498.8360 Ops/s 489.1853 Ops/s $\color{#35bf28}+1.97\%$
test_sac_speed[True-backward] 4.0111ms 3.9202ms 255.0921 Ops/s 254.7785 Ops/s $\color{#35bf28}+0.12\%$
test_sac_speed[reduce-overhead-None] 2.4016ms 2.0113ms 497.1955 Ops/s 499.0190 Ops/s $\color{#d91a1a}-0.37\%$
test_sac_speed[reduce-overhead-backward] 4.0230ms 3.9121ms 255.6186 Ops/s 257.1466 Ops/s $\color{#d91a1a}-0.59\%$
test_redq_speed[False-None] 11.4532ms 9.9666ms 100.3351 Ops/s 96.2444 Ops/s $\color{#35bf28}+4.25\%$
test_redq_speed[False-backward] 18.1248ms 17.1339ms 58.3639 Ops/s 58.0553 Ops/s $\color{#35bf28}+0.53\%$
test_redq_speed[True-None] 3.9689ms 3.4658ms 288.5305 Ops/s 277.2680 Ops/s $\color{#35bf28}+4.06\%$
test_redq_speed[True-backward] 8.7344ms 8.4311ms 118.6080 Ops/s 110.0668 Ops/s $\textbf{\color{#35bf28}+7.76\%}$
test_redq_speed[reduce-overhead-None] 3.6778ms 3.4673ms 288.4062 Ops/s 288.1282 Ops/s $\color{#35bf28}+0.10\%$
test_redq_speed[reduce-overhead-backward] 8.8435ms 8.5386ms 117.1152 Ops/s 117.9213 Ops/s $\color{#d91a1a}-0.68\%$
test_redq_deprec_speed[False-None] 12.1963ms 10.5728ms 94.5821 Ops/s 96.5282 Ops/s $\color{#d91a1a}-2.02\%$
test_redq_deprec_speed[False-backward] 15.7393ms 15.3703ms 65.0606 Ops/s 66.5348 Ops/s $\color{#d91a1a}-2.22\%$
test_redq_deprec_speed[True-None] 3.6762ms 3.2179ms 310.7613 Ops/s 307.6519 Ops/s $\color{#35bf28}+1.01\%$
test_redq_deprec_speed[True-backward] 7.2538ms 7.0164ms 142.5230 Ops/s 139.9745 Ops/s $\color{#35bf28}+1.82\%$
test_redq_deprec_speed[reduce-overhead-None] 3.5630ms 3.2147ms 311.0752 Ops/s 308.9836 Ops/s $\color{#35bf28}+0.68\%$
test_redq_deprec_speed[reduce-overhead-backward] 7.3562ms 7.0358ms 142.1311 Ops/s 139.1613 Ops/s $\color{#35bf28}+2.13\%$
test_td3_speed[False-None] 7.7200ms 7.5095ms 133.1647 Ops/s 133.9583 Ops/s $\color{#d91a1a}-0.59\%$
test_td3_speed[False-backward] 10.9203ms 10.3689ms 96.4425 Ops/s 96.5011 Ops/s $\color{#d91a1a}-0.06\%$
test_td3_speed[True-None] 1.9418ms 1.9012ms 525.9884 Ops/s 535.8512 Ops/s $\color{#d91a1a}-1.84\%$
test_td3_speed[True-backward] 3.7722ms 3.6593ms 273.2797 Ops/s 226.1867 Ops/s $\textbf{\color{#35bf28}+20.82\%}$
test_td3_speed[reduce-overhead-None] 1.9208ms 1.8891ms 529.3607 Ops/s 532.6344 Ops/s $\color{#d91a1a}-0.61\%$
test_td3_speed[reduce-overhead-backward] 3.8328ms 3.6885ms 271.1097 Ops/s 273.9186 Ops/s $\color{#d91a1a}-1.03\%$
test_cql_speed[False-None] 28.1416ms 24.8559ms 40.2319 Ops/s 40.2853 Ops/s $\color{#d91a1a}-0.13\%$
test_cql_speed[False-backward] 37.8566ms 34.5008ms 28.9848 Ops/s 29.5510 Ops/s $\color{#d91a1a}-1.92\%$
test_cql_speed[True-None] 11.1259ms 10.7761ms 92.7975 Ops/s 93.8750 Ops/s $\color{#d91a1a}-1.15\%$
test_cql_speed[True-backward] 17.1492ms 16.5275ms 60.5052 Ops/s 61.4328 Ops/s $\color{#d91a1a}-1.51\%$
test_cql_speed[reduce-overhead-None] 11.1150ms 10.7830ms 92.7382 Ops/s 94.3861 Ops/s $\color{#d91a1a}-1.75\%$
test_cql_speed[reduce-overhead-backward] 17.1606ms 16.4867ms 60.6551 Ops/s 61.3142 Ops/s $\color{#d91a1a}-1.07\%$
test_a2c_speed[False-None] 5.5826ms 5.3562ms 186.6979 Ops/s 185.1397 Ops/s $\color{#35bf28}+0.84\%$
test_a2c_speed[False-backward] 12.0949ms 11.7154ms 85.3574 Ops/s 84.4911 Ops/s $\color{#35bf28}+1.03\%$
test_a2c_speed[True-None] 3.2132ms 3.0566ms 327.1619 Ops/s 316.1874 Ops/s $\color{#35bf28}+3.47\%$
test_a2c_speed[True-backward] 8.7122ms 8.3743ms 119.4128 Ops/s 117.9191 Ops/s $\color{#35bf28}+1.27\%$
test_a2c_speed[reduce-overhead-None] 3.3584ms 3.0012ms 333.1956 Ops/s 329.3186 Ops/s $\color{#35bf28}+1.18\%$
test_a2c_speed[reduce-overhead-backward] 8.6326ms 8.3112ms 120.3199 Ops/s 121.2322 Ops/s $\color{#d91a1a}-0.75\%$
test_ppo_speed[False-None] 5.9123ms 5.7312ms 174.4834 Ops/s 176.3969 Ops/s $\color{#d91a1a}-1.08\%$
test_ppo_speed[False-backward] 12.9659ms 12.2354ms 81.7302 Ops/s 82.7125 Ops/s $\color{#d91a1a}-1.19\%$
test_ppo_speed[True-None] 3.5560ms 3.3905ms 294.9417 Ops/s 283.0841 Ops/s $\color{#35bf28}+4.19\%$
test_ppo_speed[True-backward] 8.5095ms 8.1659ms 122.4607 Ops/s 123.5219 Ops/s $\color{#d91a1a}-0.86\%$
test_ppo_speed[reduce-overhead-None] 3.6420ms 3.4181ms 292.5570 Ops/s 294.5573 Ops/s $\color{#d91a1a}-0.68\%$
test_ppo_speed[reduce-overhead-backward] 8.5857ms 8.0604ms 124.0626 Ops/s 125.7101 Ops/s $\color{#d91a1a}-1.31\%$
test_reinforce_speed[False-None] 4.7867ms 4.4434ms 225.0509 Ops/s 227.7101 Ops/s $\color{#d91a1a}-1.17\%$
test_reinforce_speed[False-backward] 8.8642ms 7.2411ms 138.1007 Ops/s 140.5352 Ops/s $\color{#d91a1a}-1.73\%$
test_reinforce_speed[True-None] 2.6131ms 2.2311ms 448.1999 Ops/s 440.2447 Ops/s $\color{#35bf28}+1.81\%$
test_reinforce_speed[True-backward] 7.3318ms 6.9865ms 143.1331 Ops/s 143.2529 Ops/s $\color{#d91a1a}-0.08\%$
test_reinforce_speed[reduce-overhead-None] 2.3744ms 2.2191ms 450.6268 Ops/s 461.1190 Ops/s $\color{#d91a1a}-2.28\%$
test_reinforce_speed[reduce-overhead-backward] 7.3072ms 6.9682ms 143.5093 Ops/s 145.2867 Ops/s $\color{#d91a1a}-1.22\%$
test_iql_speed[False-None] 19.7007ms 19.0035ms 52.6219 Ops/s 51.4806 Ops/s $\color{#35bf28}+2.22\%$
test_iql_speed[False-backward] 30.1706ms 29.4555ms 33.9495 Ops/s 33.6393 Ops/s $\color{#35bf28}+0.92\%$
test_iql_speed[True-None] 7.1334ms 6.6386ms 150.6346 Ops/s 150.5172 Ops/s $\color{#35bf28}+0.08\%$
test_iql_speed[True-backward] 15.5010ms 15.0303ms 66.5324 Ops/s 63.9679 Ops/s $\color{#35bf28}+4.01\%$
test_iql_speed[reduce-overhead-None] 6.9503ms 6.6436ms 150.5212 Ops/s 150.0009 Ops/s $\color{#35bf28}+0.35\%$
test_iql_speed[reduce-overhead-backward] 15.6615ms 15.1221ms 66.1283 Ops/s 65.9188 Ops/s $\color{#35bf28}+0.32\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8198ms 6.3343ms 157.8713 Ops/s 156.5351 Ops/s $\color{#35bf28}+0.85\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9639ms 0.3116ms 3.2090 KOps/s 3.5634 KOps/s $\textbf{\color{#d91a1a}-9.95\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6762ms 0.2881ms 3.4710 KOps/s 3.8312 KOps/s $\textbf{\color{#d91a1a}-9.40\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.5579ms 6.0514ms 165.2520 Ops/s 162.9101 Ops/s $\color{#35bf28}+1.44\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0509ms 0.2760ms 3.6234 KOps/s 3.9093 KOps/s $\textbf{\color{#d91a1a}-7.31\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5031ms 0.2982ms 3.3540 KOps/s 3.5075 KOps/s $\color{#d91a1a}-4.38\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4724ms 1.2628ms 791.9130 Ops/s 717.7060 Ops/s $\textbf{\color{#35bf28}+10.34\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6617ms 1.2180ms 821.0070 Ops/s 741.1594 Ops/s $\textbf{\color{#35bf28}+10.77\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4479ms 6.1809ms 161.7896 Ops/s 158.1235 Ops/s $\color{#35bf28}+2.32\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7546ms 0.4091ms 2.4446 KOps/s 1.9950 KOps/s $\textbf{\color{#35bf28}+22.53\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5703ms 0.3884ms 2.5749 KOps/s 2.0895 KOps/s $\textbf{\color{#35bf28}+23.23\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3136ms 6.1494ms 162.6165 Ops/s 163.3513 Ops/s $\color{#d91a1a}-0.45\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8096ms 0.2704ms 3.6985 KOps/s 2.9668 KOps/s $\textbf{\color{#35bf28}+24.66\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4460ms 0.2487ms 4.0202 KOps/s 3.1563 KOps/s $\textbf{\color{#35bf28}+27.37\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6304ms 6.1099ms 163.6698 Ops/s 164.1332 Ops/s $\color{#d91a1a}-0.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7577ms 0.2748ms 3.6389 KOps/s 3.8908 KOps/s $\textbf{\color{#d91a1a}-6.47\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4360ms 0.2357ms 4.2419 KOps/s 4.2472 KOps/s $\color{#d91a1a}-0.13\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6759ms 6.2746ms 159.3729 Ops/s 160.3736 Ops/s $\color{#d91a1a}-0.62\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1599ms 0.4377ms 2.2849 KOps/s 2.0818 KOps/s $\textbf{\color{#35bf28}+9.76\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7893ms 0.3877ms 2.5791 KOps/s 2.3125 KOps/s $\textbf{\color{#35bf28}+11.53\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.4377s 13.9166ms 71.8567 Ops/s 192.5059 Ops/s $\textbf{\color{#d91a1a}-62.67\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.0018ms 1.9585ms 510.5992 Ops/s 487.8648 Ops/s $\color{#35bf28}+4.66\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.1115ms 1.0365ms 964.7853 Ops/s 899.2565 Ops/s $\textbf{\color{#35bf28}+7.29\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.2550ms 5.3145ms 188.1646 Ops/s 192.7468 Ops/s $\color{#d91a1a}-2.38\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.5837ms 2.0348ms 491.4574 Ops/s 448.8820 Ops/s $\textbf{\color{#35bf28}+9.48\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.1886ms 1.2409ms 805.8767 Ops/s 885.3673 Ops/s $\textbf{\color{#d91a1a}-8.98\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.3734s 12.8554ms 77.7884 Ops/s 36.3515 Ops/s $\textbf{\color{#35bf28}+113.99\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 12.2980ms 2.1115ms 473.6007 Ops/s 478.4083 Ops/s $\color{#d91a1a}-1.00\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.7531ms 1.3381ms 747.3256 Ops/s 756.7013 Ops/s $\color{#d91a1a}-1.24\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.4691ms 13.0163ms 76.8266 Ops/s 76.3054 Ops/s $\color{#35bf28}+0.68\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.3115ms 16.9872ms 58.8678 Ops/s 60.9732 Ops/s $\color{#d91a1a}-3.45\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.2371ms 17.5982ms 56.8239 Ops/s 56.0373 Ops/s $\color{#35bf28}+1.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 17.7240ms 17.3311ms 57.6999 Ops/s 60.4194 Ops/s $\color{#d91a1a}-4.50\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.3504ms 17.4935ms 57.1640 Ops/s 56.4166 Ops/s $\color{#35bf28}+1.32\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 18.7001ms 18.0851ms 55.2943 Ops/s 56.8090 Ops/s $\color{#d91a1a}-2.67\%$
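The percentage in the last column appears to be the relative change in throughput between the two Ops/s columns. The sketch below assumes the first Ops/s value is the PR run and the second is the base run (the column headers are not shown in this part of the table), and the helper name `relative_change` is made up for illustration. It recomputes the change for three rows copied from the table.

```python
def relative_change(pr_ops: float, base_ops: float) -> float:
    """Percent change in throughput of the PR run relative to the base run.

    Assumption: pr_ops is the first Ops/s column, base_ops the second.
    """
    if base_ops <= 0:
        raise ValueError("base throughput must be positive")
    return (pr_ops / base_ops - 1.0) * 100.0


# (PR Ops/s, base Ops/s) pairs copied from the benchmark rows above.
rows = {
    "test_td3_speed[True-backward]": (273.2797, 226.1867),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]": (71.8567, 192.5059),
    "test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]": (77.7884, 36.3515),
}

for name, (pr_ops, base_ops) in rows.items():
    print(f"{name}: {relative_change(pr_ops, base_ops):+.2f}%")
```

Under that assumption, the three results round to the values reported in the table (+20.82%, -62.67%, +113.99%). Large swings on the `ListStorage` populate tests coincide with a maximum time far above the mean (0.4377s against 13.9166ms), so they may reflect a single slow iteration rather than a real regression.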

with TemporaryDirectory() as tmpdir:
    path = str(Path(tmpdir) / "model.pt2")
    with torch.no_grad():
        so_path = torch._inductor.aoti_compile_and_package(

This should just give you the package path, not the so path.

Contributor Author

yeah I mixed up two docs (stable and main)
glad you caught it!

            package_path=path,
        )

compiled_module = load_package(str(Path(tmpdir) / "model.pt2"))

You can also use torch._inductor.aoti_load_package which is a wrapper around load_package

@angelayi angelayi left a comment

Thanks for writing this up!!

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 13, 2024
ghstack-source-id: 4698373a93f20da7d33284ad13ee04a2076c8acb
Pull Request resolved: #2557
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 13, 2024
ghstack-source-id: b93146e22d8376563e7ac302b5cff95f09ae50d4
Pull Request resolved: #2557
@vmoens
Contributor Author

vmoens commented Nov 13, 2024

> Thanks for writing this up!!

Thanks for the feature! Extra useful

!pip install tensordict
!pip install torchrl
!pip install "gymnasium[atari,accept-rom-license]<1.0.0"

May be good to mention the min PyTorch version

@vmoens vmoens merged commit c3904b7 into gh/vmoens/40/base Nov 13, 2024
63 of 71 checks passed
vmoens added a commit that referenced this pull request Nov 13, 2024
ghstack-source-id: b93146e22d8376563e7ac302b5cff95f09ae50d4
Pull Request resolved: #2557
@vmoens vmoens deleted the gh/vmoens/40/head branch November 13, 2024 18:47
vmoens added a commit that referenced this pull request Nov 14, 2024
ghstack-source-id: b93146e22d8376563e7ac302b5cff95f09ae50d4
Pull Request resolved: #2557

(cherry picked from commit c0187a9)
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. documentation Improvements or additions to documentation

4 participants