[Feature] TensorDictParams #500

Merged (4 commits) on Jul 27, 2023

Conversation

@vmoens vmoens commented Jul 26, 2023

Description

Solves #498
Aims at solving pytorch/rl#1407

cc @albertbou92 @btx0424

TODO:

  • Dedicated tests
  • Documentation
  • Benchmark profiling

@facebook-github-bot added the CLA Signed label Jul 26, 2023
@vmoens added the enhancement label Jul 26, 2023

github-actions bot commented Jul 26, 2023

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 109. Improved: 70. Worsened: 2.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.4042ms | 25.2146μs | 39.6596 KOps/s | 38.4524 KOps/s | +3.14% |
| test_plain_set_stack_nested | 1.4543ms | 0.2342ms | 4.2696 KOps/s | 4.2097 KOps/s | +1.42% |
| test_plain_set_nested_inplace | 1.4672ms | 31.0574μs | 32.1985 KOps/s | 31.0371 KOps/s | +3.74% |
| test_plain_set_stack_nested_inplace | 2.1384ms | 0.3031ms | 3.2994 KOps/s | 3.3886 KOps/s | -2.63% |
| test_items | 1.8682ms | 4.2165μs | 237.1657 KOps/s | 231.6645 KOps/s | +2.37% |
| test_items_nested | 3.0294ms | 0.4319ms | 2.3153 KOps/s | 2.1018 KOps/s | **+10.16%** |
| test_items_nested_locked | 2.5142ms | 0.4327ms | 2.3110 KOps/s | 2.2253 KOps/s | +3.85% |
| test_items_nested_leaf | 2.1772ms | 0.2600ms | 3.8463 KOps/s | 3.6691 KOps/s | +4.83% |
| test_items_stack_nested | 5.0779ms | 2.7606ms | 362.2385 Ops/s | 343.3165 Ops/s | **+5.51%** |
| test_items_stack_nested_leaf | 5.7333ms | 2.5773ms | 388.0099 Ops/s | 368.3166 Ops/s | **+5.35%** |
| test_items_stack_nested_locked | 5.8920ms | 1.4724ms | 679.1447 Ops/s | 650.0943 Ops/s | +4.47% |
| test_keys | 3.3655ms | 6.4224μs | 155.7059 KOps/s | 170.8652 KOps/s | **-8.87%** |
| test_keys_nested | 5.3506ms | 0.2149ms | 4.6531 KOps/s | 4.6825 KOps/s | -0.63% |
| test_keys_nested_locked | 4.9486ms | 0.2151ms | 4.6486 KOps/s | 4.7116 KOps/s | -1.34% |
| test_keys_nested_leaf | 3.8125ms | 0.2043ms | 4.8939 KOps/s | 4.1640 KOps/s | **+17.53%** |
| test_keys_stack_nested | 8.3175ms | 2.3906ms | 418.3129 Ops/s | 395.4788 Ops/s | **+5.77%** |
| test_keys_stack_nested_leaf | 7.6402ms | 2.4504ms | 408.0997 Ops/s | 378.1775 Ops/s | **+7.91%** |
| test_keys_stack_nested_locked | 6.6845ms | 1.0782ms | 927.5133 Ops/s | 844.2520 Ops/s | **+9.86%** |
| test_values | 7.8631ms | 1.6024μs | 624.0475 KOps/s | 618.4232 KOps/s | +0.91% |
| test_values_nested | 2.8999ms | 73.9602μs | 13.5208 KOps/s | 12.6680 KOps/s | **+6.73%** |
| test_values_nested_locked | 2.9099ms | 77.6730μs | 12.8745 KOps/s | 12.1229 KOps/s | **+6.20%** |
| test_values_nested_leaf | 4.7850ms | 66.1423μs | 15.1189 KOps/s | 14.2472 KOps/s | **+6.12%** |
| test_values_stack_nested | 5.7882ms | 2.1467ms | 465.8210 Ops/s | 425.0921 Ops/s | **+9.58%** |
| test_values_stack_nested_leaf | 7.4750ms | 2.1180ms | 472.1477 Ops/s | 436.4628 Ops/s | **+8.18%** |
| test_values_stack_nested_locked | 10.3833ms | 0.9359ms | 1.0685 KOps/s | 979.6817 Ops/s | **+9.06%** |
| test_membership | 0.5384ms | 2.1902μs | 456.5724 KOps/s | 467.5183 KOps/s | -2.34% |
| test_membership_nested | 1.2581ms | 4.6930μs | 213.0850 KOps/s | 233.3687 KOps/s | **-8.69%** |
| test_membership_nested_leaf | 4.2555ms | 4.7145μs | 212.1110 KOps/s | 215.0225 KOps/s | -1.35% |
| test_membership_stacked_nested | 2.0555ms | 17.6800μs | 56.5612 KOps/s | 55.1455 KOps/s | +2.57% |
| test_membership_stacked_nested_leaf | 4.5666ms | 17.7040μs | 56.4845 KOps/s | 52.3362 KOps/s | **+7.93%** |
| test_membership_nested_last | 1.7258ms | 8.8103μs | 113.5030 KOps/s | 100.8874 KOps/s | **+12.50%** |
| test_membership_nested_leaf_last | 1.5605ms | 8.6862μs | 115.1246 KOps/s | 101.5547 KOps/s | **+13.36%** |
| test_membership_stacked_nested_last | 2.0758ms | 0.2752ms | 3.6337 KOps/s | 3.2043 KOps/s | **+13.40%** |
| test_membership_stacked_nested_leaf_last | 1.2918ms | 21.2852μs | 46.9810 KOps/s | 43.7542 KOps/s | **+7.37%** |
| test_nested_getleaf | 1.5646ms | 17.2423μs | 57.9970 KOps/s | 51.8650 KOps/s | **+11.82%** |
| test_nested_get | 5.2175ms | 16.8683μs | 59.2828 KOps/s | 52.8289 KOps/s | **+12.22%** |
| test_stacked_getleaf | 6.2955ms | 1.1448ms | 873.5296 Ops/s | 819.4998 Ops/s | **+6.59%** |
| test_stacked_get | 3.0675ms | 1.0579ms | 945.2829 Ops/s | 877.1987 Ops/s | **+7.76%** |
| test_nested_getitemleaf | 0.7447ms | 17.5803μs | 56.8818 KOps/s | 52.7643 KOps/s | **+7.80%** |
| test_nested_getitem | 1.5888ms | 16.5559μs | 60.4014 KOps/s | 53.7476 KOps/s | **+12.38%** |
| test_stacked_getitemleaf | 3.5806ms | 1.1254ms | 888.5520 Ops/s | 804.7518 Ops/s | **+10.41%** |
| test_stacked_getitem | 3.3438ms | 1.0265ms | 974.1803 Ops/s | 881.2407 Ops/s | **+10.55%** |
| test_lock_nested | 81.8078ms | 1.9035ms | 525.3615 Ops/s | 511.4123 Ops/s | +2.73% |
| test_lock_stack_nested | 0.1238s | 25.9876ms | 38.4798 Ops/s | 39.5893 Ops/s | -2.80% |
| test_unlock_nested | 85.0352ms | 1.9576ms | 510.8353 Ops/s | 488.8803 Ops/s | +4.49% |
| test_unlock_stack_nested | 0.1188s | 26.1903ms | 38.1821 Ops/s | 37.7148 Ops/s | +1.24% |
| test_flatten_speed | 3.8543ms | 1.3532ms | 739.0132 Ops/s | 701.3856 Ops/s | **+5.36%** |
| test_unflatten_speed | 7.4499ms | 2.3168ms | 431.6287 Ops/s | 413.7455 Ops/s | +4.32% |
| test_common_ops | 6.9917ms | 1.6474ms | 607.0271 Ops/s | 548.4889 Ops/s | **+10.67%** |
| test_creation | 2.6132ms | 7.5580μs | 132.3102 KOps/s | 128.6040 KOps/s | +2.88% |
| test_creation_empty | 1.5439ms | 17.0518μs | 58.6447 KOps/s | 52.5009 KOps/s | **+11.70%** |
| test_creation_nested_1 | 5.3692ms | 31.7324μs | 31.5136 KOps/s | 26.2030 KOps/s | **+20.27%** |
| test_creation_nested_2 | 1.9043ms | 36.7231μs | 27.2308 KOps/s | 24.9540 KOps/s | **+9.12%** |
| test_clone | 1.5026ms | 33.5309μs | 29.8233 KOps/s | 25.1100 KOps/s | **+18.77%** |
| test_getitem[int] | 1.4602ms | 36.9038μs | 27.0975 KOps/s | 24.1572 KOps/s | **+12.17%** |
| test_getitem[slice_int] | 5.5703ms | 87.5304μs | 11.4246 KOps/s | 11.3499 KOps/s | +0.66% |
| test_getitem[range] | 1.1629ms | 0.1399ms | 7.1476 KOps/s | 6.8640 KOps/s | +4.13% |
| test_getitem[tuple] | 1.8748ms | 61.6751μs | 16.2140 KOps/s | 14.1803 KOps/s | **+14.34%** |
| test_getitem[list] | 1.7008ms | 0.1334ms | 7.4948 KOps/s | 7.0326 KOps/s | **+6.57%** |
| test_setitem_dim[int] | 0.2542ms | 56.1714μs | 17.8027 KOps/s | 16.2598 KOps/s | **+9.49%** |
| test_setitem_dim[slice_int] | 0.5813ms | 0.1044ms | 9.5765 KOps/s | 7.9272 KOps/s | **+20.81%** |
| test_setitem_dim[range] | 1.4575ms | 0.1360ms | 7.3533 KOps/s | 6.5913 KOps/s | **+11.56%** |
| test_setitem_dim[tuple] | 2.0988ms | 84.3210μs | 11.8594 KOps/s | 10.6262 KOps/s | **+11.61%** |
| test_setitem | 2.1369ms | 49.2970μs | 20.2852 KOps/s | 18.3112 KOps/s | **+10.78%** |
| test_set | 1.6072ms | 46.5069μs | 21.5022 KOps/s | 18.2461 KOps/s | **+17.85%** |
| test_set_shared | 3.6493ms | 0.2917ms | 3.4280 KOps/s | 3.1450 KOps/s | **+9.00%** |
| test_update | 2.3620ms | 50.9261μs | 19.6363 KOps/s | 16.3734 KOps/s | **+19.93%** |
| test_update_nested | 4.9127ms | 76.2378μs | 13.1169 KOps/s | 11.5420 KOps/s | **+13.64%** |
| test_set_nested | 1.7383ms | 49.7862μs | 20.0859 KOps/s | 17.3574 KOps/s | **+15.72%** |
| test_set_nested_new | 1.6440ms | 72.4665μs | 13.7995 KOps/s | 12.1933 KOps/s | **+13.17%** |
| test_select | 2.7477ms | 0.1334ms | 7.4938 KOps/s | 6.5951 KOps/s | **+13.63%** |
| test_unbind_speed | 5.8875ms | 0.7941ms | 1.2594 KOps/s | 1.1541 KOps/s | **+9.12%** |
| test_unbind_speed_stack0 | 0.1041s | 10.8337ms | 92.3044 Ops/s | 89.4515 Ops/s | +3.19% |
| test_unbind_speed_stack1 | 0.6228ms | 1.0507μs | 951.7270 KOps/s | 785.6421 KOps/s | **+21.14%** |
| test_creation[device0] | 6.6139ms | 0.6057ms | 1.6510 KOps/s | 1.6560 KOps/s | -0.30% |
| test_creation_from_tensor | 5.9455ms | 0.6748ms | 1.4819 KOps/s | 1.4209 KOps/s | +4.29% |
| test_add_one[memmap_tensor0] | 1.7830ms | 66.3349μs | 15.0750 KOps/s | 13.7890 KOps/s | **+9.33%** |
| test_contiguous[memmap_tensor0] | 2.2015ms | 12.6210μs | 79.2329 KOps/s | 66.8383 KOps/s | **+18.54%** |
| test_stack[memmap_tensor0] | 6.0918ms | 47.6467μs | 20.9878 KOps/s | 20.4663 KOps/s | +2.55% |
| test_memmaptd_index | 2.1471ms | 0.4054ms | 2.4664 KOps/s | 2.4093 KOps/s | +2.37% |
| test_memmaptd_index_astensor | 6.3908ms | 2.1616ms | 462.6148 Ops/s | 394.3985 Ops/s | **+17.30%** |
| test_memmaptd_index_op | 11.1888ms | 5.3123ms | 188.2434 Ops/s | 180.3481 Ops/s | +4.38% |
| test_reshape_pytree | 1.8238ms | 48.1250μs | 20.7792 KOps/s | 19.8087 KOps/s | +4.90% |
| test_reshape_td | 4.4442ms | 58.9312μs | 16.9689 KOps/s | 14.2248 KOps/s | **+19.29%** |
| test_view_pytree | 2.5486ms | 41.5072μs | 24.0922 KOps/s | 20.2869 KOps/s | **+18.76%** |
| test_view_td | 1.1133ms | 11.0254μs | 90.6993 KOps/s | 89.2991 KOps/s | +1.57% |
| test_unbind_pytree | 1.4852ms | 46.6098μs | 21.4547 KOps/s | 20.1617 KOps/s | **+6.41%** |
| test_unbind_td | 6.6665ms | 0.1193ms | 8.3820 KOps/s | 7.6421 KOps/s | **+9.68%** |
| test_split_pytree | 3.7200ms | 56.3143μs | 17.7575 KOps/s | 16.7705 KOps/s | **+5.89%** |
| test_split_td | 5.1803ms | 0.1666ms | 6.0030 KOps/s | 5.5849 KOps/s | **+7.49%** |
| test_add_pytree | 2.5971ms | 75.9564μs | 13.1654 KOps/s | 12.0901 KOps/s | **+8.89%** |
| test_add_td | 2.7554ms | 0.1340ms | 7.4637 KOps/s | 6.4630 KOps/s | **+15.48%** |
| test_distributed | 1.2859ms | 10.3342μs | 96.7664 KOps/s | 91.4933 KOps/s | **+5.76%** |
| test_tdmodule | 1.1138ms | 47.9320μs | 20.8629 KOps/s | 21.4482 KOps/s | -2.73% |
| test_tdmodule_dispatch | 0.4639ms | 81.1451μs | 12.3236 KOps/s | 5.9537 KOps/s | **+106.99%** |
| test_tdseq | 1.4043ms | 49.9272μs | 20.0292 KOps/s | 19.4769 KOps/s | +2.84% |
| test_tdseq_dispatch | 0.4426ms | 95.2491μs | 10.4988 KOps/s | 9.5257 KOps/s | **+10.22%** |
| test_instantiation_functorch | 3.8820ms | 2.0635ms | 484.6025 Ops/s | 427.7048 Ops/s | **+13.30%** |
| test_instantiation_td | 6.4404ms | 1.7982ms | 556.1165 Ops/s | 529.0512 Ops/s | **+5.12%** |
| test_exec_functorch | 2.2255ms | 0.2971ms | 3.3654 KOps/s | 3.0595 KOps/s | **+10.00%** |
| test_exec_td | 5.6103ms | 0.3098ms | 3.2281 KOps/s | 2.9933 KOps/s | **+7.84%** |
| test_vmap_mlp_speed[True-True] | 8.8732ms | 2.0452ms | 488.9560 Ops/s | 475.5722 Ops/s | +2.81% |
| test_vmap_mlp_speed[True-False] | 9.8261ms | 1.0233ms | 977.2753 Ops/s | 929.4718 Ops/s | **+5.14%** |
| test_vmap_mlp_speed[False-True] | 9.0575ms | 1.7449ms | 573.1058 Ops/s | 536.6586 Ops/s | **+6.79%** |
| test_vmap_mlp_speed[False-False] | 11.0064ms | 0.7952ms | 1.2575 KOps/s | 1.2015 KOps/s | +4.66% |
| test_vmap_transformer_speed[True-True] | 27.4484ms | 22.1277ms | 45.1922 Ops/s | 43.7133 Ops/s | +3.38% |
| test_vmap_transformer_speed[True-False] | 24.1284ms | 14.4210ms | 69.3434 Ops/s | 68.4845 Ops/s | +1.25% |
| test_vmap_transformer_speed[False-True] | 29.0062ms | 21.7568ms | 45.9627 Ops/s | 43.6021 Ops/s | **+5.41%** |
| test_vmap_transformer_speed[False-False] | 22.5803ms | 14.2216ms | 70.3158 Ops/s | 69.3006 Ops/s | +1.46% |

@vmoens vmoens marked this pull request as ready for review July 27, 2023 08:42

@matteobettini matteobettini left a comment

LGTM

Labels: CLA Signed, enhancement

3 participants