[Feature] Make Probabilistic modules aware of CompositeDistributions out_keys #810

Merged (2 commits) on Jun 10, 2024.

vmoens (Contributor) commented on Jun 10, 2024

No description provided.

facebook-github-bot added the "CLA Signed" label on Jun 10, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
vmoens added the "enhancement" label (New feature or request) on Jun 10, 2024.
github-actions bot commented on Jun 10, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 17. Worsened: 17.

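Columns: Name, Max (longest measured time), Mean (mean time per call), Ops (throughput on this PR), Ops on Repo HEAD (throughput on the repository HEAD), Change (throughput change relative to HEAD)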
test_plain_set_nested 31.1380μs 16.5130μs 60.5583 KOps/s 61.4466 KOps/s $\color{#d91a1a}-1.45\%$
test_plain_set_stack_nested 42.5200μs 17.0100μs 58.7889 KOps/s 60.6873 KOps/s $\color{#d91a1a}-3.13\%$
test_plain_set_nested_inplace 64.3800μs 18.8094μs 53.1649 KOps/s 53.3697 KOps/s $\color{#d91a1a}-0.38\%$
test_plain_set_stack_nested_inplace 48.4210μs 18.8265μs 53.1165 KOps/s 53.5219 KOps/s $\color{#d91a1a}-0.76\%$
test_items 15.4590μs 2.5262μs 395.8490 KOps/s 405.2981 KOps/s $\color{#d91a1a}-2.33\%$
test_items_nested 0.4457ms 0.2648ms 3.7766 KOps/s 3.7396 KOps/s $\color{#35bf28}+0.99\%$
test_items_nested_locked 0.8271ms 0.2664ms 3.7544 KOps/s 3.7075 KOps/s $\color{#35bf28}+1.26\%$
test_items_nested_leaf 0.1489ms 75.1403μs 13.3084 KOps/s 12.0969 KOps/s $\textbf{\color{#35bf28}+10.02\%}$
test_items_stack_nested 0.4723ms 0.2655ms 3.7665 KOps/s 3.6947 KOps/s $\color{#35bf28}+1.94\%$
test_items_stack_nested_leaf 0.1493ms 75.6126μs 13.2253 KOps/s 12.7432 KOps/s $\color{#35bf28}+3.78\%$
test_items_stack_nested_locked 0.8293ms 0.2653ms 3.7700 KOps/s 3.6648 KOps/s $\color{#35bf28}+2.87\%$
test_keys 25.3880μs 3.8161μs 262.0504 KOps/s 258.6421 KOps/s $\color{#35bf28}+1.32\%$
test_keys_nested 0.2620ms 0.1400ms 7.1427 KOps/s 7.0531 KOps/s $\color{#35bf28}+1.27\%$
test_keys_nested_locked 0.7380ms 0.1447ms 6.9128 KOps/s 6.8286 KOps/s $\color{#35bf28}+1.23\%$
test_keys_nested_leaf 0.2166ms 0.1186ms 8.4345 KOps/s 8.3408 KOps/s $\color{#35bf28}+1.12\%$
test_keys_stack_nested 0.1829ms 0.1397ms 7.1583 KOps/s 7.0326 KOps/s $\color{#35bf28}+1.79\%$
test_keys_stack_nested_leaf 0.1960ms 0.1162ms 8.6084 KOps/s 8.4388 KOps/s $\color{#35bf28}+2.01\%$
test_keys_stack_nested_locked 0.2553ms 0.1435ms 6.9693 KOps/s 6.9351 KOps/s $\color{#35bf28}+0.49\%$
test_values 14.7175μs 1.1569μs 864.3914 KOps/s 853.0868 KOps/s $\color{#35bf28}+1.33\%$
test_values_nested 86.6730μs 49.9669μs 20.0133 KOps/s 19.0518 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_values_nested_locked 0.1018ms 50.0944μs 19.9623 KOps/s 18.7886 KOps/s $\textbf{\color{#35bf28}+6.25\%}$
test_values_nested_leaf 92.8730μs 45.1750μs 22.1362 KOps/s 21.1709 KOps/s $\color{#35bf28}+4.56\%$
test_values_stack_nested 99.6180μs 49.9688μs 20.0125 KOps/s 18.7050 KOps/s $\textbf{\color{#35bf28}+6.99\%}$
test_values_stack_nested_leaf 94.4870μs 45.0989μs 22.1735 KOps/s 21.4784 KOps/s $\color{#35bf28}+3.24\%$
test_values_stack_nested_locked 99.8070μs 50.1352μs 19.9461 KOps/s 18.8157 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_membership 46.6470μs 1.3383μs 747.1932 KOps/s 704.9734 KOps/s $\textbf{\color{#35bf28}+5.99\%}$
test_membership_nested 27.6020μs 3.3726μs 296.5059 KOps/s 288.3932 KOps/s $\color{#35bf28}+2.81\%$
test_membership_nested_leaf 37.2090μs 3.3730μs 296.4682 KOps/s 286.5807 KOps/s $\color{#35bf28}+3.45\%$
test_membership_stacked_nested 26.8010μs 3.3860μs 295.3304 KOps/s 286.8613 KOps/s $\color{#35bf28}+2.95\%$
test_membership_stacked_nested_leaf 21.3600μs 3.4234μs 292.1096 KOps/s 288.2279 KOps/s $\color{#35bf28}+1.35\%$
test_membership_nested_last 27.6620μs 4.0999μs 243.9099 KOps/s 232.5704 KOps/s $\color{#35bf28}+4.88\%$
test_membership_nested_leaf_last 39.5240μs 4.1490μs 241.0219 KOps/s 233.1797 KOps/s $\color{#35bf28}+3.36\%$
test_membership_stacked_nested_last 24.4860μs 4.1041μs 243.6592 KOps/s 71.9358 KOps/s $\textbf{\color{#35bf28}+238.72\%}$
test_membership_stacked_nested_leaf_last 39.2440μs 4.1352μs 241.8240 KOps/s 72.1863 KOps/s $\textbf{\color{#35bf28}+235.00\%}$
test_nested_getleaf 42.6800μs 10.5448μs 94.8338 KOps/s 91.5118 KOps/s $\color{#35bf28}+3.63\%$
test_nested_get 29.9850μs 9.9857μs 100.1430 KOps/s 95.9775 KOps/s $\color{#35bf28}+4.34\%$
test_stacked_getleaf 56.8160μs 10.5214μs 95.0448 KOps/s 90.6971 KOps/s $\color{#35bf28}+4.79\%$
test_stacked_get 53.2690μs 9.8174μs 101.8600 KOps/s 95.5813 KOps/s $\textbf{\color{#35bf28}+6.57\%}$
test_nested_getitemleaf 42.8800μs 11.1445μs 89.7305 KOps/s 85.4576 KOps/s $\textbf{\color{#35bf28}+5.00\%}$
test_nested_getitem 47.3890μs 10.3199μs 96.9004 KOps/s 92.0287 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_stacked_getitemleaf 51.9870μs 11.1792μs 89.4516 KOps/s 85.7923 KOps/s $\color{#35bf28}+4.27\%$
test_stacked_getitem 46.8380μs 10.2187μs 97.8598 KOps/s 92.3591 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_lock_nested 0.7983ms 0.3504ms 2.8542 KOps/s 2.4545 KOps/s $\textbf{\color{#35bf28}+16.29\%}$
test_lock_stack_nested 0.6543ms 0.3128ms 3.1969 KOps/s 3.3452 KOps/s $\color{#d91a1a}-4.43\%$
test_unlock_nested 0.8386ms 0.3544ms 2.8220 KOps/s 2.4224 KOps/s $\textbf{\color{#35bf28}+16.49\%}$
test_unlock_stack_nested 0.4929ms 0.3201ms 3.1236 KOps/s 3.2404 KOps/s $\color{#d91a1a}-3.60\%$
test_flatten_speed 0.5564ms 98.1393μs 10.1896 KOps/s 10.3030 KOps/s $\color{#d91a1a}-1.10\%$
test_unflatten_speed 0.6323ms 0.4059ms 2.4639 KOps/s 2.4059 KOps/s $\color{#35bf28}+2.41\%$
test_common_ops 5.1499ms 0.7251ms 1.3790 KOps/s 1.4684 KOps/s $\textbf{\color{#d91a1a}-6.09\%}$
test_creation 82.7950μs 1.9155μs 522.0683 KOps/s 516.1349 KOps/s $\color{#35bf28}+1.15\%$
test_creation_empty 28.4830μs 10.2161μs 97.8844 KOps/s 108.7765 KOps/s $\textbf{\color{#d91a1a}-10.01\%}$
test_creation_nested_1 37.6800μs 13.2557μs 75.4393 KOps/s 83.8921 KOps/s $\textbf{\color{#d91a1a}-10.08\%}$
test_creation_nested_2 45.9760μs 16.2072μs 61.7008 KOps/s 65.1194 KOps/s $\textbf{\color{#d91a1a}-5.25\%}$
test_clone 0.1027ms 13.6401μs 73.3131 KOps/s 73.3052 KOps/s $\color{#35bf28}+0.01\%$
test_getitem[int] 34.1640μs 11.3258μs 88.2936 KOps/s 88.6280 KOps/s $\color{#d91a1a}-0.38\%$
test_getitem[slice_int] 81.5630μs 23.1341μs 43.2261 KOps/s 44.6454 KOps/s $\color{#d91a1a}-3.18\%$
test_getitem[range] 86.9720μs 62.4586μs 16.0106 KOps/s 16.1714 KOps/s $\color{#d91a1a}-0.99\%$
test_getitem[tuple] 54.5720μs 18.9058μs 52.8937 KOps/s 52.3834 KOps/s $\color{#35bf28}+0.97\%$
test_getitem[list] 0.1385ms 42.5019μs 23.5284 KOps/s 24.2520 KOps/s $\color{#d91a1a}-2.98\%$
test_setitem_dim[int] 65.9130μs 33.6209μs 29.7434 KOps/s 30.9744 KOps/s $\color{#d91a1a}-3.97\%$
test_setitem_dim[slice_int] 0.1068ms 60.5232μs 16.5226 KOps/s 17.0264 KOps/s $\color{#d91a1a}-2.96\%$
test_setitem_dim[range] 0.1407ms 84.1522μs 11.8832 KOps/s 12.3107 KOps/s $\color{#d91a1a}-3.47\%$
test_setitem_dim[tuple] 82.9550μs 48.9296μs 20.4375 KOps/s 20.6287 KOps/s $\color{#d91a1a}-0.93\%$
test_setitem 61.9960μs 20.2748μs 49.3224 KOps/s 52.3235 KOps/s $\textbf{\color{#d91a1a}-5.74\%}$
test_set 67.5060μs 19.7346μs 50.6724 KOps/s 54.3410 KOps/s $\textbf{\color{#d91a1a}-6.75\%}$
test_set_shared 1.7847ms 0.1449ms 6.9012 KOps/s 6.9253 KOps/s $\color{#d91a1a}-0.35\%$
test_update 0.2371ms 21.6935μs 46.0968 KOps/s 50.8614 KOps/s $\textbf{\color{#d91a1a}-9.37\%}$
test_update_nested 83.6060μs 30.0501μs 33.2778 KOps/s 35.3813 KOps/s $\textbf{\color{#d91a1a}-5.95\%}$
test_update__nested 0.1284ms 26.1109μs 38.2982 KOps/s 39.4433 KOps/s $\color{#d91a1a}-2.90\%$
test_set_nested 75.9420μs 21.5299μs 46.4471 KOps/s 47.7186 KOps/s $\color{#d91a1a}-2.66\%$
test_set_nested_new 65.6420μs 25.9735μs 38.5008 KOps/s 39.3819 KOps/s $\color{#d91a1a}-2.24\%$
test_select 92.3220μs 41.6839μs 23.9901 KOps/s 24.3693 KOps/s $\color{#d91a1a}-1.56\%$
test_select_nested 0.1269ms 61.1697μs 16.3480 KOps/s 16.2803 KOps/s $\color{#35bf28}+0.42\%$
test_exclude_nested 0.2743ms 0.1232ms 8.1158 KOps/s 8.1690 KOps/s $\color{#d91a1a}-0.65\%$
test_empty[True] 0.6442ms 0.3986ms 2.5088 KOps/s 2.4908 KOps/s $\color{#35bf28}+0.72\%$
test_empty[False] 10.6950μs 1.1696μs 854.9686 KOps/s 851.6229 KOps/s $\color{#35bf28}+0.39\%$
test_unbind_speed 0.4663ms 0.2588ms 3.8638 KOps/s 3.8871 KOps/s $\color{#d91a1a}-0.60\%$
test_unbind_speed_stack0 0.3284ms 0.2556ms 3.9117 KOps/s 4.0077 KOps/s $\color{#d91a1a}-2.40\%$
test_unbind_speed_stack1 62.2340ms 0.7615ms 1.3132 KOps/s 1.3465 KOps/s $\color{#d91a1a}-2.47\%$
test_split 69.8829ms 1.5976ms 625.9275 Ops/s 623.9293 Ops/s $\color{#35bf28}+0.32\%$
test_chunk 66.1855ms 1.5975ms 625.9799 Ops/s 625.8566 Ops/s $\color{#35bf28}+0.02\%$
test_creation[device0] 0.2176ms 84.7284μs 11.8024 KOps/s 11.9500 KOps/s $\color{#d91a1a}-1.24\%$
test_creation_from_tensor 3.8342ms 86.7721μs 11.5244 KOps/s 11.7725 KOps/s $\color{#d91a1a}-2.11\%$
test_add_one[memmap_tensor0] 62.6070μs 5.4036μs 185.0633 KOps/s 179.6166 KOps/s $\color{#35bf28}+3.03\%$
test_contiguous[memmap_tensor0] 17.4720μs 0.6452μs 1.5499 MOps/s 1.5611 MOps/s $\color{#d91a1a}-0.72\%$
test_stack[memmap_tensor0] 22.7930μs 3.5280μs 283.4464 KOps/s 263.5275 KOps/s $\textbf{\color{#35bf28}+7.56\%}$
test_memmaptd_index 0.5342ms 0.2506ms 3.9897 KOps/s 3.8045 KOps/s $\color{#35bf28}+4.87\%$
test_memmaptd_index_astensor 0.8264ms 0.3215ms 3.1103 KOps/s 2.9252 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_memmaptd_index_op 1.0879ms 0.6041ms 1.6553 KOps/s 1.6600 KOps/s $\color{#d91a1a}-0.28\%$
test_serialize_model 0.1742s 0.1166s 8.5734 Ops/s 8.3284 Ops/s $\color{#35bf28}+2.94\%$
test_serialize_model_pickle 0.4500s 0.3745s 2.6701 Ops/s 2.6179 Ops/s $\color{#35bf28}+1.99\%$
test_serialize_weights 0.1712s 0.1122s 8.9157 Ops/s 8.7345 Ops/s $\color{#35bf28}+2.08\%$
test_serialize_weights_returnearly 0.1992s 0.1336s 7.4847 Ops/s 7.6544 Ops/s $\color{#d91a1a}-2.22\%$
test_serialize_weights_pickle 1.1369s 0.6026s 1.6594 Ops/s 1.5678 Ops/s $\textbf{\color{#35bf28}+5.84\%}$
test_serialize_weights_filesystem 97.2933ms 93.4250ms 10.7038 Ops/s 10.7428 Ops/s $\color{#d91a1a}-0.36\%$
test_serialize_model_filesystem 0.1660s 0.1018s 9.8198 Ops/s 10.5050 Ops/s $\textbf{\color{#d91a1a}-6.52\%}$
test_reshape_pytree 60.9840μs 25.7550μs 38.8274 KOps/s 39.3922 KOps/s $\color{#d91a1a}-1.43\%$
test_reshape_td 77.4950μs 33.8357μs 29.5545 KOps/s 29.6315 KOps/s $\color{#d91a1a}-0.26\%$
test_view_pytree 63.5090μs 25.4383μs 39.3109 KOps/s 39.7010 KOps/s $\color{#d91a1a}-0.98\%$
test_view_td 0.1035ms 37.4399μs 26.7095 KOps/s 26.6990 KOps/s $\color{#35bf28}+0.04\%$
test_unbind_pytree 65.4520μs 29.1202μs 34.3404 KOps/s 34.0788 KOps/s $\color{#35bf28}+0.77\%$
test_unbind_td 0.4367ms 37.9221μs 26.3698 KOps/s 26.3006 KOps/s $\color{#35bf28}+0.26\%$
test_split_pytree 73.5170μs 29.2687μs 34.1662 KOps/s 34.3678 KOps/s $\color{#d91a1a}-0.59\%$
test_split_td 0.1218ms 41.3541μs 24.1814 KOps/s 24.6926 KOps/s $\color{#d91a1a}-2.07\%$
test_add_pytree 81.7420μs 34.8513μs 28.6933 KOps/s 28.7825 KOps/s $\color{#d91a1a}-0.31\%$
test_add_td 0.1140ms 53.4108μs 18.7228 KOps/s 19.5955 KOps/s $\color{#d91a1a}-4.45\%$
test_distributed 0.2114ms 0.1018ms 9.8275 KOps/s 9.7832 KOps/s $\color{#35bf28}+0.45\%$
test_tdmodule 86.6820μs 17.8452μs 56.0376 KOps/s 59.8774 KOps/s $\textbf{\color{#d91a1a}-6.41\%}$
test_tdmodule_dispatch 52.7280μs 35.1605μs 28.4410 KOps/s 31.0173 KOps/s $\textbf{\color{#d91a1a}-8.31\%}$
test_tdseq 37.8010μs 20.8934μs 47.8620 KOps/s 51.6167 KOps/s $\textbf{\color{#d91a1a}-7.27\%}$
test_tdseq_dispatch 62.6570μs 40.6284μs 24.6133 KOps/s 26.8331 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_instantiation_functorch 2.1522ms 1.3203ms 757.4226 Ops/s 768.6068 Ops/s $\color{#d91a1a}-1.46\%$
test_instantiation_td 1.5960ms 1.0263ms 974.3986 Ops/s 973.4070 Ops/s $\color{#35bf28}+0.10\%$
test_exec_functorch 0.3042ms 0.1632ms 6.1272 KOps/s 6.3260 KOps/s $\color{#d91a1a}-3.14\%$
test_exec_functional_call 0.2950ms 0.1516ms 6.5951 KOps/s 6.7439 KOps/s $\color{#d91a1a}-2.21\%$
test_exec_td 0.2806ms 0.1467ms 6.8148 KOps/s 6.9568 KOps/s $\color{#d91a1a}-2.04\%$
test_exec_td_decorator 1.6803ms 0.2213ms 4.5191 KOps/s 4.5642 KOps/s $\color{#d91a1a}-0.99\%$
test_vmap_mlp_speed[True-True] 0.7378ms 0.4891ms 2.0447 KOps/s 2.0912 KOps/s $\color{#d91a1a}-2.22\%$
test_vmap_mlp_speed[True-False] 0.8235ms 0.4893ms 2.0439 KOps/s 2.0867 KOps/s $\color{#d91a1a}-2.06\%$
test_vmap_mlp_speed[False-True] 0.6091ms 0.3993ms 2.5045 KOps/s 2.5495 KOps/s $\color{#d91a1a}-1.77\%$
test_vmap_mlp_speed[False-False] 0.6245ms 0.4003ms 2.4984 KOps/s 2.5556 KOps/s $\color{#d91a1a}-2.24\%$
test_vmap_mlp_speed_decorator[True-True] 1.2652ms 0.5559ms 1.7990 KOps/s 1.8260 KOps/s $\color{#d91a1a}-1.48\%$
test_vmap_mlp_speed_decorator[True-False] 0.8085ms 0.5543ms 1.8042 KOps/s 1.8302 KOps/s $\color{#d91a1a}-1.42\%$
test_vmap_mlp_speed_decorator[False-True] 0.6962ms 0.4606ms 2.1709 KOps/s 2.2005 KOps/s $\color{#d91a1a}-1.34\%$
test_vmap_mlp_speed_decorator[False-False] 0.7567ms 0.4591ms 2.1782 KOps/s 2.2048 KOps/s $\color{#d91a1a}-1.21\%$
test_to_module_speed[True] 2.3318ms 1.6806ms 595.0248 Ops/s 592.1633 Ops/s $\color{#35bf28}+0.48\%$
test_to_module_speed[False] 73.8504ms 1.7756ms 563.1978 Ops/s 604.9928 Ops/s $\textbf{\color{#d91a1a}-6.91\%}$
test_tc_init 66.6150μs 27.9041μs 35.8370 KOps/s 40.1035 KOps/s $\textbf{\color{#d91a1a}-10.64\%}$
test_tc_init_nested 0.1048ms 58.3406μs 17.1407 KOps/s 19.6779 KOps/s $\textbf{\color{#d91a1a}-12.89\%}$
test_tc_first_layer_tensor 5.6649μs 0.7283μs 1.3730 MOps/s 1.3581 MOps/s $\color{#35bf28}+1.10\%$
test_tc_first_layer_nontensor 2.6624μs 0.7003μs 1.4279 MOps/s 1.4023 MOps/s $\color{#35bf28}+1.82\%$
test_tc_second_layer_tensor 33.9240μs 1.9263μs 519.1417 KOps/s 514.5347 KOps/s $\color{#35bf28}+0.90\%$
test_tc_second_layer_nontensor 13.5353μs 1.5666μs 638.3358 KOps/s 628.8449 KOps/s $\color{#35bf28}+1.51\%$
test_unbind 87.8966ms 7.4030ms 135.0800 Ops/s 134.2033 Ops/s $\color{#35bf28}+0.65\%$
test_full_like 15.0753ms 10.5729ms 94.5811 Ops/s 93.8311 Ops/s $\color{#35bf28}+0.80\%$
test_zeros_like 12.0821ms 6.1228ms 163.3239 Ops/s 177.2748 Ops/s $\textbf{\color{#d91a1a}-7.87\%}$
test_ones_like 14.5181ms 6.2970ms 158.8059 Ops/s 165.4255 Ops/s $\color{#d91a1a}-4.00\%$
test_clone 11.1010ms 7.8760ms 126.9685 Ops/s 128.2799 Ops/s $\color{#d91a1a}-1.02\%$
test_squeeze 63.2580μs 14.6031μs 68.4787 KOps/s 71.4931 KOps/s $\color{#d91a1a}-4.22\%$
test_unsqueeze 0.1131ms 60.7139μs 16.4707 KOps/s 17.1798 KOps/s $\color{#d91a1a}-4.13\%$
test_split 0.1885ms 0.1149ms 8.7002 KOps/s 9.0985 KOps/s $\color{#d91a1a}-4.38\%$
test_permute 0.2336ms 0.1258ms 7.9502 KOps/s 8.0518 KOps/s $\color{#d91a1a}-1.26\%$
test_stack 24.0635ms 22.0695ms 45.3114 Ops/s 43.9452 Ops/s $\color{#35bf28}+3.11\%$
test_cat 27.5368ms 22.3934ms 44.6560 Ops/s 44.6061 Ops/s $\color{#35bf28}+0.11\%$
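The Ops and Change columns can be recomputed from the other cells: Ops appears to be the reciprocal of Mean, and Change the relative difference between Ops and Ops on Repo HEAD. For test_plain_set_nested, 1 / 16.5130μs ≈ 60.5583 KOps/s, and 60.5583 / 61.4466 - 1 ≈ -1.45%. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the formulas are inferred from the numbers in the table rather than taken from the benchmark bot's code.

```python
# Hypothetical helpers reproducing the arithmetic behind the benchmark tables.
# Assumption: Ops = 1 / Mean and Change = Ops / Ops_on_HEAD - 1 (inferred from the rows).

TIME_UNITS = {"μs": 1e-6, "ms": 1e-3, "s": 1.0}            # time suffix -> seconds
RATE_UNITS = {"MOps/s": 1e6, "KOps/s": 1e3, "Ops/s": 1.0}  # rate suffix -> ops per second


def parse_time(text: str) -> float:
    """Convert a cell such as '16.5130μs' into seconds."""
    # Dict order tries 'μs' and 'ms' before the bare 's' suffix.
    for unit, scale in TIME_UNITS.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * scale
    raise ValueError(f"unknown time unit in {text!r}")


def parse_rate(text: str) -> float:
    """Convert a cell such as '60.5583 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * RATE_UNITS[unit]


def throughput(mean_seconds: float) -> float:
    """Operations per second implied by the mean time per call."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent throughput change of this PR versus the repo HEAD baseline."""
    return (ops / ops_head - 1.0) * 100.0


# Row: test_plain_set_nested 31.1380μs 16.5130μs 60.5583 KOps/s 61.4466 KOps/s -1.45%
ops = throughput(parse_time("16.5130μs"))
print(f"{ops / 1e3:.4f} KOps/s")  # 60.5583 KOps/s
print(f"{relative_change(ops, parse_rate('61.4466 KOps/s')):+.2f}%")  # -1.45%
```

The same arithmetic accounts for the largest entry in the CPU table: test_membership_stacked_nested_last rises from 71.9358 to 243.6592 KOps/s, which is the reported +238.72%.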

github-actions bot commented on Jun 10, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 15. Worsened: 8.

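Columns: Name, Max (longest measured time), Mean (mean time per call), Ops (throughput on this PR), Ops on Repo HEAD (throughput on the repository HEAD), Change (throughput change relative to HEAD)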
test_plain_set_nested 31.2700μs 12.0391μs 83.0626 KOps/s 78.3372 KOps/s $\textbf{\color{#35bf28}+6.03\%}$
test_plain_set_stack_nested 34.5810μs 12.1762μs 82.1275 KOps/s 77.1521 KOps/s $\textbf{\color{#35bf28}+6.45\%}$
test_plain_set_nested_inplace 45.7210μs 13.3225μs 75.0611 KOps/s 70.9648 KOps/s $\textbf{\color{#35bf28}+5.77\%}$
test_plain_set_stack_nested_inplace 30.1810μs 13.3814μs 74.7303 KOps/s 70.6466 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_items 20.2300μs 4.7487μs 210.5825 KOps/s 212.3771 KOps/s $\color{#d91a1a}-0.85\%$
test_items_nested 0.3803ms 0.3363ms 2.9732 KOps/s 2.9515 KOps/s $\color{#35bf28}+0.73\%$
test_items_nested_locked 0.3977ms 0.3390ms 2.9501 KOps/s 2.9568 KOps/s $\color{#d91a1a}-0.23\%$
test_items_nested_leaf 0.1005ms 81.7835μs 12.2274 KOps/s 12.0543 KOps/s $\color{#35bf28}+1.44\%$
test_items_stack_nested 0.3683ms 0.3397ms 2.9438 KOps/s 2.9375 KOps/s $\color{#35bf28}+0.22\%$
test_items_stack_nested_leaf 0.1148ms 82.3465μs 12.1438 KOps/s 12.1122 KOps/s $\color{#35bf28}+0.26\%$
test_items_stack_nested_locked 0.4040ms 0.3417ms 2.9264 KOps/s 2.9327 KOps/s $\color{#d91a1a}-0.21\%$
test_keys 31.8200μs 4.3670μs 228.9916 KOps/s 229.9268 KOps/s $\color{#d91a1a}-0.41\%$
test_keys_nested 87.5510μs 66.0631μs 15.1370 KOps/s 14.9921 KOps/s $\color{#35bf28}+0.97\%$
test_keys_nested_locked 0.7169ms 70.1564μs 14.2539 KOps/s 13.9205 KOps/s $\color{#35bf28}+2.39\%$
test_keys_nested_leaf 88.3620μs 56.9889μs 17.5473 KOps/s 17.3995 KOps/s $\color{#35bf28}+0.85\%$
test_keys_stack_nested 97.5110μs 65.7371μs 15.2121 KOps/s 15.0240 KOps/s $\color{#35bf28}+1.25\%$
test_keys_stack_nested_leaf 89.7710μs 56.8221μs 17.5988 KOps/s 17.4039 KOps/s $\color{#35bf28}+1.12\%$
test_keys_stack_nested_locked 93.8220μs 69.8955μs 14.3071 KOps/s 14.1629 KOps/s $\color{#35bf28}+1.02\%$
test_values 10.0470μs 1.8080μs 553.1071 KOps/s 553.9570 KOps/s $\color{#d91a1a}-0.15\%$
test_values_nested 60.3510μs 34.6421μs 28.8666 KOps/s 28.7950 KOps/s $\color{#35bf28}+0.25\%$
test_values_nested_locked 71.2920μs 37.0784μs 26.9699 KOps/s 27.0684 KOps/s $\color{#d91a1a}-0.36\%$
test_values_nested_leaf 54.4710μs 31.1060μs 32.1481 KOps/s 32.5613 KOps/s $\color{#d91a1a}-1.27\%$
test_values_stack_nested 65.4110μs 35.5451μs 28.1332 KOps/s 27.9661 KOps/s $\color{#35bf28}+0.60\%$
test_values_stack_nested_leaf 55.3010μs 31.4268μs 31.8200 KOps/s 31.6849 KOps/s $\color{#35bf28}+0.43\%$
test_values_stack_nested_locked 59.2220μs 37.5573μs 26.6260 KOps/s 26.5123 KOps/s $\color{#35bf28}+0.43\%$
test_membership 5.5330μs 0.7317μs 1.3667 MOps/s 1.4016 MOps/s $\color{#d91a1a}-2.49\%$
test_membership_nested 18.3600μs 2.5904μs 386.0462 KOps/s 389.4359 KOps/s $\color{#d91a1a}-0.87\%$
test_membership_nested_leaf 44.3210μs 2.5604μs 390.5715 KOps/s 388.5200 KOps/s $\color{#35bf28}+0.53\%$
test_membership_stacked_nested 17.0100μs 2.5782μs 387.8616 KOps/s 393.3725 KOps/s $\color{#d91a1a}-1.40\%$
test_membership_stacked_nested_leaf 20.3810μs 2.5869μs 386.5630 KOps/s 382.6574 KOps/s $\color{#35bf28}+1.02\%$
test_membership_nested_last 25.7200μs 3.0735μs 325.3649 KOps/s 326.1120 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_nested_leaf_last 36.2410μs 3.0616μs 326.6216 KOps/s 325.8112 KOps/s $\color{#35bf28}+0.25\%$
test_membership_stacked_nested_last 29.0300μs 9.8228μs 101.8037 KOps/s 102.9738 KOps/s $\color{#d91a1a}-1.14\%$
test_membership_stacked_nested_leaf_last 44.0200μs 9.7808μs 102.2410 KOps/s 103.0772 KOps/s $\color{#d91a1a}-0.81\%$
test_nested_getleaf 23.5100μs 8.4042μs 118.9886 KOps/s 119.9635 KOps/s $\color{#d91a1a}-0.81\%$
test_nested_get 35.7010μs 7.8448μs 127.4733 KOps/s 127.4001 KOps/s $\color{#35bf28}+0.06\%$
test_stacked_getleaf 39.2310μs 8.4905μs 117.7791 KOps/s 119.0953 KOps/s $\color{#d91a1a}-1.11\%$
test_stacked_get 21.8310μs 7.9359μs 126.0099 KOps/s 126.5582 KOps/s $\color{#d91a1a}-0.43\%$
test_nested_getitemleaf 36.7800μs 8.5741μs 116.6304 KOps/s 117.2919 KOps/s $\color{#d91a1a}-0.56\%$
test_nested_getitem 32.5300μs 8.0555μs 124.1384 KOps/s 124.7361 KOps/s $\color{#d91a1a}-0.48\%$
test_stacked_getitemleaf 23.8800μs 8.6311μs 115.8602 KOps/s 116.2645 KOps/s $\color{#d91a1a}-0.35\%$
test_stacked_getitem 52.6410μs 8.0746μs 123.8446 KOps/s 124.0147 KOps/s $\color{#d91a1a}-0.14\%$
test_lock_nested 57.0598ms 0.4098ms 2.4402 KOps/s 2.4159 KOps/s $\color{#35bf28}+1.01\%$
test_lock_stack_nested 0.3670ms 0.2992ms 3.3423 KOps/s 3.3427 KOps/s $\color{#d91a1a}-0.01\%$
test_unlock_nested 0.7061ms 0.3503ms 2.8551 KOps/s 2.8403 KOps/s $\color{#35bf28}+0.52\%$
test_unlock_stack_nested 0.3416ms 0.3068ms 3.2599 KOps/s 3.2730 KOps/s $\color{#d91a1a}-0.40\%$
test_flatten_speed 0.1832ms 0.1006ms 9.9446 KOps/s 9.7845 KOps/s $\color{#35bf28}+1.64\%$
test_unflatten_speed 0.3206ms 0.2911ms 3.4352 KOps/s 3.4215 KOps/s $\color{#35bf28}+0.40\%$
test_common_ops 1.1111ms 0.5718ms 1.7489 KOps/s 1.7155 KOps/s $\color{#35bf28}+1.94\%$
test_creation 16.3700μs 1.6361μs 611.1976 KOps/s 610.4755 KOps/s $\color{#35bf28}+0.12\%$
test_creation_empty 21.2400μs 7.2534μs 137.8661 KOps/s 116.9158 KOps/s $\textbf{\color{#35bf28}+17.92\%}$
test_creation_nested_1 41.2900μs 8.9916μs 111.2144 KOps/s 96.5363 KOps/s $\textbf{\color{#35bf28}+15.20\%}$
test_creation_nested_2 31.3010μs 11.2566μs 88.8367 KOps/s 80.5031 KOps/s $\textbf{\color{#35bf28}+10.35\%}$
test_clone 78.2720μs 12.1889μs 82.0417 KOps/s 84.5764 KOps/s $\color{#d91a1a}-3.00\%$
test_getitem[int] 51.7510μs 11.1589μs 89.6143 KOps/s 92.2519 KOps/s $\color{#d91a1a}-2.86\%$
test_getitem[slice_int] 50.3910μs 21.2438μs 47.0726 KOps/s 48.3260 KOps/s $\color{#d91a1a}-2.59\%$
test_getitem[range] 66.6410μs 48.4268μs 20.6497 KOps/s 20.8230 KOps/s $\color{#d91a1a}-0.83\%$
test_getitem[tuple] 40.5510μs 18.9108μs 52.8799 KOps/s 54.3342 KOps/s $\color{#d91a1a}-2.68\%$
test_getitem[list] 0.1248ms 34.0219μs 29.3928 KOps/s 28.9048 KOps/s $\color{#35bf28}+1.69\%$
test_setitem_dim[int] 44.5600μs 27.7020μs 36.0985 KOps/s 34.9191 KOps/s $\color{#35bf28}+3.38\%$
test_setitem_dim[slice_int] 65.5720μs 48.3279μs 20.6920 KOps/s 20.5730 KOps/s $\color{#35bf28}+0.58\%$
test_setitem_dim[range] 89.5610μs 64.7611μs 15.4414 KOps/s 15.0400 KOps/s $\color{#35bf28}+2.67\%$
test_setitem_dim[tuple] 59.0510μs 41.8129μs 23.9161 KOps/s 23.5234 KOps/s $\color{#35bf28}+1.67\%$
test_setitem 47.8910μs 16.4505μs 60.7886 KOps/s 59.9188 KOps/s $\color{#35bf28}+1.45\%$
test_set 42.1100μs 15.9374μs 62.7454 KOps/s 61.2594 KOps/s $\color{#35bf28}+2.43\%$
test_set_shared 1.2199ms 99.4872μs 10.0515 KOps/s 10.2035 KOps/s $\color{#d91a1a}-1.49\%$
test_update 78.4420μs 17.5919μs 56.8442 KOps/s 54.3028 KOps/s $\color{#35bf28}+4.68\%$
test_update_nested 60.9210μs 22.9751μs 43.5255 KOps/s 41.6262 KOps/s $\color{#35bf28}+4.56\%$
test_update__nested 67.5610μs 23.5190μs 42.5188 KOps/s 44.4126 KOps/s $\color{#d91a1a}-4.26\%$
test_set_nested 58.0010μs 17.2125μs 58.0974 KOps/s 57.1902 KOps/s $\color{#35bf28}+1.59\%$
test_set_nested_new 66.2410μs 20.1124μs 49.7206 KOps/s 48.8939 KOps/s $\color{#35bf28}+1.69\%$
test_select 67.0420μs 33.3828μs 29.9555 KOps/s 30.5164 KOps/s $\color{#d91a1a}-1.84\%$
test_select_nested 0.3564ms 53.8626μs 18.5657 KOps/s 18.6521 KOps/s $\color{#d91a1a}-0.46\%$
test_exclude_nested 0.1566ms 0.1081ms 9.2485 KOps/s 9.1021 KOps/s $\color{#35bf28}+1.61\%$
test_empty[True] 0.4032ms 0.3470ms 2.8820 KOps/s 2.8179 KOps/s $\color{#35bf28}+2.27\%$
test_empty[False] 2.9861μs 0.9361μs 1.0682 MOps/s 1.0678 MOps/s $\color{#35bf28}+0.04\%$
test_to 0.1028ms 78.3668μs 12.7605 KOps/s 12.9889 KOps/s $\color{#d91a1a}-1.76\%$
test_to_nonblocking 0.1031ms 65.7624μs 15.2063 KOps/s 16.0081 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_unbind_speed 0.3075ms 0.2702ms 3.7010 KOps/s 3.8113 KOps/s $\color{#d91a1a}-2.89\%$
test_unbind_speed_stack0 0.3116ms 0.2656ms 3.7650 KOps/s 3.8435 KOps/s $\color{#d91a1a}-2.04\%$
test_unbind_speed_stack1 74.2464ms 0.7869ms 1.2708 KOps/s 1.2033 KOps/s $\textbf{\color{#35bf28}+5.61\%}$
test_split 74.5212ms 1.7142ms 583.3460 Ops/s 649.9604 Ops/s $\textbf{\color{#d91a1a}-10.25\%}$
test_chunk 74.4782ms 1.7077ms 585.5731 Ops/s 606.9369 Ops/s $\color{#d91a1a}-3.52\%$
test_creation[device0] 0.1289ms 58.3322μs 17.1432 KOps/s 16.6799 KOps/s $\color{#35bf28}+2.78\%$
test_creation_from_tensor 0.1286ms 54.1558μs 18.4653 KOps/s 17.5573 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_add_one[memmap_tensor0] 78.3120μs 7.3007μs 136.9728 KOps/s 141.1532 KOps/s $\color{#d91a1a}-2.96\%$
test_contiguous[memmap_tensor0] 10.0000μs 0.6674μs 1.4984 MOps/s 1.4952 MOps/s $\color{#35bf28}+0.22\%$
test_stack[memmap_tensor0] 23.7000μs 4.9807μs 200.7737 KOps/s 213.3801 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_memmaptd_index 1.1233ms 0.3011ms 3.3215 KOps/s 3.5035 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_memmaptd_index_astensor 0.6996ms 0.3665ms 2.7281 KOps/s 2.8150 KOps/s $\color{#d91a1a}-3.09\%$
test_memmaptd_index_op 1.0554ms 0.6554ms 1.5257 KOps/s 1.5280 KOps/s $\color{#d91a1a}-0.15\%$
test_serialize_model 0.1821s 0.1103s 9.0678 Ops/s 8.7371 Ops/s $\color{#35bf28}+3.78\%$
test_serialize_model_pickle 1.3660s 1.2385s 0.8074 Ops/s 0.8064 Ops/s $\color{#35bf28}+0.12\%$
test_serialize_weights 0.1821s 0.1092s 9.1534 Ops/s 8.7938 Ops/s $\color{#35bf28}+4.09\%$
test_serialize_weights_returnearly 0.2314s 99.5563ms 10.0446 Ops/s 10.5650 Ops/s $\color{#d91a1a}-4.93\%$
test_serialize_weights_pickle 1.3523s 1.2487s 0.8009 Ops/s 0.8092 Ops/s $\color{#d91a1a}-1.03\%$
test_reshape_pytree 56.1010μs 26.4416μs 37.8192 KOps/s 38.7586 KOps/s $\color{#d91a1a}-2.42\%$
test_reshape_td 49.1100μs 31.2728μs 31.9767 KOps/s 32.8428 KOps/s $\color{#d91a1a}-2.64\%$
test_view_pytree 0.2006ms 26.6597μs 37.5098 KOps/s 39.3251 KOps/s $\color{#d91a1a}-4.62\%$
test_view_td 0.1913ms 36.8423μs 27.1427 KOps/s 28.8233 KOps/s $\textbf{\color{#d91a1a}-5.83\%}$
test_unbind_pytree 50.2510μs 32.3992μs 30.8649 KOps/s 31.5733 KOps/s $\color{#d91a1a}-2.24\%$
test_unbind_td 0.4811ms 40.5410μs 24.6664 KOps/s 24.9137 KOps/s $\color{#d91a1a}-0.99\%$
test_split_pytree 65.3810μs 34.8437μs 28.6996 KOps/s 28.3815 KOps/s $\color{#35bf28}+1.12\%$
test_split_td 0.1080ms 39.5121μs 25.3087 KOps/s 25.0143 KOps/s $\color{#35bf28}+1.18\%$
test_add_pytree 88.1320μs 38.9990μs 25.6417 KOps/s 26.1347 KOps/s $\color{#d91a1a}-1.89\%$
test_add_td 77.6510μs 47.0061μs 21.2739 KOps/s 19.9006 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_distributed 4.2934ms 93.7447μs 10.6673 KOps/s 15.2065 KOps/s $\textbf{\color{#d91a1a}-29.85\%}$
test_tdmodule 29.2910μs 14.1890μs 70.4770 KOps/s 67.4169 KOps/s $\color{#35bf28}+4.54\%$
test_tdmodule_dispatch 51.0310μs 27.6316μs 36.1904 KOps/s 35.0034 KOps/s $\color{#35bf28}+3.39\%$
test_tdseq 31.4000μs 15.8633μs 63.0384 KOps/s 60.7990 KOps/s $\color{#35bf28}+3.68\%$
test_tdseq_dispatch 55.5910μs 30.6585μs 32.6174 KOps/s 31.2106 KOps/s $\color{#35bf28}+4.51\%$
test_instantiation_functorch 1.6362ms 1.5389ms 649.8321 Ops/s 659.8251 Ops/s $\color{#d91a1a}-1.51\%$
test_instantiation_td 1.5159ms 1.0609ms 942.5837 Ops/s 887.2383 Ops/s $\textbf{\color{#35bf28}+6.24\%}$
test_exec_functorch 0.1839ms 0.1545ms 6.4742 KOps/s 6.5566 KOps/s $\color{#d91a1a}-1.26\%$
test_exec_functional_call 0.1747ms 0.1419ms 7.0469 KOps/s 7.1517 KOps/s $\color{#d91a1a}-1.47\%$
test_exec_td 0.1715ms 0.1400ms 7.1424 KOps/s 7.3046 KOps/s $\color{#d91a1a}-2.22\%$
test_exec_td_decorator 0.6372ms 0.2135ms 4.6848 KOps/s 4.7700 KOps/s $\color{#d91a1a}-1.79\%$
test_vmap_mlp_speed[True-True] 0.7432ms 0.6087ms 1.6429 KOps/s 1.6314 KOps/s $\color{#35bf28}+0.70\%$
test_vmap_mlp_speed[True-False] 0.6751ms 0.6073ms 1.6468 KOps/s 1.6300 KOps/s $\color{#35bf28}+1.03\%$
test_vmap_mlp_speed[False-True] 0.5970ms 0.5421ms 1.8445 KOps/s 1.8095 KOps/s $\color{#35bf28}+1.94\%$
test_vmap_mlp_speed[False-False] 0.6768ms 0.5426ms 1.8429 KOps/s 1.7990 KOps/s $\color{#35bf28}+2.44\%$
test_vmap_mlp_speed_decorator[True-True] 0.7746ms 0.6756ms 1.4801 KOps/s 1.4704 KOps/s $\color{#35bf28}+0.66\%$
test_vmap_mlp_speed_decorator[True-False] 0.8986ms 0.6744ms 1.4828 KOps/s 1.4501 KOps/s $\color{#35bf28}+2.25\%$
test_vmap_mlp_speed_decorator[False-True] 0.7444ms 0.6013ms 1.6629 KOps/s 1.6681 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_mlp_speed_decorator[False-False] 79.9373ms 0.6560ms 1.5245 KOps/s 1.6345 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_vmap_transformer_speed[True-True] 8.9584ms 8.4779ms 117.9532 Ops/s 123.4360 Ops/s $\color{#d91a1a}-4.44\%$
test_vmap_transformer_speed[True-False] 8.7029ms 8.3499ms 119.7622 Ops/s 123.6563 Ops/s $\color{#d91a1a}-3.15\%$
test_vmap_transformer_speed[False-True] 8.9197ms 8.2647ms 120.9972 Ops/s 119.3556 Ops/s $\color{#35bf28}+1.38\%$
test_vmap_transformer_speed[False-False] 8.8377ms 8.2540ms 121.1532 Ops/s 122.1727 Ops/s $\color{#d91a1a}-0.83\%$
test_vmap_transformer_speed_decorator[True-True] 20.5859ms 20.2115ms 49.4768 Ops/s 49.6360 Ops/s $\color{#d91a1a}-0.32\%$
test_vmap_transformer_speed_decorator[True-False] 20.7406ms 20.2373ms 49.4136 Ops/s 49.6267 Ops/s $\color{#d91a1a}-0.43\%$
test_vmap_transformer_speed_decorator[False-True] 20.6129ms 20.1753ms 49.5656 Ops/s 49.7785 Ops/s $\color{#d91a1a}-0.43\%$
test_vmap_transformer_speed_decorator[False-False] 20.5063ms 20.1115ms 49.7228 Ops/s 49.7746 Ops/s $\color{#d91a1a}-0.10\%$
test_to_module_speed[True] 1.8746ms 1.5773ms 634.0037 Ops/s 644.5820 Ops/s $\color{#d91a1a}-1.64\%$
test_to_module_speed[False] 2.5774ms 1.5520ms 644.3357 Ops/s 650.2891 Ops/s $\color{#d91a1a}-0.92\%$
test_tc_init 0.1420ms 21.4864μs 46.5411 KOps/s 39.9846 KOps/s $\textbf{\color{#35bf28}+16.40\%}$
test_tc_init_nested 0.1621ms 42.6254μs 23.4602 KOps/s 17.9348 KOps/s $\textbf{\color{#35bf28}+30.81\%}$
test_tc_first_layer_tensor 3.6525μs 0.3632μs 2.7536 MOps/s 2.7545 MOps/s $\color{#d91a1a}-0.03\%$
test_tc_first_layer_nontensor 9.5825μs 0.3904μs 2.5613 MOps/s 2.4954 MOps/s $\color{#35bf28}+2.64\%$
test_tc_second_layer_tensor 24.7644μs 0.9717μs 1.0292 MOps/s 1.0294 MOps/s $\color{#d91a1a}-0.02\%$
test_tc_second_layer_nontensor 8.0196μs 0.8039μs 1.2440 MOps/s 1.2013 MOps/s $\color{#35bf28}+3.56\%$
test_unbind 90.0991ms 6.6606ms 150.1377 Ops/s 123.2926 Ops/s $\textbf{\color{#35bf28}+21.77\%}$
test_full_like 10.7673ms 9.1654ms 109.1061 Ops/s 75.3860 Ops/s $\textbf{\color{#35bf28}+44.73\%}$
test_zeros_like 8.3750ms 7.8426ms 127.5082 Ops/s 125.3350 Ops/s $\color{#35bf28}+1.73\%$
test_ones_like 8.3842ms 7.8548ms 127.3099 Ops/s 127.1169 Ops/s $\color{#35bf28}+0.15\%$
test_clone 10.3183ms 9.2167ms 108.4989 Ops/s 108.5329 Ops/s $\color{#d91a1a}-0.03\%$
test_squeeze 76.6010μs 11.8252μs 84.5655 KOps/s 90.5967 KOps/s $\textbf{\color{#d91a1a}-6.66\%}$
test_unsqueeze 0.1993ms 53.8117μs 18.5833 KOps/s 19.2708 KOps/s $\color{#d91a1a}-3.57\%$
test_split 0.1456ms 0.1003ms 9.9735 KOps/s 10.2501 KOps/s $\color{#d91a1a}-2.70\%$
test_permute 0.1466ms 0.1094ms 9.1376 KOps/s 9.1780 KOps/s $\color{#d91a1a}-0.44\%$
test_stack 26.7822ms 26.5861ms 37.6136 Ops/s 37.4684 Ops/s $\color{#35bf28}+0.39\%$
test_cat 28.0730ms 26.8448ms 37.2511 Ops/s 37.6698 Ops/s $\color{#d91a1a}-1.11\%$
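In both comments, the Improved and Worsened totals equal the number of Change entries typeset in bold (17 and 17 for CPU, 15 and 8 for GPU), and every bold entry is a throughput change of at least 5% in either direction. A minimal sketch of that classification, assuming a 5% threshold inferred from the tables (the bot's actual rule is not shown in this thread), applied to a few rows copied from the GPU table:

```python
from collections import Counter

# Assumption: a benchmark counts as improved/worsened when its throughput
# changes by at least 5% (inferred from the bold entries, not from the bot's code).
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Bucket a relative throughput change (in percent) into a verdict."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# A few (name, Change %) pairs copied from the GPU table.
gpu_rows = [
    ("test_creation_empty", 17.92),
    ("test_update", 4.68),
    ("test_update__nested", -4.26),
    ("test_to_nonblocking", -5.01),
    ("test_distributed", -29.85),
]

counts = Counter(classify(change) for _, change in gpu_rows)
print(dict(counts))  # {'improved': 1, 'unchanged': 2, 'worsened': 2}
```

Under this reading, rows such as test_update (+4.68%) are not counted toward either total even though the bot colors them green or red.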

vmoens merged commit d852844 into main on Jun 10, 2024.
11 of 17 checks passed.
vmoens deleted the proba-module-default-outkeys branch on June 10, 2024 at 16:04.
2 participants