[BugFix] Fix fallback of deterministic samples when mean is not available #828

Merged: 1 commit merged into main from flexible-fallback-dist on Jun 24, 2024

@vmoens (Contributor) commented Jun 24, 2024

@facebook-github-bot added the CLA Signed label Jun 24, 2024
@vmoens added the bug label Jun 24, 2024
@vmoens merged commit 266ee51 into main Jun 24, 2024
25 of 30 checks passed
@vmoens deleted the flexible-fallback-dist branch June 24, 2024 09:54

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}20$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 43.3320μs 16.8880μs 59.2136 KOps/s 61.7310 KOps/s $\color{#d91a1a}-4.08\%$
test_plain_set_stack_nested 49.3120μs 17.1902μs 58.1726 KOps/s 60.9491 KOps/s $\color{#d91a1a}-4.56\%$
test_plain_set_nested_inplace 45.1540μs 19.3227μs 51.7527 KOps/s 54.0312 KOps/s $\color{#d91a1a}-4.22\%$
test_plain_set_stack_nested_inplace 53.3400μs 19.2107μs 52.0542 KOps/s 53.9587 KOps/s $\color{#d91a1a}-3.53\%$
test_items 26.3390μs 2.5177μs 397.1927 KOps/s 404.3023 KOps/s $\color{#d91a1a}-1.76\%$
test_items_nested 0.4549ms 0.2667ms 3.7493 KOps/s 3.7299 KOps/s $\color{#35bf28}+0.52\%$
test_items_nested_locked 0.4414ms 0.2680ms 3.7310 KOps/s 3.7991 KOps/s $\color{#d91a1a}-1.79\%$
test_items_nested_leaf 0.5234ms 77.2942μs 12.9376 KOps/s 12.7682 KOps/s $\color{#35bf28}+1.33\%$
test_items_stack_nested 0.3397ms 0.2666ms 3.7511 KOps/s 3.7581 KOps/s $\color{#d91a1a}-0.19\%$
test_items_stack_nested_leaf 0.1597ms 79.7771μs 12.5349 KOps/s 12.5586 KOps/s $\color{#d91a1a}-0.19\%$
test_items_stack_nested_locked 0.4549ms 0.2668ms 3.7484 KOps/s 3.7851 KOps/s $\color{#d91a1a}-0.97\%$
test_keys 26.1090μs 3.8433μs 260.1914 KOps/s 236.4829 KOps/s $\textbf{\color{#35bf28}+10.03\%}$
test_keys_nested 0.1952ms 0.1409ms 7.0974 KOps/s 7.2576 KOps/s $\color{#d91a1a}-2.21\%$
test_keys_nested_locked 0.6577ms 0.1442ms 6.9332 KOps/s 7.0021 KOps/s $\color{#d91a1a}-0.98\%$
test_keys_nested_leaf 0.2362ms 0.1179ms 8.4813 KOps/s 8.5909 KOps/s $\color{#d91a1a}-1.28\%$
test_keys_stack_nested 0.2452ms 0.1382ms 7.2346 KOps/s 7.3373 KOps/s $\color{#d91a1a}-1.40\%$
test_keys_stack_nested_leaf 0.2363ms 0.1175ms 8.5134 KOps/s 8.6649 KOps/s $\color{#d91a1a}-1.75\%$
test_keys_stack_nested_locked 0.2572ms 0.1440ms 6.9444 KOps/s 7.0939 KOps/s $\color{#d91a1a}-2.11\%$
test_values 5.2598μs 1.1439μs 874.2311 KOps/s 859.4403 KOps/s $\color{#35bf28}+1.72\%$
test_values_nested 0.1053ms 51.0710μs 19.5806 KOps/s 19.7688 KOps/s $\color{#d91a1a}-0.95\%$
test_values_nested_locked 93.9250μs 51.6208μs 19.3720 KOps/s 19.8082 KOps/s $\color{#d91a1a}-2.20\%$
test_values_nested_leaf 95.5980μs 46.3606μs 21.5700 KOps/s 22.0545 KOps/s $\color{#d91a1a}-2.20\%$
test_values_stack_nested 93.0740μs 52.0302μs 19.2196 KOps/s 19.4897 KOps/s $\color{#d91a1a}-1.39\%$
test_values_stack_nested_leaf 83.8260μs 46.1263μs 21.6796 KOps/s 22.0963 KOps/s $\color{#d91a1a}-1.89\%$
test_values_stack_nested_locked 0.1054ms 51.4126μs 19.4505 KOps/s 19.5227 KOps/s $\color{#d91a1a}-0.37\%$
test_membership 23.7950μs 1.3667μs 731.6833 KOps/s 743.7502 KOps/s $\color{#d91a1a}-1.62\%$
test_membership_nested 26.2700μs 3.4836μs 287.0610 KOps/s 296.2601 KOps/s $\color{#d91a1a}-3.11\%$
test_membership_nested_leaf 23.1530μs 3.5134μs 284.6241 KOps/s 288.8604 KOps/s $\color{#d91a1a}-1.47\%$
test_membership_stacked_nested 43.6710μs 3.4416μs 290.5616 KOps/s 289.1804 KOps/s $\color{#35bf28}+0.48\%$
test_membership_stacked_nested_leaf 26.9200μs 3.4418μs 290.5475 KOps/s 293.7959 KOps/s $\color{#d91a1a}-1.11\%$
test_membership_nested_last 28.4330μs 4.2348μs 236.1370 KOps/s 239.8423 KOps/s $\color{#d91a1a}-1.54\%$
test_membership_nested_leaf_last 28.3620μs 4.2057μs 237.7734 KOps/s 242.3559 KOps/s $\color{#d91a1a}-1.89\%$
test_membership_stacked_nested_last 25.8280μs 4.1627μs 240.2301 KOps/s 243.2958 KOps/s $\color{#d91a1a}-1.26\%$
test_membership_stacked_nested_leaf_last 22.0410μs 4.2331μs 236.2333 KOps/s 243.0671 KOps/s $\color{#d91a1a}-2.81\%$
test_nested_getleaf 33.2820μs 10.6663μs 93.7529 KOps/s 94.3614 KOps/s $\color{#d91a1a}-0.64\%$
test_nested_get 37.3600μs 10.1168μs 98.8456 KOps/s 99.0115 KOps/s $\color{#d91a1a}-0.17\%$
test_stacked_getleaf 32.2910μs 10.6237μs 94.1287 KOps/s 94.5074 KOps/s $\color{#d91a1a}-0.40\%$
test_stacked_get 27.0410μs 10.1081μs 98.9307 KOps/s 99.9629 KOps/s $\color{#d91a1a}-1.03\%$
test_nested_getitemleaf 41.7180μs 11.2573μs 88.8314 KOps/s 89.9873 KOps/s $\color{#d91a1a}-1.28\%$
test_nested_getitem 37.0590μs 10.4459μs 95.7311 KOps/s 98.1733 KOps/s $\color{#d91a1a}-2.49\%$
test_stacked_getitemleaf 33.5630μs 10.9226μs 91.5531 KOps/s 91.4693 KOps/s $\color{#35bf28}+0.09\%$
test_stacked_getitem 32.5510μs 10.2947μs 97.1378 KOps/s 99.7699 KOps/s $\color{#d91a1a}-2.64\%$
test_lock_nested 50.5992ms 0.3851ms 2.5965 KOps/s 2.9878 KOps/s $\textbf{\color{#d91a1a}-13.10\%}$
test_lock_stack_nested 0.4582ms 0.3081ms 3.2457 KOps/s 3.2816 KOps/s $\color{#d91a1a}-1.10\%$
test_unlock_nested 0.6679ms 0.3412ms 2.9305 KOps/s 2.9560 KOps/s $\color{#d91a1a}-0.86\%$
test_unlock_stack_nested 0.3895ms 0.3151ms 3.1736 KOps/s 3.2152 KOps/s $\color{#d91a1a}-1.30\%$
test_flatten_speed 0.2039ms 97.6823μs 10.2373 KOps/s 10.3427 KOps/s $\color{#d91a1a}-1.02\%$
test_unflatten_speed 0.8941ms 0.4077ms 2.4531 KOps/s 2.4485 KOps/s $\color{#35bf28}+0.18\%$
test_common_ops 1.3982ms 0.7149ms 1.3987 KOps/s 1.4576 KOps/s $\color{#d91a1a}-4.04\%$
test_creation 53.5400μs 1.9207μs 520.6559 KOps/s 524.6444 KOps/s $\color{#d91a1a}-0.76\%$
test_creation_empty 30.0660μs 10.8850μs 91.8697 KOps/s 108.2101 KOps/s $\textbf{\color{#d91a1a}-15.10\%}$
test_creation_nested_1 37.1900μs 13.6290μs 73.3729 KOps/s 83.4236 KOps/s $\textbf{\color{#d91a1a}-12.05\%}$
test_creation_nested_2 52.4980μs 17.0330μs 58.7096 KOps/s 65.2696 KOps/s $\textbf{\color{#d91a1a}-10.05\%}$
test_clone 0.1355ms 13.3762μs 74.7596 KOps/s 73.8693 KOps/s $\color{#35bf28}+1.21\%$
test_getitem[int] 32.0500μs 11.6284μs 85.9964 KOps/s 89.5574 KOps/s $\color{#d91a1a}-3.98\%$
test_getitem[slice_int] 52.1570μs 22.9884μs 43.5002 KOps/s 43.0772 KOps/s $\color{#35bf28}+0.98\%$
test_getitem[range] 78.2960μs 58.0484μs 17.2270 KOps/s 17.2765 KOps/s $\color{#d91a1a}-0.29\%$
test_getitem[tuple] 48.0300μs 19.3011μs 51.8104 KOps/s 52.2329 KOps/s $\color{#d91a1a}-0.81\%$
test_getitem[list] 94.3360μs 41.2378μs 24.2496 KOps/s 23.5893 KOps/s $\color{#35bf28}+2.80\%$
test_setitem_dim[int] 64.3900μs 34.4367μs 29.0388 KOps/s 30.9328 KOps/s $\textbf{\color{#d91a1a}-6.12\%}$
test_setitem_dim[slice_int] 0.1162ms 61.6192μs 16.2287 KOps/s 16.5991 KOps/s $\color{#d91a1a}-2.23\%$
test_setitem_dim[range] 0.1826ms 83.1478μs 12.0268 KOps/s 12.2439 KOps/s $\color{#d91a1a}-1.77\%$
test_setitem_dim[tuple] 79.9700μs 49.6820μs 20.1280 KOps/s 18.9355 KOps/s $\textbf{\color{#35bf28}+6.30\%}$
test_setitem 61.8060μs 20.4424μs 48.9180 KOps/s 52.1215 KOps/s $\textbf{\color{#d91a1a}-6.15\%}$
test_set 60.5830μs 19.8191μs 50.4563 KOps/s 53.6190 KOps/s $\textbf{\color{#d91a1a}-5.90\%}$
test_set_shared 3.0934ms 0.1423ms 7.0282 KOps/s 6.9323 KOps/s $\color{#35bf28}+1.38\%$
test_update 0.1084ms 22.6751μs 44.1013 KOps/s 48.7001 KOps/s $\textbf{\color{#d91a1a}-9.44\%}$
test_update_nested 74.7890μs 32.1820μs 31.0733 KOps/s 34.2786 KOps/s $\textbf{\color{#d91a1a}-9.35\%}$
test_update__nested 65.6830μs 25.2059μs 39.6732 KOps/s 41.1609 KOps/s $\color{#d91a1a}-3.61\%$
test_set_nested 63.2380μs 21.9253μs 45.6094 KOps/s 48.9565 KOps/s $\textbf{\color{#d91a1a}-6.84\%}$
test_set_nested_new 68.0070μs 26.3117μs 38.0060 KOps/s 40.4202 KOps/s $\textbf{\color{#d91a1a}-5.97\%}$
test_select 95.6290μs 41.7144μs 23.9725 KOps/s 25.1543 KOps/s $\color{#d91a1a}-4.70\%$
test_select_nested 0.1210ms 60.6733μs 16.4817 KOps/s 15.9418 KOps/s $\color{#35bf28}+3.39\%$
test_exclude_nested 0.2362ms 0.1190ms 8.4007 KOps/s 8.2328 KOps/s $\color{#35bf28}+2.04\%$
test_empty[True] 0.7305ms 0.3970ms 2.5188 KOps/s 2.5368 KOps/s $\color{#d91a1a}-0.71\%$
test_empty[False] 9.5102μs 1.1710μs 853.9653 KOps/s 867.1926 KOps/s $\color{#d91a1a}-1.53\%$
test_unbind_speed 1.5418ms 0.2484ms 4.0251 KOps/s 3.9798 KOps/s $\color{#35bf28}+1.14\%$
test_unbind_speed_stack0 0.4629ms 0.2499ms 4.0018 KOps/s 4.0273 KOps/s $\color{#d91a1a}-0.63\%$
test_unbind_speed_stack1 65.5690ms 0.7141ms 1.4003 KOps/s 1.3881 KOps/s $\color{#35bf28}+0.88\%$
test_split 63.6247ms 1.5822ms 632.0140 Ops/s 623.2386 Ops/s $\color{#35bf28}+1.41\%$
test_chunk 66.5061ms 1.5849ms 630.9426 Ops/s 622.7709 Ops/s $\color{#35bf28}+1.31\%$
test_creation[device0] 0.1566ms 83.7216μs 11.9443 KOps/s 11.6014 KOps/s $\color{#35bf28}+2.96\%$
test_creation_from_tensor 3.5782ms 86.8230μs 11.5177 KOps/s 11.7532 KOps/s $\color{#d91a1a}-2.00\%$
test_add_one[memmap_tensor0] 53.5090μs 5.3506μs 186.8933 KOps/s 183.5739 KOps/s $\color{#35bf28}+1.81\%$
test_contiguous[memmap_tensor0] 13.5150μs 0.6416μs 1.5586 MOps/s 1.5120 MOps/s $\color{#35bf28}+3.08\%$
test_stack[memmap_tensor0] 23.5640μs 3.5342μs 282.9480 KOps/s 282.0554 KOps/s $\color{#35bf28}+0.32\%$
test_memmaptd_index 0.9364ms 0.2526ms 3.9593 KOps/s 3.8844 KOps/s $\color{#35bf28}+1.93\%$
test_memmaptd_index_astensor 0.8060ms 0.3259ms 3.0686 KOps/s 2.9995 KOps/s $\color{#35bf28}+2.31\%$
test_memmaptd_index_op 0.9845ms 0.6100ms 1.6394 KOps/s 1.6877 KOps/s $\color{#d91a1a}-2.86\%$
test_serialize_model 0.1736s 0.1133s 8.8294 Ops/s 8.4829 Ops/s $\color{#35bf28}+4.09\%$
test_serialize_model_pickle 0.4463s 0.3746s 2.6692 Ops/s 2.6118 Ops/s $\color{#35bf28}+2.20\%$
test_serialize_weights 0.1641s 0.1113s 8.9838 Ops/s 8.8806 Ops/s $\color{#35bf28}+1.16\%$
test_serialize_weights_returnearly 0.2133s 0.1441s 6.9407 Ops/s 7.1306 Ops/s $\color{#d91a1a}-2.66\%$
test_serialize_weights_pickle 1.1026s 0.6009s 1.6642 Ops/s 2.5661 Ops/s $\textbf{\color{#d91a1a}-35.15\%}$
test_serialize_weights_filesystem 0.1595s 97.9805ms 10.2061 Ops/s 10.7996 Ops/s $\textbf{\color{#d91a1a}-5.50\%}$
test_serialize_model_filesystem 0.1152s 94.4619ms 10.5863 Ops/s 9.7083 Ops/s $\textbf{\color{#35bf28}+9.04\%}$
test_reshape_pytree 52.1980μs 25.7139μs 38.8894 KOps/s 39.2047 KOps/s $\color{#d91a1a}-0.80\%$
test_reshape_td 85.5200μs 34.3564μs 29.1066 KOps/s 29.4031 KOps/s $\color{#d91a1a}-1.01\%$
test_view_pytree 57.1660μs 25.8169μs 38.7344 KOps/s 39.0963 KOps/s $\color{#d91a1a}-0.93\%$
test_view_td 78.2360μs 39.4132μs 25.3722 KOps/s 26.6983 KOps/s $\color{#d91a1a}-4.97\%$
test_unbind_pytree 75.8420μs 29.4426μs 33.9644 KOps/s 34.3764 KOps/s $\color{#d91a1a}-1.20\%$
test_unbind_td 66.8469ms 43.5613μs 22.9562 KOps/s 27.0132 KOps/s $\textbf{\color{#d91a1a}-15.02\%}$
test_split_pytree 63.9400μs 29.3334μs 34.0908 KOps/s 34.7282 KOps/s $\color{#d91a1a}-1.84\%$
test_split_td 0.1180ms 41.9655μs 23.8291 KOps/s 24.3701 KOps/s $\color{#d91a1a}-2.22\%$
test_add_pytree 74.9400μs 35.1606μs 28.4409 KOps/s 28.8717 KOps/s $\color{#d91a1a}-1.49\%$
test_add_td 0.1421ms 56.4770μs 17.7063 KOps/s 19.4325 KOps/s $\textbf{\color{#d91a1a}-8.88\%}$
test_distributed 0.1951ms 0.1027ms 9.7364 KOps/s 9.7832 KOps/s $\color{#d91a1a}-0.48\%$
test_tdmodule 39.0530μs 17.5430μs 57.0029 KOps/s 59.4747 KOps/s $\color{#d91a1a}-4.16\%$
test_tdmodule_dispatch 55.9140μs 35.0172μs 28.5574 KOps/s 30.4243 KOps/s $\textbf{\color{#d91a1a}-6.14\%}$
test_tdseq 34.7240μs 20.2827μs 49.3032 KOps/s 50.9581 KOps/s $\color{#d91a1a}-3.25\%$
test_tdseq_dispatch 66.5250μs 40.1218μs 24.9241 KOps/s 26.1147 KOps/s $\color{#d91a1a}-4.56\%$
test_instantiation_functorch 2.1461ms 1.3138ms 761.1711 Ops/s 771.7953 Ops/s $\color{#d91a1a}-1.38\%$
test_instantiation_td 1.4918ms 1.0134ms 986.8074 Ops/s 999.1960 Ops/s $\color{#d91a1a}-1.24\%$
test_exec_functorch 0.3046ms 0.1620ms 6.1714 KOps/s 6.1154 KOps/s $\color{#35bf28}+0.92\%$
test_exec_functional_call 0.3707ms 0.1512ms 6.6133 KOps/s 6.5503 KOps/s $\color{#35bf28}+0.96\%$
test_exec_td 0.2597ms 0.1441ms 6.9408 KOps/s 6.9015 KOps/s $\color{#35bf28}+0.57\%$
test_exec_td_decorator 0.6343ms 0.2216ms 4.5124 KOps/s 4.5581 KOps/s $\color{#d91a1a}-1.00\%$
test_vmap_mlp_speed[True-True] 0.8001ms 0.4949ms 2.0207 KOps/s 2.0778 KOps/s $\color{#d91a1a}-2.75\%$
test_vmap_mlp_speed[True-False] 0.7086ms 0.4870ms 2.0533 KOps/s 2.0784 KOps/s $\color{#d91a1a}-1.21\%$
test_vmap_mlp_speed[False-True] 0.6024ms 0.3971ms 2.5183 KOps/s 2.5109 KOps/s $\color{#35bf28}+0.30\%$
test_vmap_mlp_speed[False-False] 0.6595ms 0.3988ms 2.5076 KOps/s 2.5451 KOps/s $\color{#d91a1a}-1.47\%$
test_vmap_mlp_speed_decorator[True-True] 1.0904ms 0.5607ms 1.7833 KOps/s 1.7974 KOps/s $\color{#d91a1a}-0.78\%$
test_vmap_mlp_speed_decorator[True-False] 0.8981ms 0.5621ms 1.7791 KOps/s 1.7978 KOps/s $\color{#d91a1a}-1.04\%$
test_vmap_mlp_speed_decorator[False-True] 0.6899ms 0.4613ms 2.1676 KOps/s 2.1704 KOps/s $\color{#d91a1a}-0.13\%$
test_vmap_mlp_speed_decorator[False-False] 0.6525ms 0.4615ms 2.1666 KOps/s 2.1734 KOps/s $\color{#d91a1a}-0.31\%$
test_to_module_speed[True] 2.5886ms 1.6911ms 591.3250 Ops/s 574.6643 Ops/s $\color{#35bf28}+2.90\%$
test_to_module_speed[False] 2.6030ms 1.6605ms 602.2187 Ops/s 599.9981 Ops/s $\color{#35bf28}+0.37\%$
test_tc_init 70.9730μs 30.2214μs 33.0891 KOps/s 37.6798 KOps/s $\textbf{\color{#d91a1a}-12.18\%}$
test_tc_init_nested 0.1351ms 62.6557μs 15.9602 KOps/s 18.8805 KOps/s $\textbf{\color{#d91a1a}-15.47\%}$
test_tc_first_layer_tensor 7.3509μs 0.6929μs 1.4433 MOps/s 1.4470 MOps/s $\color{#d91a1a}-0.26\%$
test_tc_first_layer_nontensor 2.2948μs 0.6569μs 1.5223 MOps/s 1.4622 MOps/s $\color{#35bf28}+4.11\%$
test_tc_second_layer_tensor 19.1660μs 1.8881μs 529.6254 KOps/s 517.3970 KOps/s $\color{#35bf28}+2.36\%$
test_tc_second_layer_nontensor 37.0190μs 1.6797μs 595.3295 KOps/s 577.2640 KOps/s $\color{#35bf28}+3.13\%$
test_unbind 86.5181ms 5.8365ms 171.3342 Ops/s 189.8922 Ops/s $\textbf{\color{#d91a1a}-9.77\%}$
test_full_like 17.2740ms 10.8396ms 92.2541 Ops/s 93.1604 Ops/s $\color{#d91a1a}-0.97\%$
test_zeros_like 13.3613ms 6.0328ms 165.7599 Ops/s 171.1026 Ops/s $\color{#d91a1a}-3.12\%$
test_ones_like 13.4916ms 6.6360ms 150.6936 Ops/s 162.7440 Ops/s $\textbf{\color{#d91a1a}-7.40\%}$
test_clone 15.8001ms 7.6938ms 129.9750 Ops/s 125.4024 Ops/s $\color{#35bf28}+3.65\%$
test_squeeze 60.1920μs 14.1519μs 70.6618 KOps/s 71.7805 KOps/s $\color{#d91a1a}-1.56\%$
test_unsqueeze 0.1241ms 59.7969μs 16.7233 KOps/s 16.4165 KOps/s $\color{#35bf28}+1.87\%$
test_split 0.2446ms 0.1149ms 8.7001 KOps/s 8.8594 KOps/s $\color{#d91a1a}-1.80\%$
test_permute 0.2025ms 0.1262ms 7.9246 KOps/s 7.8635 KOps/s $\color{#35bf28}+0.78\%$
test_stack 28.5211ms 22.7470ms 43.9618 Ops/s 43.7842 Ops/s $\color{#35bf28}+0.41\%$
test_cat 29.9611ms 22.7640ms 43.9290 Ops/s 44.0596 Ops/s $\color{#d91a1a}-0.30\%$
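
For reading the Change column: the rows above are consistent with Change being the relative difference between Ops (this PR) and Ops on Repo HEAD, expressed as a percentage of the HEAD value. The sketch below reproduces a few rows under that assumption. The function names are hypothetical, and the 5% flagging threshold is only inferred from which rows are shown in bold (and from the Improved/Worsened totals), not taken from the benchmark bot's source.

```python
def relative_change(ops: float, head_ops: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD.

    Both values must be in the same unit (e.g. both KOps/s); convert
    first if one column is reported in MOps/s and the other in KOps/s.
    """
    return (ops - head_ops) / head_ops * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    # Assumption: the 5% threshold is inferred from the bold rows in the
    # table, not read from the benchmark tooling.
    return abs(change_pct) >= threshold_pct


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
rows = {
    "test_plain_set_nested": (59.2136, 61.7310),
    "test_keys": (260.1914, 236.4829),
    "test_serialize_weights_pickle": (1.6642, 2.5661),
}

for name, (ops, head_ops) in rows.items():
    change = relative_change(ops, head_ops)
    print(f"{name}: {change:+.2f}% flagged={is_flagged(change)}")
```

Read this way, a negative Change means lower throughput than HEAD for that benchmark on that run, so small values in either direction may simply reflect run-to-run noise.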


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 30.6910μs 12.9365μs 77.3005 KOps/s 71.8629 KOps/s $\textbf{\color{#35bf28}+7.57\%}$
test_plain_set_stack_nested 33.1720μs 13.1205μs 76.2167 KOps/s 70.8092 KOps/s $\textbf{\color{#35bf28}+7.64\%}$
test_plain_set_nested_inplace 40.3620μs 14.2996μs 69.9320 KOps/s 65.3317 KOps/s $\textbf{\color{#35bf28}+7.04\%}$
test_plain_set_stack_nested_inplace 49.0520μs 14.4243μs 69.3273 KOps/s 64.1251 KOps/s $\textbf{\color{#35bf28}+8.11\%}$
test_items 19.9410μs 4.7532μs 210.3826 KOps/s 207.8336 KOps/s $\color{#35bf28}+1.23\%$
test_items_nested 0.3809ms 0.3449ms 2.8995 KOps/s 2.9772 KOps/s $\color{#d91a1a}-2.61\%$
test_items_nested_locked 0.4388ms 0.3518ms 2.8424 KOps/s 2.9338 KOps/s $\color{#d91a1a}-3.11\%$
test_items_nested_leaf 0.1034ms 82.4233μs 12.1325 KOps/s 12.1979 KOps/s $\color{#d91a1a}-0.54\%$
test_items_stack_nested 0.3880ms 0.3546ms 2.8198 KOps/s 2.9780 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_items_stack_nested_leaf 0.1056ms 84.4958μs 11.8349 KOps/s 11.7770 KOps/s $\color{#35bf28}+0.49\%$
test_items_stack_nested_locked 0.4770ms 0.3464ms 2.8872 KOps/s 2.9144 KOps/s $\color{#d91a1a}-0.93\%$
test_keys 24.1110μs 4.3336μs 230.7539 KOps/s 198.2964 KOps/s $\textbf{\color{#35bf28}+16.37\%}$
test_keys_nested 92.5940μs 67.0902μs 14.9053 KOps/s 14.9638 KOps/s $\color{#d91a1a}-0.39\%$
test_keys_nested_locked 1.9947ms 72.2791μs 13.8353 KOps/s 13.8351 KOps/s $+0.00\%$
test_keys_nested_leaf 74.7430μs 57.6570μs 17.3440 KOps/s 17.3534 KOps/s $\color{#d91a1a}-0.05\%$
test_keys_stack_nested 91.6440μs 67.2350μs 14.8732 KOps/s 15.0547 KOps/s $\color{#d91a1a}-1.21\%$
test_keys_stack_nested_leaf 78.8940μs 57.7125μs 17.3273 KOps/s 17.4269 KOps/s $\color{#d91a1a}-0.57\%$
test_keys_stack_nested_locked 97.6850μs 71.7797μs 13.9315 KOps/s 14.0440 KOps/s $\color{#d91a1a}-0.80\%$
test_values 9.5673μs 1.8136μs 551.4011 KOps/s 551.6117 KOps/s $\color{#d91a1a}-0.04\%$
test_values_nested 61.3930μs 35.3610μs 28.2797 KOps/s 28.4470 KOps/s $\color{#d91a1a}-0.59\%$
test_values_nested_locked 95.8840μs 36.9128μs 27.0909 KOps/s 27.1335 KOps/s $\color{#d91a1a}-0.16\%$
test_values_nested_leaf 52.7030μs 31.4736μs 31.7727 KOps/s 32.1005 KOps/s $\color{#d91a1a}-1.02\%$
test_values_stack_nested 69.7630μs 35.9999μs 27.7779 KOps/s 27.9872 KOps/s $\color{#d91a1a}-0.75\%$
test_values_stack_nested_leaf 58.3830μs 32.3742μs 30.8888 KOps/s 31.3241 KOps/s $\color{#d91a1a}-1.39\%$
test_values_stack_nested_locked 61.1930μs 37.5960μs 26.5986 KOps/s 26.7514 KOps/s $\color{#d91a1a}-0.57\%$
test_membership 4.6816μs 0.7264μs 1.3766 MOps/s 1.4076 MOps/s $\color{#d91a1a}-2.20\%$
test_membership_nested 18.4010μs 2.5816μs 387.3538 KOps/s 385.4161 KOps/s $\color{#35bf28}+0.50\%$
test_membership_nested_leaf 31.5020μs 2.5969μs 385.0761 KOps/s 387.9399 KOps/s $\color{#d91a1a}-0.74\%$
test_membership_stacked_nested 29.7510μs 2.6170μs 382.1201 KOps/s 382.1253 KOps/s $-0.00\%$
test_membership_stacked_nested_leaf 32.3620μs 2.6198μs 381.7120 KOps/s 386.6759 KOps/s $\color{#d91a1a}-1.28\%$
test_membership_nested_last 17.7500μs 3.1124μs 321.2904 KOps/s 323.7105 KOps/s $\color{#d91a1a}-0.75\%$
test_membership_nested_leaf_last 41.2520μs 3.1061μs 321.9421 KOps/s 321.6217 KOps/s $\color{#35bf28}+0.10\%$
test_membership_stacked_nested_last 25.9810μs 3.6306μs 275.4360 KOps/s 322.3844 KOps/s $\textbf{\color{#d91a1a}-14.56\%}$
test_membership_stacked_nested_leaf_last 22.6310μs 3.5945μs 278.2024 KOps/s 323.0669 KOps/s $\textbf{\color{#d91a1a}-13.89\%}$
test_nested_getleaf 37.1620μs 8.4105μs 118.8997 KOps/s 119.6720 KOps/s $\color{#d91a1a}-0.65\%$
test_nested_get 33.9620μs 7.9164μs 126.3201 KOps/s 126.1950 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_getleaf 28.1610μs 8.4558μs 118.2617 KOps/s 118.9388 KOps/s $\color{#d91a1a}-0.57\%$
test_stacked_get 35.7820μs 7.9160μs 126.3270 KOps/s 126.5128 KOps/s $\color{#d91a1a}-0.15\%$
test_nested_getitemleaf 30.4110μs 8.6083μs 116.1663 KOps/s 116.8915 KOps/s $\color{#d91a1a}-0.62\%$
test_nested_getitem 35.9920μs 8.0858μs 123.6736 KOps/s 123.9691 KOps/s $\color{#d91a1a}-0.24\%$
test_stacked_getitemleaf 37.4820μs 8.6106μs 116.1366 KOps/s 116.6541 KOps/s $\color{#d91a1a}-0.44\%$
test_stacked_getitem 28.4710μs 8.0846μs 123.6913 KOps/s 124.0367 KOps/s $\color{#d91a1a}-0.28\%$
test_lock_nested 57.9957ms 0.4053ms 2.4676 KOps/s 2.4543 KOps/s $\color{#35bf28}+0.54\%$
test_lock_stack_nested 0.3349ms 0.3063ms 3.2650 KOps/s 3.2764 KOps/s $\color{#d91a1a}-0.35\%$
test_unlock_nested 60.4020ms 0.4105ms 2.4363 KOps/s 2.4348 KOps/s $\color{#35bf28}+0.06\%$
test_unlock_stack_nested 0.3614ms 0.3133ms 3.1922 KOps/s 3.1897 KOps/s $\color{#35bf28}+0.08\%$
test_flatten_speed 0.3560ms 0.1006ms 9.9409 KOps/s 9.9215 KOps/s $\color{#35bf28}+0.20\%$
test_unflatten_speed 0.3487ms 0.2932ms 3.4107 KOps/s 3.3975 KOps/s $\color{#35bf28}+0.39\%$
test_common_ops 1.0843ms 0.5953ms 1.6798 KOps/s 1.5765 KOps/s $\textbf{\color{#35bf28}+6.55\%}$
test_creation 29.7610μs 1.6610μs 602.0377 KOps/s 607.1525 KOps/s $\color{#d91a1a}-0.84\%$
test_creation_empty 32.9310μs 9.1032μs 109.8512 KOps/s 90.9433 KOps/s $\textbf{\color{#35bf28}+20.79\%}$
test_creation_nested_1 31.1120μs 10.8372μs 92.2752 KOps/s 78.9350 KOps/s $\textbf{\color{#35bf28}+16.90\%}$
test_creation_nested_2 31.7420μs 13.1283μs 76.1715 KOps/s 66.1568 KOps/s $\textbf{\color{#35bf28}+15.14\%}$
test_clone 50.5520μs 12.1016μs 82.6337 KOps/s 82.6249 KOps/s $\color{#35bf28}+0.01\%$
test_getitem[int] 59.4330μs 10.8380μs 92.2681 KOps/s 91.8760 KOps/s $\color{#35bf28}+0.43\%$
test_getitem[slice_int] 43.8020μs 21.0333μs 47.5436 KOps/s 48.0608 KOps/s $\color{#d91a1a}-1.08\%$
test_getitem[range] 64.4530μs 47.3702μs 21.1103 KOps/s 20.9748 KOps/s $\color{#35bf28}+0.65\%$
test_getitem[tuple] 55.3630μs 18.9789μs 52.6902 KOps/s 53.0391 KOps/s $\color{#d91a1a}-0.66\%$
test_getitem[list] 0.1611ms 35.3133μs 28.3179 KOps/s 29.2690 KOps/s $\color{#d91a1a}-3.25\%$
test_setitem_dim[int] 50.7730μs 31.9915μs 31.2583 KOps/s 30.8041 KOps/s $\color{#35bf28}+1.47\%$
test_setitem_dim[slice_int] 75.5840μs 52.8881μs 18.9078 KOps/s 18.7481 KOps/s $\color{#35bf28}+0.85\%$
test_setitem_dim[range] 95.3750μs 69.8069μs 14.3252 KOps/s 14.1302 KOps/s $\color{#35bf28}+1.38\%$
test_setitem_dim[tuple] 69.7330μs 45.9530μs 21.7614 KOps/s 21.4415 KOps/s $\color{#35bf28}+1.49\%$
test_setitem 39.0920μs 16.7475μs 59.7103 KOps/s 54.4362 KOps/s $\textbf{\color{#35bf28}+9.69\%}$
test_set 54.1530μs 16.3681μs 61.0946 KOps/s 56.7663 KOps/s $\textbf{\color{#35bf28}+7.62\%}$
test_set_shared 1.5340ms 0.1006ms 9.9375 KOps/s 9.9956 KOps/s $\color{#d91a1a}-0.58\%$
test_update 87.5940μs 19.7606μs 50.6057 KOps/s 46.3439 KOps/s $\textbf{\color{#35bf28}+9.20\%}$
test_update_nested 60.0020μs 25.0251μs 39.9598 KOps/s 36.9511 KOps/s $\textbf{\color{#35bf28}+8.14\%}$
test_update__nested 65.7930μs 23.0025μs 43.4736 KOps/s 42.9539 KOps/s $\color{#35bf28}+1.21\%$
test_set_nested 50.5720μs 17.8223μs 56.1095 KOps/s 52.8528 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_set_nested_new 62.6330μs 20.6740μs 48.3699 KOps/s 45.4215 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_select 66.3630μs 33.7594μs 29.6213 KOps/s 28.6533 KOps/s $\color{#35bf28}+3.38\%$
test_select_nested 0.7466ms 54.5200μs 18.3419 KOps/s 18.6162 KOps/s $\color{#d91a1a}-1.47\%$
test_exclude_nested 0.1354ms 0.1113ms 8.9816 KOps/s 9.1178 KOps/s $\color{#d91a1a}-1.49\%$
test_empty[True] 0.4708ms 0.3559ms 2.8100 KOps/s 2.8744 KOps/s $\color{#d91a1a}-2.24\%$
test_empty[False] 3.7682μs 0.9215μs 1.0852 MOps/s 1.0833 MOps/s $\color{#35bf28}+0.18\%$
test_to 0.1030ms 77.6463μs 12.8789 KOps/s 12.6512 KOps/s $\color{#35bf28}+1.80\%$
test_to_nonblocking 92.7540μs 60.5805μs 16.5070 KOps/s 16.3422 KOps/s $\color{#35bf28}+1.01\%$
test_unbind_speed 1.5308ms 0.2662ms 3.7561 KOps/s 3.8070 KOps/s $\color{#d91a1a}-1.34\%$
test_unbind_speed_stack0 0.3234ms 0.2686ms 3.7233 KOps/s 3.7783 KOps/s $\color{#d91a1a}-1.46\%$
test_unbind_speed_stack1 74.4554ms 0.8099ms 1.2348 KOps/s 1.2211 KOps/s $\color{#35bf28}+1.12\%$
test_split 74.9035ms 1.6627ms 601.4289 Ops/s 601.8640 Ops/s $\color{#d91a1a}-0.07\%$
test_chunk 1.6432ms 1.5501ms 645.1355 Ops/s 651.4712 Ops/s $\color{#d91a1a}-0.97\%$
test_creation[device0] 0.1388ms 58.9985μs 16.9496 KOps/s 16.9750 KOps/s $\color{#d91a1a}-0.15\%$
test_creation_from_tensor 0.1504ms 54.7739μs 18.2569 KOps/s 17.9727 KOps/s $\color{#35bf28}+1.58\%$
test_add_one[memmap_tensor0] 0.1094ms 7.0572μs 141.6988 KOps/s 141.9044 KOps/s $\color{#d91a1a}-0.14\%$
test_contiguous[memmap_tensor0] 17.0910μs 0.6788μs 1.4732 MOps/s 1.4777 MOps/s $\color{#d91a1a}-0.30\%$
test_stack[memmap_tensor0] 34.7510μs 4.7857μs 208.9577 KOps/s 211.4617 KOps/s $\color{#d91a1a}-1.18\%$
test_memmaptd_index 1.0066ms 0.2871ms 3.4832 KOps/s 3.4963 KOps/s $\color{#d91a1a}-0.37\%$
test_memmaptd_index_astensor 0.7116ms 0.3597ms 2.7799 KOps/s 2.7882 KOps/s $\color{#d91a1a}-0.30\%$
test_memmaptd_index_op 76.0011ms 0.7246ms 1.3801 KOps/s 1.4310 KOps/s $\color{#d91a1a}-3.55\%$
test_serialize_model 0.1814s 0.1108s 9.0231 Ops/s 9.3965 Ops/s $\color{#d91a1a}-3.97\%$
test_serialize_model_pickle 1.3483s 1.2357s 0.8093 Ops/s 0.8062 Ops/s $\color{#35bf28}+0.38\%$
test_serialize_weights 0.1792s 0.1092s 9.1594 Ops/s 8.7800 Ops/s $\color{#35bf28}+4.32\%$
test_serialize_weights_returnearly 0.2531s 0.1052s 9.5018 Ops/s 10.5667 Ops/s $\textbf{\color{#d91a1a}-10.08\%}$
test_serialize_weights_pickle 1.3501s 1.2481s 0.8012 Ops/s 0.8010 Ops/s $\color{#35bf28}+0.03\%$
test_reshape_pytree 61.4030μs 25.9843μs 38.4848 KOps/s 38.4919 KOps/s $\color{#d91a1a}-0.02\%$
test_reshape_td 0.1656ms 31.4475μs 31.7990 KOps/s 32.0915 KOps/s $\color{#d91a1a}-0.91\%$
test_view_pytree 0.1679ms 27.1059μs 36.8923 KOps/s 39.1417 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_view_td 0.1539ms 36.0464μs 27.7421 KOps/s 27.7777 KOps/s $\color{#d91a1a}-0.13\%$
test_unbind_pytree 54.1520μs 32.4920μs 30.7768 KOps/s 30.7336 KOps/s $\color{#35bf28}+0.14\%$
test_unbind_td 0.4183ms 41.1872μs 24.2794 KOps/s 23.6763 KOps/s $\color{#35bf28}+2.55\%$
test_split_pytree 57.5230μs 35.0823μs 28.5044 KOps/s 27.7670 KOps/s $\color{#35bf28}+2.66\%$
test_split_td 0.4214ms 39.5467μs 25.2865 KOps/s 24.4324 KOps/s $\color{#35bf28}+3.50\%$
test_add_pytree 61.2130μs 38.1576μs 26.2071 KOps/s 24.5166 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_add_td 83.9440μs 53.9706μs 18.5286 KOps/s 18.4168 KOps/s $\color{#35bf28}+0.61\%$
test_distributed 0.1868ms 67.0773μs 14.9082 KOps/s 14.9584 KOps/s $\color{#d91a1a}-0.34\%$
test_tdmodule 68.6040μs 14.9999μs 66.6670 KOps/s 62.5937 KOps/s $\textbf{\color{#35bf28}+6.51\%}$
test_tdmodule_dispatch 51.3730μs 29.0357μs 34.4403 KOps/s 32.0890 KOps/s $\textbf{\color{#35bf28}+7.33\%}$
test_tdseq 34.1420μs 16.9531μs 58.9864 KOps/s 55.8212 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_tdseq_dispatch 52.8830μs 33.0643μs 30.2441 KOps/s 28.6576 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_instantiation_functorch 1.6319ms 1.5489ms 645.6249 Ops/s 645.9162 Ops/s $\color{#d91a1a}-0.05\%$
test_instantiation_td 1.5604ms 1.0531ms 949.6151 Ops/s 869.1219 Ops/s $\textbf{\color{#35bf28}+9.26\%}$
test_exec_functorch 0.1809ms 0.1545ms 6.4715 KOps/s 6.4293 KOps/s $\color{#35bf28}+0.66\%$
test_exec_functional_call 0.1844ms 0.1457ms 6.8614 KOps/s 6.8093 KOps/s $\color{#35bf28}+0.77\%$
test_exec_td 0.1991ms 0.1455ms 6.8719 KOps/s 6.7997 KOps/s $\color{#35bf28}+1.06\%$
test_exec_td_decorator 0.7843ms 0.2166ms 4.6159 KOps/s 4.5486 KOps/s $\color{#35bf28}+1.48\%$
test_vmap_mlp_speed[True-True] 0.7715ms 0.5802ms 1.7236 KOps/s 1.7253 KOps/s $\color{#d91a1a}-0.10\%$
test_vmap_mlp_speed[True-False] 0.6410ms 0.5786ms 1.7283 KOps/s 1.7184 KOps/s $\color{#35bf28}+0.58\%$
test_vmap_mlp_speed[False-True] 0.7500ms 0.5198ms 1.9237 KOps/s 1.9816 KOps/s $\color{#d91a1a}-2.92\%$
test_vmap_mlp_speed[False-False] 0.5452ms 0.5067ms 1.9735 KOps/s 1.9560 KOps/s $\color{#35bf28}+0.89\%$
test_vmap_mlp_speed_decorator[True-True] 1.2070ms 0.6446ms 1.5514 KOps/s 1.3669 KOps/s $\textbf{\color{#35bf28}+13.50\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8038ms 0.6424ms 1.5566 KOps/s 1.5373 KOps/s $\color{#35bf28}+1.26\%$
test_vmap_mlp_speed_decorator[False-True] 0.6834ms 0.5682ms 1.7600 KOps/s 1.7560 KOps/s $\color{#35bf28}+0.23\%$
test_vmap_mlp_speed_decorator[False-False] 0.7325ms 0.5688ms 1.7581 KOps/s 1.7599 KOps/s $\color{#d91a1a}-0.10\%$
test_vmap_transformer_speed[True-True] 8.5316ms 7.9397ms 125.9486 Ops/s 132.3306 Ops/s $\color{#d91a1a}-4.82\%$
test_vmap_transformer_speed[True-False] 8.3923ms 7.7397ms 129.2038 Ops/s 129.1222 Ops/s $\color{#35bf28}+0.06\%$
test_vmap_transformer_speed[False-True] 8.0894ms 7.6873ms 130.0852 Ops/s 129.2010 Ops/s $\color{#35bf28}+0.68\%$
test_vmap_transformer_speed[False-False] 8.3229ms 7.7072ms 129.7495 Ops/s 130.6228 Ops/s $\color{#d91a1a}-0.67\%$
test_vmap_transformer_speed_decorator[True-True] 19.5197ms 18.8122ms 53.1570 Ops/s 53.1777 Ops/s $\color{#d91a1a}-0.04\%$
test_vmap_transformer_speed_decorator[True-False] 19.3573ms 18.8025ms 53.1843 Ops/s 53.4299 Ops/s $\color{#d91a1a}-0.46\%$
test_vmap_transformer_speed_decorator[False-True] 19.3351ms 18.7360ms 53.3733 Ops/s 53.6302 Ops/s $\color{#d91a1a}-0.48\%$
test_vmap_transformer_speed_decorator[False-False] 19.4893ms 18.7124ms 53.4404 Ops/s 54.4195 Ops/s $\color{#d91a1a}-1.80\%$
test_to_module_speed[True] 1.8559ms 1.5737ms 635.4325 Ops/s 643.3518 Ops/s $\color{#d91a1a}-1.23\%$
test_to_module_speed[False] 1.7400ms 1.5503ms 645.0441 Ops/s 650.6525 Ops/s $\color{#d91a1a}-0.86\%$
test_tc_init 0.1485ms 26.1133μs 38.2946 KOps/s 33.0809 KOps/s $\textbf{\color{#35bf28}+15.76\%}$
test_tc_init_nested 0.1927ms 52.3476μs 19.1031 KOps/s 17.0864 KOps/s $\textbf{\color{#35bf28}+11.80\%}$
test_tc_first_layer_tensor 3.3472μs 0.3618μs 2.7643 MOps/s 2.7800 MOps/s $\color{#d91a1a}-0.56\%$
test_tc_first_layer_nontensor 9.6989μs 0.3954μs 2.5293 MOps/s 2.5255 MOps/s $\color{#35bf28}+0.15\%$
test_tc_second_layer_tensor 0.1162ms 1.0749μs 930.3192 KOps/s 1.0262 MOps/s $\textbf{\color{#d91a1a}-9.34\%}$
test_tc_second_layer_nontensor 20.8477μs 0.8322μs 1.2017 MOps/s 1.2135 MOps/s $\color{#d91a1a}-0.97\%$
test_unbind 0.1071s 8.2167ms 121.7036 Ops/s 145.9433 Ops/s $\textbf{\color{#d91a1a}-16.61\%}$
test_full_like 13.9173ms 13.1982ms 75.7680 Ops/s 89.8856 Ops/s $\textbf{\color{#d91a1a}-15.71\%}$
test_zeros_like 8.0018ms 7.7302ms 129.3623 Ops/s 129.6393 Ops/s $\color{#d91a1a}-0.21\%$
test_ones_like 7.9439ms 7.7579ms 128.9016 Ops/s 128.5728 Ops/s $\color{#35bf28}+0.26\%$
test_clone 9.3593ms 9.2209ms 108.4499 Ops/s 108.4581 Ops/s $-0.01\%$
test_squeeze 76.4840μs 10.9995μs 90.9130 KOps/s 89.3828 KOps/s $\color{#35bf28}+1.71\%$
test_unsqueeze 93.4850μs 51.0648μs 19.5830 KOps/s 19.2732 KOps/s $\color{#35bf28}+1.61\%$
test_split 0.1358ms 96.5567μs 10.3566 KOps/s 10.2948 KOps/s $\color{#35bf28}+0.60\%$
test_permute 0.1544ms 0.1098ms 9.1112 KOps/s 8.7309 KOps/s $\color{#35bf28}+4.36\%$
test_stack 26.9905ms 26.8283ms 37.2741 Ops/s 37.3268 Ops/s $\color{#d91a1a}-0.14\%$
test_cat 27.1345ms 26.7823ms 37.3382 Ops/s 37.3493 Ops/s $\color{#d91a1a}-0.03\%$

Labels: bug, CLA Signed
2 participants