[Feature] view(dtype) #835

Merged
merged 1 commit into from
Jun 25, 2024

@vmoens (Contributor) commented Jun 25, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Jun 25, 2024
@vmoens merged commit 19f10d0 into main Jun 25, 2024
29 of 31 checks passed
@vmoens deleted the view-dtype branch June 25, 2024 08:36
@vmoens added the enhancement label Jun 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}33$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 30.8370μs 17.2000μs 58.1396 KOps/s 61.8577 KOps/s $\textbf{\color{#d91a1a}-6.01\%}$
test_plain_set_stack_nested 47.8990μs 17.6831μs 56.5512 KOps/s 60.0011 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_plain_set_nested_inplace 66.5940μs 19.5898μs 51.0469 KOps/s 53.3034 KOps/s $\color{#d91a1a}-4.23\%$
test_plain_set_stack_nested_inplace 48.4500μs 19.5891μs 51.0487 KOps/s 54.0194 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_items 18.2630μs 2.5456μs 392.8292 KOps/s 388.0221 KOps/s $\color{#35bf28}+1.24\%$
test_items_nested 1.0348ms 0.2730ms 3.6631 KOps/s 3.7532 KOps/s $\color{#d91a1a}-2.40\%$
test_items_nested_locked 0.4045ms 0.2734ms 3.6575 KOps/s 3.7472 KOps/s $\color{#d91a1a}-2.39\%$
test_items_nested_leaf 1.9883ms 76.6720μs 13.0426 KOps/s 13.0329 KOps/s $\color{#35bf28}+0.07\%$
test_items_stack_nested 0.5743ms 0.2766ms 3.6158 KOps/s 3.6546 KOps/s $\color{#d91a1a}-1.06\%$
test_items_stack_nested_leaf 0.1418ms 78.5309μs 12.7338 KOps/s 12.9993 KOps/s $\color{#d91a1a}-2.04\%$
test_items_stack_nested_locked 0.5237ms 0.2769ms 3.6111 KOps/s 3.7311 KOps/s $\color{#d91a1a}-3.22\%$
test_keys 22.6420μs 4.6015μs 217.3226 KOps/s 260.5963 KOps/s $\textbf{\color{#d91a1a}-16.61\%}$
test_keys_nested 0.2967ms 0.1386ms 7.2158 KOps/s 7.2199 KOps/s $\color{#d91a1a}-0.06\%$
test_keys_nested_locked 0.7379ms 0.1433ms 6.9765 KOps/s 6.9882 KOps/s $\color{#d91a1a}-0.17\%$
test_keys_nested_leaf 0.2251ms 0.1170ms 8.5475 KOps/s 8.6020 KOps/s $\color{#d91a1a}-0.63\%$
test_keys_stack_nested 0.2989ms 0.1370ms 7.2996 KOps/s 7.2755 KOps/s $\color{#35bf28}+0.33\%$
test_keys_stack_nested_leaf 0.2382ms 0.1166ms 8.5733 KOps/s 8.6220 KOps/s $\color{#d91a1a}-0.56\%$
test_keys_stack_nested_locked 0.2322ms 0.1404ms 7.1219 KOps/s 6.9789 KOps/s $\color{#35bf28}+2.05\%$
test_values 4.9814μs 1.1436μs 874.4446 KOps/s 857.6852 KOps/s $\color{#35bf28}+1.95\%$
test_values_nested 91.5620μs 51.0456μs 19.5903 KOps/s 19.7216 KOps/s $\color{#d91a1a}-0.67\%$
test_values_nested_locked 0.1038ms 51.8268μs 19.2950 KOps/s 19.7154 KOps/s $\color{#d91a1a}-2.13\%$
test_values_nested_leaf 0.1311ms 46.5438μs 21.4852 KOps/s 21.7424 KOps/s $\color{#d91a1a}-1.18\%$
test_values_stack_nested 0.1037ms 52.8163μs 18.9335 KOps/s 19.8197 KOps/s $\color{#d91a1a}-4.47\%$
test_values_stack_nested_leaf 96.8510μs 46.1584μs 21.6645 KOps/s 21.6887 KOps/s $\color{#d91a1a}-0.11\%$
test_values_stack_nested_locked 0.1085ms 52.7112μs 18.9713 KOps/s 19.9989 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_membership 13.4950μs 1.3373μs 747.7971 KOps/s 726.4169 KOps/s $\color{#35bf28}+2.94\%$
test_membership_nested 24.7060μs 3.4189μs 292.4942 KOps/s 289.4450 KOps/s $\color{#35bf28}+1.05\%$
test_membership_nested_leaf 19.2160μs 3.4116μs 293.1149 KOps/s 283.9254 KOps/s $\color{#35bf28}+3.24\%$
test_membership_stacked_nested 25.0470μs 3.3814μs 295.7352 KOps/s 286.1064 KOps/s $\color{#35bf28}+3.37\%$
test_membership_stacked_nested_leaf 24.1750μs 3.4027μs 293.8853 KOps/s 287.7104 KOps/s $\color{#35bf28}+2.15\%$
test_membership_nested_last 21.2790μs 4.2035μs 237.8993 KOps/s 240.7400 KOps/s $\color{#d91a1a}-1.18\%$
test_membership_nested_leaf_last 42.5500μs 4.2461μs 235.5109 KOps/s 238.0938 KOps/s $\color{#d91a1a}-1.08\%$
test_membership_stacked_nested_last 27.3410μs 6.7234μs 148.7352 KOps/s 240.4369 KOps/s $\textbf{\color{#d91a1a}-38.14\%}$
test_membership_stacked_nested_leaf_last 29.0740μs 6.7066μs 149.1062 KOps/s 240.4580 KOps/s $\textbf{\color{#d91a1a}-37.99\%}$
test_nested_getleaf 46.4090μs 10.6260μs 94.1087 KOps/s 91.4936 KOps/s $\color{#35bf28}+2.86\%$
test_nested_get 28.1730μs 10.0831μs 99.1759 KOps/s 96.3528 KOps/s $\color{#35bf28}+2.93\%$
test_stacked_getleaf 38.6520μs 10.5990μs 94.3485 KOps/s 92.1213 KOps/s $\color{#35bf28}+2.42\%$
test_stacked_get 37.4100μs 9.9135μs 100.8723 KOps/s 96.1027 KOps/s $\color{#35bf28}+4.96\%$
test_nested_getitemleaf 43.3310μs 11.1386μs 89.7776 KOps/s 87.0180 KOps/s $\color{#35bf28}+3.17\%$
test_nested_getitem 31.5090μs 10.2797μs 97.2796 KOps/s 93.3470 KOps/s $\color{#35bf28}+4.21\%$
test_stacked_getitemleaf 53.0190μs 11.2739μs 88.7003 KOps/s 86.8822 KOps/s $\color{#35bf28}+2.09\%$
test_stacked_getitem 42.7500μs 10.1930μs 98.1068 KOps/s 95.1222 KOps/s $\color{#35bf28}+3.14\%$
test_lock_nested 50.7510ms 0.3902ms 2.5625 KOps/s 2.9238 KOps/s $\textbf{\color{#d91a1a}-12.36\%}$
test_lock_stack_nested 0.5669ms 0.3088ms 3.2379 KOps/s 3.2392 KOps/s $\color{#d91a1a}-0.04\%$
test_unlock_nested 0.8811ms 0.3497ms 2.8595 KOps/s 2.8947 KOps/s $\color{#d91a1a}-1.22\%$
test_unlock_stack_nested 0.4937ms 0.3186ms 3.1389 KOps/s 3.1493 KOps/s $\color{#d91a1a}-0.33\%$
test_flatten_speed 0.2466ms 96.0878μs 10.4071 KOps/s 10.5464 KOps/s $\color{#d91a1a}-1.32\%$
test_unflatten_speed 0.5408ms 0.4050ms 2.4691 KOps/s 2.4158 KOps/s $\color{#35bf28}+2.21\%$
test_common_ops 4.2341ms 0.7526ms 1.3287 KOps/s 1.4271 KOps/s $\textbf{\color{#d91a1a}-6.90\%}$
test_creation 43.3810μs 1.9406μs 515.3164 KOps/s 524.8867 KOps/s $\color{#d91a1a}-1.82\%$
test_creation_empty 33.1510μs 11.4983μs 86.9692 KOps/s 106.1142 KOps/s $\textbf{\color{#d91a1a}-18.04\%}$
test_creation_nested_1 39.3040μs 14.2192μs 70.3276 KOps/s 82.2703 KOps/s $\textbf{\color{#d91a1a}-14.52\%}$
test_creation_nested_2 39.6440μs 17.4745μs 57.2262 KOps/s 64.5139 KOps/s $\textbf{\color{#d91a1a}-11.30\%}$
test_clone 0.1243ms 13.6129μs 73.4596 KOps/s 73.7443 KOps/s $\color{#d91a1a}-0.39\%$
test_getitem[int] 42.6000μs 11.7714μs 84.9520 KOps/s 86.7432 KOps/s $\color{#d91a1a}-2.06\%$
test_getitem[slice_int] 68.6180μs 22.9375μs 43.5967 KOps/s 44.3397 KOps/s $\color{#d91a1a}-1.68\%$
test_getitem[range] 78.0260μs 57.9342μs 17.2610 KOps/s 16.3666 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_getitem[tuple] 46.3770μs 18.9866μs 52.6687 KOps/s 51.5776 KOps/s $\color{#35bf28}+2.12\%$
test_getitem[list] 0.1109ms 42.0199μs 23.7982 KOps/s 25.2627 KOps/s $\textbf{\color{#d91a1a}-5.80\%}$
test_setitem_dim[int] 51.4760μs 36.3562μs 27.5056 KOps/s 29.3070 KOps/s $\textbf{\color{#d91a1a}-6.15\%}$
test_setitem_dim[slice_int] 0.1164ms 62.3780μs 16.0313 KOps/s 16.8948 KOps/s $\textbf{\color{#d91a1a}-5.11\%}$
test_setitem_dim[range] 0.1509ms 85.2498μs 11.7302 KOps/s 12.1452 KOps/s $\color{#d91a1a}-3.42\%$
test_setitem_dim[tuple] 80.6710μs 50.8456μs 19.6674 KOps/s 20.1047 KOps/s $\color{#d91a1a}-2.18\%$
test_setitem 53.4500μs 21.2033μs 47.1624 KOps/s 50.0608 KOps/s $\textbf{\color{#d91a1a}-5.79\%}$
test_set 56.0750μs 20.8238μs 48.0220 KOps/s 51.7874 KOps/s $\textbf{\color{#d91a1a}-7.27\%}$
test_set_shared 2.3721ms 0.1454ms 6.8795 KOps/s 6.8898 KOps/s $\color{#d91a1a}-0.15\%$
test_update 0.1054ms 24.0316μs 41.6119 KOps/s 47.2493 KOps/s $\textbf{\color{#d91a1a}-11.93\%}$
test_update_nested 0.1328ms 33.4555μs 29.8904 KOps/s 33.0809 KOps/s $\textbf{\color{#d91a1a}-9.64\%}$
test_update__nested 0.1211ms 26.6533μs 37.5188 KOps/s 39.0963 KOps/s $\color{#d91a1a}-4.03\%$
test_set_nested 65.3220μs 22.5115μs 44.4218 KOps/s 47.5708 KOps/s $\textbf{\color{#d91a1a}-6.62\%}$
test_set_nested_new 92.3020μs 27.0718μs 36.9388 KOps/s 39.5639 KOps/s $\textbf{\color{#d91a1a}-6.64\%}$
test_select 92.1320μs 42.5630μs 23.4946 KOps/s 24.2988 KOps/s $\color{#d91a1a}-3.31\%$
test_select_nested 0.1206ms 60.3365μs 16.5737 KOps/s 16.2567 KOps/s $\color{#35bf28}+1.95\%$
test_exclude_nested 0.2395ms 0.1209ms 8.2738 KOps/s 8.2159 KOps/s $\color{#35bf28}+0.70\%$
test_empty[True] 0.5875ms 0.3983ms 2.5107 KOps/s 2.4920 KOps/s $\color{#35bf28}+0.75\%$
test_empty[False] 11.3013μs 1.2034μs 830.9472 KOps/s 826.0822 KOps/s $\color{#35bf28}+0.59\%$
test_unbind_speed 0.3161ms 0.2541ms 3.9359 KOps/s 3.9155 KOps/s $\color{#35bf28}+0.52\%$
test_unbind_speed_stack0 0.4973ms 0.2502ms 3.9963 KOps/s 3.9221 KOps/s $\color{#35bf28}+1.89\%$
test_unbind_speed_stack1 63.8466ms 0.7161ms 1.3965 KOps/s 1.3655 KOps/s $\color{#35bf28}+2.28\%$
test_split 63.4853ms 1.6275ms 614.4544 Ops/s 607.7157 Ops/s $\color{#35bf28}+1.11\%$
test_chunk 65.5549ms 1.6173ms 618.3019 Ops/s 621.1548 Ops/s $\color{#d91a1a}-0.46\%$
test_creation[device0] 0.1538ms 84.0054μs 11.9040 KOps/s 11.5725 KOps/s $\color{#35bf28}+2.86\%$
test_creation_from_tensor 3.7484ms 85.9963μs 11.6284 KOps/s 11.5479 KOps/s $\color{#35bf28}+0.70\%$
test_add_one[memmap_tensor0] 52.8390μs 5.5185μs 181.2075 KOps/s 176.6416 KOps/s $\color{#35bf28}+2.58\%$
test_contiguous[memmap_tensor0] 19.4760μs 0.6328μs 1.5803 MOps/s 1.5556 MOps/s $\color{#35bf28}+1.59\%$
test_stack[memmap_tensor0] 26.5200μs 3.6184μs 276.3618 KOps/s 275.0451 KOps/s $\color{#35bf28}+0.48\%$
test_memmaptd_index 0.9638ms 0.2547ms 3.9258 KOps/s 3.8830 KOps/s $\color{#35bf28}+1.10\%$
test_memmaptd_index_astensor 0.7665ms 0.3282ms 3.0471 KOps/s 3.0203 KOps/s $\color{#35bf28}+0.89\%$
test_memmaptd_index_op 1.2523ms 0.6240ms 1.6025 KOps/s 1.6752 KOps/s $\color{#d91a1a}-4.34\%$
test_serialize_model 0.1733s 0.1127s 8.8742 Ops/s 8.7208 Ops/s $\color{#35bf28}+1.76\%$
test_serialize_model_pickle 0.4476s 0.3779s 2.6462 Ops/s 2.6161 Ops/s $\color{#35bf28}+1.15\%$
test_serialize_weights 0.1766s 0.1112s 8.9944 Ops/s 8.9868 Ops/s $\color{#35bf28}+0.09\%$
test_serialize_weights_returnearly 0.1887s 0.1343s 7.4471 Ops/s 7.1480 Ops/s $\color{#35bf28}+4.18\%$
test_serialize_weights_pickle 0.7464s 0.5076s 1.9702 Ops/s 2.4644 Ops/s $\textbf{\color{#d91a1a}-20.06\%}$
test_serialize_weights_filesystem 0.1033s 93.7245ms 10.6696 Ops/s 9.9029 Ops/s $\textbf{\color{#35bf28}+7.74\%}$
test_serialize_model_filesystem 0.1652s 0.1023s 9.7745 Ops/s 10.5774 Ops/s $\textbf{\color{#d91a1a}-7.59\%}$
test_reshape_pytree 73.0670μs 25.7221μs 38.8771 KOps/s 38.9472 KOps/s $\color{#d91a1a}-0.18\%$
test_reshape_td 66.0530μs 33.9219μs 29.4795 KOps/s 29.5113 KOps/s $\color{#d91a1a}-0.11\%$
test_view_pytree 55.9240μs 25.4531μs 39.2879 KOps/s 39.2270 KOps/s $\color{#35bf28}+0.16\%$
test_view_td 0.1066ms 38.5257μs 25.9567 KOps/s 26.5377 KOps/s $\color{#d91a1a}-2.19\%$
test_unbind_pytree 71.8050μs 29.6666μs 33.7080 KOps/s 34.0630 KOps/s $\color{#d91a1a}-1.04\%$
test_unbind_td 0.4208ms 38.2349μs 26.1541 KOps/s 26.1046 KOps/s $\color{#35bf28}+0.19\%$
test_split_pytree 71.6140μs 29.6036μs 33.7796 KOps/s 34.2559 KOps/s $\color{#d91a1a}-1.39\%$
test_split_td 0.1243ms 41.6355μs 24.0179 KOps/s 24.5791 KOps/s $\color{#d91a1a}-2.28\%$
test_add_pytree 86.9020μs 35.0823μs 28.5044 KOps/s 28.7041 KOps/s $\color{#d91a1a}-0.70\%$
test_add_td 0.1302ms 59.6749μs 16.7575 KOps/s 19.2686 KOps/s $\textbf{\color{#d91a1a}-13.03\%}$
test_distributed 0.2103ms 0.1013ms 9.8743 KOps/s 9.7032 KOps/s $\color{#35bf28}+1.76\%$
test_tdmodule 80.4400μs 18.4087μs 54.3221 KOps/s 60.5728 KOps/s $\textbf{\color{#d91a1a}-10.32\%}$
test_tdmodule_dispatch 63.6390μs 36.4647μs 27.4238 KOps/s 30.6157 KOps/s $\textbf{\color{#d91a1a}-10.43\%}$
test_tdseq 39.4130μs 21.2665μs 47.0224 KOps/s 51.7901 KOps/s $\textbf{\color{#d91a1a}-9.21\%}$
test_tdseq_dispatch 98.3440μs 42.4085μs 23.5802 KOps/s 26.6557 KOps/s $\textbf{\color{#d91a1a}-11.54\%}$
test_instantiation_functorch 1.5848ms 1.3290ms 752.4697 Ops/s 764.9785 Ops/s $\color{#d91a1a}-1.64\%$
test_instantiation_td 1.6066ms 1.0227ms 977.7698 Ops/s 998.4444 Ops/s $\color{#d91a1a}-2.07\%$
test_exec_functorch 0.3128ms 0.1685ms 5.9358 KOps/s 6.4130 KOps/s $\textbf{\color{#d91a1a}-7.44\%}$
test_exec_functional_call 0.2849ms 0.1519ms 6.5828 KOps/s 6.8811 KOps/s $\color{#d91a1a}-4.33\%$
test_exec_td 0.2339ms 0.1471ms 6.7970 KOps/s 6.9554 KOps/s $\color{#d91a1a}-2.28\%$
test_exec_td_decorator 0.9100ms 0.2227ms 4.4902 KOps/s 4.5827 KOps/s $\color{#d91a1a}-2.02\%$
test_vmap_mlp_speed[True-True] 0.6790ms 0.4863ms 2.0562 KOps/s 2.0552 KOps/s $\color{#35bf28}+0.04\%$
test_vmap_mlp_speed[True-False] 2.2469ms 0.4957ms 2.0174 KOps/s 2.0870 KOps/s $\color{#d91a1a}-3.33\%$
test_vmap_mlp_speed[False-True] 0.6426ms 0.3979ms 2.5132 KOps/s 2.5180 KOps/s $\color{#d91a1a}-0.19\%$
test_vmap_mlp_speed[False-False] 0.5732ms 0.3967ms 2.5211 KOps/s 2.5611 KOps/s $\color{#d91a1a}-1.56\%$
test_vmap_mlp_speed_decorator[True-True] 1.1353ms 0.5569ms 1.7956 KOps/s 1.8260 KOps/s $\color{#d91a1a}-1.67\%$
test_vmap_mlp_speed_decorator[True-False] 0.7776ms 0.5533ms 1.8073 KOps/s 1.8319 KOps/s $\color{#d91a1a}-1.34\%$
test_vmap_mlp_speed_decorator[False-True] 0.6676ms 0.4556ms 2.1950 KOps/s 2.2104 KOps/s $\color{#d91a1a}-0.69\%$
test_vmap_mlp_speed_decorator[False-False] 0.6817ms 0.4571ms 2.1878 KOps/s 2.2126 KOps/s $\color{#d91a1a}-1.12\%$
test_to_module_speed[True] 1.8382ms 1.6778ms 596.0360 Ops/s 552.9388 Ops/s $\textbf{\color{#35bf28}+7.79\%}$
test_to_module_speed[False] 74.3202ms 1.7778ms 562.4885 Ops/s 603.3116 Ops/s $\textbf{\color{#d91a1a}-6.77\%}$
test_tc_init 63.2580μs 30.1168μs 33.2041 KOps/s 38.0118 KOps/s $\textbf{\color{#d91a1a}-12.65\%}$
test_tc_init_nested 0.1245ms 63.8659μs 15.6578 KOps/s 18.1328 KOps/s $\textbf{\color{#d91a1a}-13.65\%}$
test_tc_first_layer_tensor 4.3667μs 0.6973μs 1.4341 MOps/s 1.4576 MOps/s $\color{#d91a1a}-1.62\%$
test_tc_first_layer_nontensor 2.2477μs 0.6801μs 1.4703 MOps/s 1.1615 MOps/s $\textbf{\color{#35bf28}+26.59\%}$
test_tc_second_layer_tensor 15.1080μs 1.8565μs 538.6439 KOps/s 517.6232 KOps/s $\color{#35bf28}+4.06\%$
test_tc_second_layer_nontensor 10.1257μs 1.5395μs 649.5467 KOps/s 647.0628 KOps/s $\color{#35bf28}+0.38\%$
test_unbind 83.6673ms 6.5907ms 151.7299 Ops/s 141.9395 Ops/s $\textbf{\color{#35bf28}+6.90\%}$
test_full_like 17.2103ms 10.5670ms 94.6347 Ops/s 98.0223 Ops/s $\color{#d91a1a}-3.46\%$
test_zeros_like 11.8558ms 6.0609ms 164.9916 Ops/s 179.2206 Ops/s $\textbf{\color{#d91a1a}-7.94\%}$
test_ones_like 10.7855ms 6.1850ms 161.6814 Ops/s 162.7434 Ops/s $\color{#d91a1a}-0.65\%$
test_clone 15.0023ms 8.0934ms 123.5576 Ops/s 129.5960 Ops/s $\color{#d91a1a}-4.66\%$
test_squeeze 58.2790μs 14.0366μs 71.2424 KOps/s 72.5134 KOps/s $\color{#d91a1a}-1.75\%$
test_unsqueeze 0.2313ms 58.9466μs 16.9645 KOps/s 17.0166 KOps/s $\color{#d91a1a}-0.31\%$
test_split 0.1913ms 0.1102ms 9.0744 KOps/s 9.1371 KOps/s $\color{#d91a1a}-0.69\%$
test_permute 0.2263ms 0.1294ms 7.7291 KOps/s 7.9753 KOps/s $\color{#d91a1a}-3.09\%$
test_stack 25.2916ms 22.6581ms 44.1343 Ops/s 45.8440 Ops/s $\color{#d91a1a}-3.73\%$
test_cat 27.4339ms 22.6190ms 44.2105 Ops/s 46.3531 Ops/s $\color{#d91a1a}-4.62\%$
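
The Change column appears to be the relative difference between the Ops and Ops on Repo HEAD columns, and the Improved/Worsened totals seem to count only the bolded rows, whose change exceeds roughly 5% in magnitude. A minimal Python sketch under those assumptions follows; `percent_change`, `classify`, and the 5% threshold are illustrative names and values inferred from the table, not taken from the benchmark tooling itself.

```python
def percent_change(ops: float, head_ops: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops - head_ops) / head_ops * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Bucket a change in percent.

    The 5% threshold is an assumption inferred from which rows are bolded;
    whether the real comparison is strict or inclusive is not known.
    """
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "neutral"


# Rows copied from the CPU table above: (name, Ops, Ops on Repo HEAD), in KOps/s.
rows = [
    ("test_plain_set_nested", 58.1396, 61.8577),
    ("test_items", 392.8292, 388.0221),
    ("test_membership_stacked_nested_last", 148.7352, 240.4369),
]

for name, ops, head_ops in rows:
    change = percent_change(ops, head_ops)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, a row such as test_items (+1.24%) is reported in green but does not count toward the Improved total, which would explain why only 5 of 144 CPU benchmarks are listed as improved.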


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.5420μs 12.6428μs 79.0963 KOps/s 81.7570 KOps/s $\color{#d91a1a}-3.25\%$
test_plain_set_stack_nested 31.3920μs 12.7170μs 78.6349 KOps/s 80.3360 KOps/s $\color{#d91a1a}-2.12\%$
test_plain_set_nested_inplace 30.7920μs 13.9093μs 71.8941 KOps/s 73.5946 KOps/s $\color{#d91a1a}-2.31\%$
test_plain_set_stack_nested_inplace 61.0630μs 14.0947μs 70.9488 KOps/s 72.6902 KOps/s $\color{#d91a1a}-2.40\%$
test_items 20.8110μs 4.6975μs 212.8788 KOps/s 216.1500 KOps/s $\color{#d91a1a}-1.51\%$
test_items_nested 0.3849ms 0.3418ms 2.9259 KOps/s 2.9680 KOps/s $\color{#d91a1a}-1.42\%$
test_items_nested_locked 0.3864ms 0.3399ms 2.9423 KOps/s 2.9086 KOps/s $\color{#35bf28}+1.16\%$
test_items_nested_leaf 99.3660μs 82.7638μs 12.0826 KOps/s 12.2022 KOps/s $\color{#d91a1a}-0.98\%$
test_items_stack_nested 0.5403ms 0.3427ms 2.9184 KOps/s 2.8912 KOps/s $\color{#35bf28}+0.94\%$
test_items_stack_nested_leaf 0.1089ms 83.0768μs 12.0371 KOps/s 12.1134 KOps/s $\color{#d91a1a}-0.63\%$
test_items_stack_nested_locked 0.3718ms 0.3452ms 2.8969 KOps/s 2.9291 KOps/s $\color{#d91a1a}-1.10\%$
test_keys 30.5420μs 4.3888μs 227.8511 KOps/s 231.1357 KOps/s $\color{#d91a1a}-1.42\%$
test_keys_nested 1.5886ms 67.1235μs 14.8979 KOps/s 14.9338 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_nested_locked 2.2390ms 72.6423μs 13.7661 KOps/s 13.9418 KOps/s $\color{#d91a1a}-1.26\%$
test_keys_nested_leaf 80.0560μs 57.6524μs 17.3453 KOps/s 17.3319 KOps/s $\color{#35bf28}+0.08\%$
test_keys_stack_nested 84.9750μs 66.0180μs 15.1474 KOps/s 14.9777 KOps/s $\color{#35bf28}+1.13\%$
test_keys_stack_nested_leaf 81.1040μs 57.0957μs 17.5144 KOps/s 17.3810 KOps/s $\color{#35bf28}+0.77\%$
test_keys_stack_nested_locked 84.6950μs 70.3915μs 14.2063 KOps/s 14.0184 KOps/s $\color{#35bf28}+1.34\%$
test_values 0.3146ms 1.8287μs 546.8428 KOps/s 554.5043 KOps/s $\color{#d91a1a}-1.38\%$
test_values_nested 51.3230μs 35.3252μs 28.3084 KOps/s 28.6794 KOps/s $\color{#d91a1a}-1.29\%$
test_values_nested_locked 63.2630μs 36.7948μs 27.1778 KOps/s 26.9138 KOps/s $\color{#35bf28}+0.98\%$
test_values_nested_leaf 47.2530μs 31.4005μs 31.8466 KOps/s 32.0348 KOps/s $\color{#d91a1a}-0.59\%$
test_values_stack_nested 55.8530μs 35.9284μs 27.8331 KOps/s 28.0507 KOps/s $\color{#d91a1a}-0.78\%$
test_values_stack_nested_leaf 59.7330μs 31.6903μs 31.5554 KOps/s 31.2968 KOps/s $\color{#35bf28}+0.83\%$
test_values_stack_nested_locked 64.8040μs 37.4490μs 26.7030 KOps/s 26.6258 KOps/s $\color{#35bf28}+0.29\%$
test_membership 3.6101μs 0.7295μs 1.3708 MOps/s 1.2027 MOps/s $\textbf{\color{#35bf28}+13.98\%}$
test_membership_nested 18.5110μs 2.5397μs 393.7540 KOps/s 392.2686 KOps/s $\color{#35bf28}+0.38\%$
test_membership_nested_leaf 19.3510μs 2.5823μs 387.2513 KOps/s 396.4587 KOps/s $\color{#d91a1a}-2.32\%$
test_membership_stacked_nested 20.3510μs 2.5766μs 388.1030 KOps/s 388.1644 KOps/s $\color{#d91a1a}-0.02\%$
test_membership_stacked_nested_leaf 16.0510μs 2.5706μs 389.0137 KOps/s 394.3589 KOps/s $\color{#d91a1a}-1.36\%$
test_membership_nested_last 23.5210μs 3.1187μs 320.6469 KOps/s 326.0578 KOps/s $\color{#d91a1a}-1.66\%$
test_membership_nested_leaf_last 22.3920μs 3.1259μs 319.9093 KOps/s 325.8186 KOps/s $\color{#d91a1a}-1.81\%$
test_membership_stacked_nested_last 29.2720μs 9.7739μs 102.3137 KOps/s 327.2373 KOps/s $\textbf{\color{#d91a1a}-68.73\%}$
test_membership_stacked_nested_leaf_last 44.9020μs 9.8139μs 101.8958 KOps/s 330.0980 KOps/s $\textbf{\color{#d91a1a}-69.13\%}$
test_nested_getleaf 24.0910μs 8.3604μs 119.6120 KOps/s 119.7369 KOps/s $\color{#d91a1a}-0.10\%$
test_nested_get 21.3310μs 7.9031μs 126.5319 KOps/s 125.9092 KOps/s $\color{#35bf28}+0.49\%$
test_stacked_getleaf 24.9520μs 8.4925μs 117.7512 KOps/s 118.7637 KOps/s $\color{#d91a1a}-0.85\%$
test_stacked_get 26.3510μs 7.9346μs 126.0298 KOps/s 126.8245 KOps/s $\color{#d91a1a}-0.63\%$
test_nested_getitemleaf 24.0210μs 8.5495μs 116.9654 KOps/s 116.9312 KOps/s $\color{#35bf28}+0.03\%$
test_nested_getitem 25.1420μs 8.0941μs 123.5461 KOps/s 123.8014 KOps/s $\color{#d91a1a}-0.21\%$
test_stacked_getitemleaf 25.6310μs 8.6332μs 115.8315 KOps/s 116.2514 KOps/s $\color{#d91a1a}-0.36\%$
test_stacked_getitem 24.2610μs 8.0921μs 123.5780 KOps/s 123.9417 KOps/s $\color{#d91a1a}-0.29\%$
test_lock_nested 59.0519ms 0.4101ms 2.4382 KOps/s 2.4275 KOps/s $\color{#35bf28}+0.44\%$
test_lock_stack_nested 0.3768ms 0.2967ms 3.3709 KOps/s 3.2562 KOps/s $\color{#35bf28}+3.52\%$
test_unlock_nested 60.6983ms 0.4132ms 2.4202 KOps/s 2.4074 KOps/s $\color{#35bf28}+0.53\%$
test_unlock_stack_nested 0.3361ms 0.3040ms 3.2896 KOps/s 3.1748 KOps/s $\color{#35bf28}+3.62\%$
test_flatten_speed 0.3246ms 0.1020ms 9.8066 KOps/s 10.0690 KOps/s $\color{#d91a1a}-2.61\%$
test_unflatten_speed 0.3442ms 0.2923ms 3.4207 KOps/s 3.4365 KOps/s $\color{#d91a1a}-0.46\%$
test_common_ops 1.0733ms 0.5763ms 1.7353 KOps/s 1.7398 KOps/s $\color{#d91a1a}-0.26\%$
test_creation 32.4620μs 1.6540μs 604.5967 KOps/s 610.2958 KOps/s $\color{#d91a1a}-0.93\%$
test_creation_empty 23.7710μs 8.2472μs 121.2528 KOps/s 133.3667 KOps/s $\textbf{\color{#d91a1a}-9.08\%}$
test_creation_nested_1 29.3820μs 10.1089μs 98.9230 KOps/s 107.2884 KOps/s $\textbf{\color{#d91a1a}-7.80\%}$
test_creation_nested_2 27.6310μs 12.3911μs 80.7028 KOps/s 87.1865 KOps/s $\textbf{\color{#d91a1a}-7.44\%}$
test_clone 97.6850μs 11.9254μs 83.8545 KOps/s 83.7207 KOps/s $\color{#35bf28}+0.16\%$
test_getitem[int] 26.1020μs 11.3221μs 88.3228 KOps/s 88.7201 KOps/s $\color{#d91a1a}-0.45\%$
test_getitem[slice_int] 43.3430μs 21.8306μs 45.8073 KOps/s 46.7956 KOps/s $\color{#d91a1a}-2.11\%$
test_getitem[range] 64.9230μs 47.4946μs 21.0550 KOps/s 20.5252 KOps/s $\color{#35bf28}+2.58\%$
test_getitem[tuple] 38.5020μs 18.9911μs 52.6563 KOps/s 52.1069 KOps/s $\color{#35bf28}+1.05\%$
test_getitem[list] 0.1106ms 34.7876μs 28.7458 KOps/s 28.6279 KOps/s $\color{#35bf28}+0.41\%$
test_setitem_dim[int] 46.4730μs 28.7516μs 34.7807 KOps/s 35.4559 KOps/s $\color{#d91a1a}-1.90\%$
test_setitem_dim[slice_int] 68.9330μs 50.1567μs 19.9375 KOps/s 20.3919 KOps/s $\color{#d91a1a}-2.23\%$
test_setitem_dim[range] 85.8440μs 65.6749μs 15.2265 KOps/s 15.1633 KOps/s $\color{#35bf28}+0.42\%$
test_setitem_dim[tuple] 71.1640μs 42.9874μs 23.2626 KOps/s 23.1358 KOps/s $\color{#35bf28}+0.55\%$
test_setitem 40.3920μs 16.5900μs 60.2774 KOps/s 60.3574 KOps/s $\color{#d91a1a}-0.13\%$
test_set 55.4430μs 16.0904μs 62.1487 KOps/s 62.1921 KOps/s $\color{#d91a1a}-0.07\%$
test_set_shared 1.6160ms 99.5732μs 10.0429 KOps/s 9.9321 KOps/s $\color{#35bf28}+1.12\%$
test_update 83.1050μs 18.5642μs 53.8671 KOps/s 54.9270 KOps/s $\color{#d91a1a}-1.93\%$
test_update_nested 57.5730μs 23.5564μs 42.4512 KOps/s 42.3535 KOps/s $\color{#35bf28}+0.23\%$
test_update__nested 60.4530μs 22.7593μs 43.9382 KOps/s 44.5109 KOps/s $\color{#d91a1a}-1.29\%$
test_set_nested 52.8830μs 16.9473μs 59.0066 KOps/s 57.9632 KOps/s $\color{#35bf28}+1.80\%$
test_set_nested_new 64.6830μs 20.2631μs 49.3507 KOps/s 50.2473 KOps/s $\color{#d91a1a}-1.78\%$
test_select 0.1671ms 33.9908μs 29.4197 KOps/s 30.3289 KOps/s $\color{#d91a1a}-3.00\%$
test_select_nested 0.6522ms 54.0539μs 18.5001 KOps/s 18.3933 KOps/s $\color{#35bf28}+0.58\%$
test_exclude_nested 0.1302ms 0.1097ms 9.1120 KOps/s 9.2759 KOps/s $\color{#d91a1a}-1.77\%$
test_empty[True] 0.4155ms 0.3443ms 2.9045 KOps/s 2.9055 KOps/s $\color{#d91a1a}-0.04\%$
test_empty[False] 2.4672μs 0.9175μs 1.0899 MOps/s 1.0703 MOps/s $\color{#35bf28}+1.82\%$
test_to 0.1066ms 78.7941μs 12.6913 KOps/s 13.2375 KOps/s $\color{#d91a1a}-4.13\%$
test_to_nonblocking 97.8270μs 62.8216μs 15.9181 KOps/s 16.7795 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_unbind_speed 0.3001ms 0.2679ms 3.7327 KOps/s 3.7327 KOps/s $-0.00\%$
test_unbind_speed_stack0 0.3829ms 0.2620ms 3.8172 KOps/s 3.7206 KOps/s $\color{#35bf28}+2.60\%$
test_unbind_speed_stack1 76.0118ms 0.8152ms 1.2267 KOps/s 1.2162 KOps/s $\color{#35bf28}+0.86\%$
test_split 76.8456ms 1.6581ms 603.0871 Ops/s 592.7987 Ops/s $\color{#35bf28}+1.74\%$
test_chunk 1.6028ms 1.5348ms 651.5317 Ops/s 641.2141 Ops/s $\color{#35bf28}+1.61\%$
test_creation[device0] 0.1283ms 58.4179μs 17.1181 KOps/s 17.1561 KOps/s $\color{#d91a1a}-0.22\%$
test_creation_from_tensor 0.1371ms 53.9366μs 18.5403 KOps/s 18.3469 KOps/s $\color{#35bf28}+1.05\%$
test_add_one[memmap_tensor0] 94.3150μs 7.1118μs 140.6122 KOps/s 140.9876 KOps/s $\color{#d91a1a}-0.27\%$
test_contiguous[memmap_tensor0] 25.8820μs 0.7029μs 1.4227 MOps/s 1.4831 MOps/s $\color{#d91a1a}-4.07\%$
test_stack[memmap_tensor0] 26.0710μs 5.0760μs 197.0043 KOps/s 196.0243 KOps/s $\color{#35bf28}+0.50\%$
test_memmaptd_index 1.1349ms 0.2921ms 3.4236 KOps/s 3.3837 KOps/s $\color{#35bf28}+1.18\%$
test_memmaptd_index_astensor 0.8230ms 0.3615ms 2.7665 KOps/s 2.7309 KOps/s $\color{#35bf28}+1.30\%$
test_memmaptd_index_op 1.1033ms 0.6574ms 1.5212 KOps/s 1.4045 KOps/s $\textbf{\color{#35bf28}+8.31\%}$
test_serialize_model 0.1068s 0.1023s 9.7738 Ops/s 9.1931 Ops/s $\textbf{\color{#35bf28}+6.32\%}$
test_serialize_model_pickle 1.3485s 1.2360s 0.8091 Ops/s 0.8087 Ops/s $\color{#35bf28}+0.04\%$
test_serialize_weights 0.1799s 0.1087s 9.2003 Ops/s 8.7214 Ops/s $\textbf{\color{#35bf28}+5.49\%}$
test_serialize_weights_returnearly 0.1952s 94.9205ms 10.5351 Ops/s 9.6905 Ops/s $\textbf{\color{#35bf28}+8.72\%}$
test_serialize_weights_pickle 1.3510s 1.2363s 0.8089 Ops/s 0.8087 Ops/s $\color{#35bf28}+0.02\%$
test_reshape_pytree 0.1385ms 25.8207μs 38.7285 KOps/s 38.7132 KOps/s $\color{#35bf28}+0.04\%$
test_reshape_td 55.5230μs 30.9546μs 32.3054 KOps/s 32.8362 KOps/s $\color{#d91a1a}-1.62\%$
test_view_pytree 0.2468ms 25.4033μs 39.3650 KOps/s 39.2226 KOps/s $\color{#35bf28}+0.36\%$
test_view_td 0.1550ms 36.4689μs 27.4206 KOps/s 28.3657 KOps/s $\color{#d91a1a}-3.33\%$
test_unbind_pytree 0.2403ms 32.0858μs 31.1664 KOps/s 31.2124 KOps/s $\color{#d91a1a}-0.15\%$
test_unbind_td 0.4511ms 42.2973μs 23.6422 KOps/s 24.4857 KOps/s $\color{#d91a1a}-3.45\%$
test_split_pytree 0.2542ms 35.2922μs 28.3349 KOps/s 29.0758 KOps/s $\color{#d91a1a}-2.55\%$
test_split_td 0.1737ms 39.3897μs 25.3873 KOps/s 25.1060 KOps/s $\color{#35bf28}+1.12\%$
test_add_pytree 0.2707ms 37.2423μs 26.8512 KOps/s 26.7642 KOps/s $\color{#35bf28}+0.33\%$
test_add_td 0.1642ms 48.2897μs 20.7084 KOps/s 20.1882 KOps/s $\color{#35bf28}+2.58\%$
test_distributed 3.3108ms 78.0535μs 12.8117 KOps/s 15.2046 KOps/s $\textbf{\color{#d91a1a}-15.74\%}$
test_tdmodule 41.0430μs 14.5384μs 68.7832 KOps/s 72.4923 KOps/s $\textbf{\color{#d91a1a}-5.12\%}$
test_tdmodule_dispatch 44.2230μs 27.9109μs 35.8283 KOps/s 36.4559 KOps/s $\color{#d91a1a}-1.72\%$
test_tdseq 30.8920μs 15.9802μs 62.5776 KOps/s 63.0150 KOps/s $\color{#d91a1a}-0.69\%$
test_tdseq_dispatch 50.9030μs 31.0881μs 32.1667 KOps/s 32.2963 KOps/s $\color{#d91a1a}-0.40\%$
test_instantiation_functorch 81.6346ms 1.6633ms 601.2209 Ops/s 656.2913 Ops/s $\textbf{\color{#d91a1a}-8.39\%}$
test_instantiation_td 1.5623ms 1.0391ms 962.3893 Ops/s 951.1799 Ops/s $\color{#35bf28}+1.18\%$
test_exec_functorch 0.1897ms 0.1503ms 6.6512 KOps/s 6.5243 KOps/s $\color{#35bf28}+1.94\%$
test_exec_functional_call 0.2991ms 0.1411ms 7.0893 KOps/s 6.9677 KOps/s $\color{#35bf28}+1.75\%$
test_exec_td 0.1747ms 0.1399ms 7.1475 KOps/s 6.9215 KOps/s $\color{#35bf28}+3.26\%$
test_exec_td_decorator 0.5444ms 0.2103ms 4.7548 KOps/s 4.6019 KOps/s $\color{#35bf28}+3.32\%$
test_vmap_mlp_speed[True-True] 0.8087ms 0.5785ms 1.7287 KOps/s 1.7099 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed[True-False] 0.7233ms 0.5757ms 1.7370 KOps/s 1.7161 KOps/s $\color{#35bf28}+1.22\%$
test_vmap_mlp_speed[False-True] 0.6501ms 0.5076ms 1.9702 KOps/s 1.9395 KOps/s $\color{#35bf28}+1.58\%$
test_vmap_mlp_speed[False-False] 0.6526ms 0.5075ms 1.9703 KOps/s 1.8870 KOps/s $\color{#35bf28}+4.41\%$
test_vmap_mlp_speed_decorator[True-True] 0.7814ms 0.6406ms 1.5611 KOps/s 1.5404 KOps/s $\color{#35bf28}+1.34\%$
test_vmap_mlp_speed_decorator[True-False] 0.8312ms 0.6416ms 1.5587 KOps/s 1.5469 KOps/s $\color{#35bf28}+0.76\%$
test_vmap_mlp_speed_decorator[False-True] 0.6717ms 0.5671ms 1.7634 KOps/s 1.7378 KOps/s $\color{#35bf28}+1.47\%$
test_vmap_mlp_speed_decorator[False-False] 0.7375ms 0.5683ms 1.7596 KOps/s 1.7387 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_transformer_speed[True-True] 7.8055ms 7.6207ms 131.2221 Ops/s 129.1978 Ops/s $\color{#35bf28}+1.57\%$
test_vmap_transformer_speed[True-False] 8.0031ms 7.7117ms 129.6730 Ops/s 129.2732 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed[False-True] 7.8988ms 7.6240ms 131.1643 Ops/s 130.2706 Ops/s $\color{#35bf28}+0.69\%$
test_vmap_transformer_speed[False-False] 7.8599ms 7.5999ms 131.5804 Ops/s 130.0286 Ops/s $\color{#35bf28}+1.19\%$
test_vmap_transformer_speed_decorator[True-True] 19.1960ms 18.6161ms 53.7169 Ops/s 53.3749 Ops/s $\color{#35bf28}+0.64\%$
test_vmap_transformer_speed_decorator[True-False] 19.0744ms 18.4963ms 54.0650 Ops/s 53.3293 Ops/s $\color{#35bf28}+1.38\%$
test_vmap_transformer_speed_decorator[False-True] 18.5178ms 18.3695ms 54.4380 Ops/s 53.6767 Ops/s $\color{#35bf28}+1.42\%$
test_vmap_transformer_speed_decorator[False-False] 19.0285ms 18.4441ms 54.2178 Ops/s 53.8156 Ops/s $\color{#35bf28}+0.75\%$
test_to_module_speed[True] 3.1389ms 1.5144ms 660.3252 Ops/s 660.3647 Ops/s $-0.01\%$
test_to_module_speed[False] 2.0084ms 1.4926ms 669.9780 Ops/s 673.7126 Ops/s $\color{#d91a1a}-0.55\%$
test_tc_init 44.3030μs 23.3807μs 42.7703 KOps/s 45.6545 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_tc_init_nested 83.5750μs 51.0374μs 19.5935 KOps/s 23.6734 KOps/s $\textbf{\color{#d91a1a}-17.23\%}$
test_tc_first_layer_tensor 0.6768μs 0.3616μs 2.7654 MOps/s 2.7887 MOps/s $\color{#d91a1a}-0.84\%$
test_tc_first_layer_nontensor 1.8225μs 0.3961μs 2.5248 MOps/s 2.5777 MOps/s $\color{#d91a1a}-2.05\%$
test_tc_second_layer_tensor 15.7500μs 1.0473μs 954.8297 KOps/s 941.8251 KOps/s $\color{#35bf28}+1.38\%$
test_tc_second_layer_nontensor 3.3868μs 0.8196μs 1.2202 MOps/s 1.2468 MOps/s $\color{#d91a1a}-2.13\%$
test_unbind 0.1138s 6.9043ms 144.8363 Ops/s 146.5602 Ops/s $\color{#d91a1a}-1.18\%$
test_full_like 14.3617ms 13.5585ms 73.7543 Ops/s 75.0312 Ops/s $\color{#d91a1a}-1.70\%$
test_zeros_like 8.4042ms 7.9495ms 125.7940 Ops/s 127.4109 Ops/s $\color{#d91a1a}-1.27\%$
test_ones_like 8.2422ms 7.9485ms 125.8096 Ops/s 126.8065 Ops/s $\color{#d91a1a}-0.79\%$
test_clone 10.2541ms 9.6362ms 103.7755 Ops/s 104.4430 Ops/s $\color{#d91a1a}-0.64\%$
test_squeeze 60.6330μs 10.7033μs 93.4295 KOps/s 91.3727 KOps/s $\color{#35bf28}+2.25\%$
test_unsqueeze 0.1100ms 50.2109μs 19.9160 KOps/s 19.3748 KOps/s $\color{#35bf28}+2.79\%$
test_split 0.1458ms 95.0603μs 10.5196 KOps/s 9.9154 KOps/s $\textbf{\color{#35bf28}+6.09\%}$
test_permute 0.1678ms 0.1084ms 9.2230 KOps/s 9.0910 KOps/s $\color{#35bf28}+1.45\%$
test_stack 28.7045ms 27.8049ms 35.9649 Ops/s 36.3247 Ops/s $\color{#d91a1a}-0.99\%$
test_cat 28.2263ms 27.6497ms 36.1668 Ops/s 36.1740 Ops/s $\color{#d91a1a}-0.02\%$

Labels: CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), enhancement (new feature or request)