[Feature] Compile - tensorclass compatibility #882

Merged: 6 commits, Jul 15, 2024

@vmoens (Contributor) commented Jul 12, 2024

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 12, 2024

github-actions bot commented Jul 12, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: 4. Worsened: 14.

Detailed results. Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_plain_set_nested 41.1170μs 18.5890μs 53.7952 KOps/s 55.4261 KOps/s $\color{#d91a1a}-2.94\%$
test_plain_set_stack_nested 47.0280μs 18.9038μs 52.8994 KOps/s 50.4211 KOps/s $\color{#35bf28}+4.92\%$
test_plain_set_nested_inplace 67.2750μs 20.6969μs 48.3165 KOps/s 49.1865 KOps/s $\color{#d91a1a}-1.77\%$
test_plain_set_stack_nested_inplace 72.6140μs 20.5495μs 48.6629 KOps/s 49.6198 KOps/s $\color{#d91a1a}-1.93\%$
test_items 16.9610μs 2.6396μs 378.8460 KOps/s 392.0118 KOps/s $\color{#d91a1a}-3.36\%$
test_items_nested 2.2345ms 0.3730ms 2.6809 KOps/s 2.6540 KOps/s $\color{#35bf28}+1.02\%$
test_items_nested_locked 0.5305ms 0.3706ms 2.6981 KOps/s 2.7264 KOps/s $\color{#d91a1a}-1.04\%$
test_items_nested_leaf 0.1603ms 85.7794μs 11.6578 KOps/s 11.5462 KOps/s $\color{#35bf28}+0.97\%$
test_items_stack_nested 0.5487ms 0.3754ms 2.6640 KOps/s 2.7268 KOps/s $\color{#d91a1a}-2.30\%$
test_items_stack_nested_leaf 0.1605ms 87.0092μs 11.4930 KOps/s 11.6208 KOps/s $\color{#d91a1a}-1.10\%$
test_items_stack_nested_locked 0.8169ms 0.3790ms 2.6385 KOps/s 2.7395 KOps/s $\color{#d91a1a}-3.68\%$
test_keys 0.1105ms 4.2346μs 236.1494 KOps/s 217.2154 KOps/s $\textbf{\color{#35bf28}+8.72\%}$
test_keys_nested 0.2506ms 0.1457ms 6.8618 KOps/s 6.9305 KOps/s $\color{#d91a1a}-0.99\%$
test_keys_nested_locked 0.6740ms 0.1511ms 6.6198 KOps/s 6.5846 KOps/s $\color{#35bf28}+0.53\%$
test_keys_nested_leaf 0.2213ms 0.1233ms 8.1131 KOps/s 8.1110 KOps/s $\color{#35bf28}+0.03\%$
test_keys_stack_nested 0.2339ms 0.1448ms 6.9062 KOps/s 6.8977 KOps/s $\color{#35bf28}+0.12\%$
test_keys_stack_nested_leaf 0.2229ms 0.1232ms 8.1146 KOps/s 8.1304 KOps/s $\color{#d91a1a}-0.19\%$
test_keys_stack_nested_locked 0.4618ms 0.1503ms 6.6517 KOps/s 6.6377 KOps/s $\color{#35bf28}+0.21\%$
test_values 10.6675μs 1.1817μs 846.2389 KOps/s 888.8857 KOps/s $\color{#d91a1a}-4.80\%$
test_values_nested 0.1070ms 49.7063μs 20.1182 KOps/s 20.2426 KOps/s $\color{#d91a1a}-0.61\%$
test_values_nested_locked 0.1154ms 49.2208μs 20.3166 KOps/s 19.8622 KOps/s $\color{#35bf28}+2.29\%$
test_values_nested_leaf 0.1155ms 44.7747μs 22.3341 KOps/s 22.4624 KOps/s $\color{#d91a1a}-0.57\%$
test_values_stack_nested 0.1022ms 51.2425μs 19.5151 KOps/s 20.2883 KOps/s $\color{#d91a1a}-3.81\%$
test_values_stack_nested_leaf 85.9000μs 44.3935μs 22.5258 KOps/s 22.4965 KOps/s $\color{#35bf28}+0.13\%$
test_values_stack_nested_locked 98.8240μs 51.4953μs 19.4193 KOps/s 20.3545 KOps/s $\color{#d91a1a}-4.59\%$
test_membership 26.9700μs 0.8916μs 1.1215 MOps/s 1.3998 MOps/s $\textbf{\color{#d91a1a}-19.88\%}$
test_membership_nested 30.8380μs 2.6648μs 375.2613 KOps/s 363.4403 KOps/s $\color{#35bf28}+3.25\%$
test_membership_nested_leaf 21.0890μs 2.7166μs 368.1136 KOps/s 366.9289 KOps/s $\color{#35bf28}+0.32\%$
test_membership_stacked_nested 23.7640μs 2.6530μs 376.9310 KOps/s 363.5982 KOps/s $\color{#35bf28}+3.67\%$
test_membership_stacked_nested_leaf 18.0840μs 2.6939μs 371.2123 KOps/s 368.7125 KOps/s $\color{#35bf28}+0.68\%$
test_membership_nested_last 36.2770μs 3.9581μs 252.6476 KOps/s 249.5506 KOps/s $\color{#35bf28}+1.24\%$
test_membership_nested_leaf_last 19.7160μs 3.9881μs 250.7467 KOps/s 249.7668 KOps/s $\color{#35bf28}+0.39\%$
test_membership_stacked_nested_last 31.2680μs 4.5466μs 219.9431 KOps/s 251.7478 KOps/s $\textbf{\color{#d91a1a}-12.63\%}$
test_membership_stacked_nested_leaf_last 52.2970μs 4.5843μs 218.1373 KOps/s 249.1957 KOps/s $\textbf{\color{#d91a1a}-12.46\%}$
test_nested_getleaf 42.0590μs 10.9642μs 91.2061 KOps/s 92.3531 KOps/s $\color{#d91a1a}-1.24\%$
test_nested_get 50.1330μs 10.4249μs 95.9239 KOps/s 98.5080 KOps/s $\color{#d91a1a}-2.62\%$
test_stacked_getleaf 36.5580μs 10.8997μs 91.7457 KOps/s 93.9805 KOps/s $\color{#d91a1a}-2.38\%$
test_stacked_get 35.0760μs 10.2582μs 97.4831 KOps/s 98.9593 KOps/s $\color{#d91a1a}-1.49\%$
test_nested_getitemleaf 30.4870μs 11.3080μs 88.4330 KOps/s 88.7168 KOps/s $\color{#d91a1a}-0.32\%$
test_nested_getitem 0.1193ms 10.5966μs 94.3696 KOps/s 96.8241 KOps/s $\color{#d91a1a}-2.54\%$
test_stacked_getitemleaf 32.7000μs 11.3130μs 88.3941 KOps/s 88.8381 KOps/s $\color{#d91a1a}-0.50\%$
test_stacked_getitem 40.2950μs 10.3639μs 96.4889 KOps/s 96.7847 KOps/s $\color{#d91a1a}-0.31\%$
test_lock_nested 0.8743ms 0.4633ms 2.1584 KOps/s 2.1353 KOps/s $\color{#35bf28}+1.08\%$
test_lock_stack_nested 0.9225ms 0.4314ms 2.3183 KOps/s 2.3052 KOps/s $\color{#35bf28}+0.57\%$
test_unlock_nested 0.8646ms 0.3837ms 2.6061 KOps/s 2.2068 KOps/s $\textbf{\color{#35bf28}+18.09\%}$
test_unlock_stack_nested 0.6887ms 0.3438ms 2.9087 KOps/s 2.8587 KOps/s $\color{#35bf28}+1.75\%$
test_flatten_speed 0.6324ms 0.1051ms 9.5174 KOps/s 9.5568 KOps/s $\color{#d91a1a}-0.41\%$
test_unflatten_speed 1.0108ms 0.4433ms 2.2556 KOps/s 2.2891 KOps/s $\color{#d91a1a}-1.46\%$
test_common_ops 4.8669ms 0.8314ms 1.2028 KOps/s 1.2428 KOps/s $\color{#d91a1a}-3.22\%$
test_creation 0.1032ms 2.4176μs 413.6366 KOps/s 432.9047 KOps/s $\color{#d91a1a}-4.45\%$
test_creation_empty 45.8260μs 13.2240μs 75.6202 KOps/s 78.9336 KOps/s $\color{#d91a1a}-4.20\%$
test_creation_nested_1 58.1780μs 16.4242μs 60.8857 KOps/s 62.5851 KOps/s $\color{#d91a1a}-2.72\%$
test_creation_nested_2 67.8560μs 20.2307μs 49.4298 KOps/s 51.1970 KOps/s $\color{#d91a1a}-3.45\%$
test_clone 0.1071ms 13.0327μs 76.7302 KOps/s 74.7032 KOps/s $\color{#35bf28}+2.71\%$
test_getitem[int] 42.6290μs 11.7089μs 85.4050 KOps/s 85.6745 KOps/s $\color{#d91a1a}-0.31\%$
test_getitem[slice_int] 59.6210μs 24.1382μs 41.4280 KOps/s 42.4525 KOps/s $\color{#d91a1a}-2.41\%$
test_getitem[range] 0.2802ms 47.2438μs 21.1668 KOps/s 22.3256 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_getitem[tuple] 50.0130μs 19.8379μs 50.4086 KOps/s 51.9597 KOps/s $\color{#d91a1a}-2.99\%$
test_getitem[list] 0.3610ms 41.1971μs 24.2735 KOps/s 24.8605 KOps/s $\color{#d91a1a}-2.36\%$
test_setitem_dim[int] 86.5710μs 36.6883μs 27.2567 KOps/s 28.6899 KOps/s $\color{#d91a1a}-5.00\%$
test_setitem_dim[slice_int] 0.1262ms 64.0246μs 15.6190 KOps/s 15.8952 KOps/s $\color{#d91a1a}-1.74\%$
test_setitem_dim[range] 0.1300ms 84.1439μs 11.8844 KOps/s 12.0544 KOps/s $\color{#d91a1a}-1.41\%$
test_setitem_dim[tuple] 87.9340μs 52.1819μs 19.1637 KOps/s 19.3380 KOps/s $\color{#d91a1a}-0.90\%$
test_setitem 0.1175ms 21.0265μs 47.5591 KOps/s 46.9194 KOps/s $\color{#35bf28}+1.36\%$
test_set 0.1265ms 20.6264μs 48.4815 KOps/s 48.4159 KOps/s $\color{#35bf28}+0.14\%$
test_set_shared 1.6311ms 0.1673ms 5.9786 KOps/s 5.9238 KOps/s $\color{#35bf28}+0.93\%$
test_update 0.1879ms 24.4710μs 40.8648 KOps/s 41.2466 KOps/s $\color{#d91a1a}-0.93\%$
test_update_nested 0.1455ms 33.3237μs 30.0087 KOps/s 29.8121 KOps/s $\color{#35bf28}+0.66\%$
test_update__nested 0.1106ms 25.1612μs 39.7437 KOps/s 39.5094 KOps/s $\color{#35bf28}+0.59\%$
test_set_nested 0.1531ms 22.7857μs 43.8873 KOps/s 43.8483 KOps/s $\color{#35bf28}+0.09\%$
test_set_nested_new 0.1343ms 27.5675μs 36.2747 KOps/s 36.6207 KOps/s $\color{#d91a1a}-0.95\%$
test_select 0.1674ms 43.4993μs 22.9889 KOps/s 23.4945 KOps/s $\color{#d91a1a}-2.15\%$
test_select_nested 0.1213ms 60.8845μs 16.4246 KOps/s 16.4609 KOps/s $\color{#d91a1a}-0.22\%$
test_exclude_nested 0.1812ms 80.6000μs 12.4070 KOps/s 12.4216 KOps/s $\color{#d91a1a}-0.12\%$
test_empty[True] 0.4670ms 0.3469ms 2.8827 KOps/s 2.9137 KOps/s $\color{#d91a1a}-1.06\%$
test_empty[False] 7.4538μs 1.2636μs 791.4072 KOps/s 797.4495 KOps/s $\color{#d91a1a}-0.76\%$
test_unbind_speed 0.5084ms 0.2823ms 3.5425 KOps/s 3.5615 KOps/s $\color{#d91a1a}-0.53\%$
test_unbind_speed_stack0 0.4246ms 0.2717ms 3.6807 KOps/s 3.5543 KOps/s $\color{#35bf28}+3.56\%$
test_unbind_speed_stack1 79.2189ms 0.7640ms 1.3090 KOps/s 1.2839 KOps/s $\color{#35bf28}+1.95\%$
test_split 76.2580ms 1.6504ms 605.9219 Ops/s 670.2860 Ops/s $\textbf{\color{#d91a1a}-9.60\%}$
test_chunk 77.6195ms 1.6599ms 602.4580 Ops/s 619.5921 Ops/s $\color{#d91a1a}-2.77\%$
test_creation[device0] 0.2082ms 92.7146μs 10.7858 KOps/s 10.4681 KOps/s $\color{#35bf28}+3.03\%$
test_creation_from_tensor 4.0686ms 95.6449μs 10.4553 KOps/s 10.2115 KOps/s $\color{#35bf28}+2.39\%$
test_add_one[memmap_tensor0] 0.1795ms 5.5595μs 179.8716 KOps/s 182.8270 KOps/s $\color{#d91a1a}-1.62\%$
test_contiguous[memmap_tensor0] 21.7510μs 0.6325μs 1.5810 MOps/s 1.5928 MOps/s $\color{#d91a1a}-0.75\%$
test_stack[memmap_tensor0] 44.0820μs 3.6928μs 270.8004 KOps/s 276.1779 KOps/s $\color{#d91a1a}-1.95\%$
test_memmaptd_index 1.0625ms 0.2645ms 3.7806 KOps/s 3.8825 KOps/s $\color{#d91a1a}-2.62\%$
test_memmaptd_index_astensor 0.5856ms 0.3350ms 2.9847 KOps/s 3.0108 KOps/s $\color{#d91a1a}-0.87\%$
test_memmaptd_index_op 0.9146ms 0.6499ms 1.5386 KOps/s 1.5548 KOps/s $\color{#d91a1a}-1.04\%$
test_serialize_model 0.1275s 0.1223s 8.1759 Ops/s 7.1484 Ops/s $\textbf{\color{#35bf28}+14.37\%}$
test_serialize_model_pickle 0.4453s 0.3901s 2.5633 Ops/s 2.4871 Ops/s $\color{#35bf28}+3.06\%$
test_serialize_weights 0.1967s 0.1341s 7.4597 Ops/s 8.0480 Ops/s $\textbf{\color{#d91a1a}-7.31\%}$
test_serialize_weights_returnearly 0.1862s 0.1699s 5.8855 Ops/s 5.6251 Ops/s $\color{#35bf28}+4.63\%$
test_serialize_weights_pickle 0.4779s 0.4128s 2.4225 Ops/s 2.3751 Ops/s $\color{#35bf28}+2.00\%$
test_serialize_weights_filesystem 0.1475s 0.1435s 6.9699 Ops/s 7.0671 Ops/s $\color{#d91a1a}-1.38\%$
test_serialize_model_filesystem 0.1537s 0.1507s 6.6370 Ops/s 6.5676 Ops/s $\color{#35bf28}+1.06\%$
test_reshape_pytree 95.1870μs 25.4823μs 39.2429 KOps/s 38.5837 KOps/s $\color{#35bf28}+1.71\%$
test_reshape_td 0.1244ms 34.3210μs 29.1366 KOps/s 28.7214 KOps/s $\color{#35bf28}+1.45\%$
test_view_pytree 76.5020μs 25.6813μs 38.9388 KOps/s 38.3590 KOps/s $\color{#35bf28}+1.51\%$
test_view_td 91.1700μs 39.1584μs 25.5373 KOps/s 24.6672 KOps/s $\color{#35bf28}+3.53\%$
test_unbind_pytree 0.1011ms 29.6094μs 33.7730 KOps/s 33.8963 KOps/s $\color{#d91a1a}-0.36\%$
test_unbind_td 0.3591ms 41.2604μs 24.2363 KOps/s 24.1274 KOps/s $\color{#35bf28}+0.45\%$
test_split_pytree 80.4100μs 29.7283μs 33.6380 KOps/s 33.8553 KOps/s $\color{#d91a1a}-0.64\%$
test_split_td 0.5098ms 42.1016μs 23.7521 KOps/s 24.1775 KOps/s $\color{#d91a1a}-1.76\%$
test_add_pytree 78.5760μs 35.3420μs 28.2950 KOps/s 28.4040 KOps/s $\color{#d91a1a}-0.38\%$
test_add_td 0.1335ms 60.2769μs 16.5901 KOps/s 16.7226 KOps/s $\color{#d91a1a}-0.79\%$
test_distributed 0.2660ms 0.1295ms 7.7208 KOps/s 7.4305 KOps/s $\color{#35bf28}+3.91\%$
test_tdmodule 85.8800μs 17.8089μs 56.1516 KOps/s 57.2739 KOps/s $\color{#d91a1a}-1.96\%$
test_tdmodule_dispatch 69.1790μs 38.2876μs 26.1181 KOps/s 27.8117 KOps/s $\textbf{\color{#d91a1a}-6.09\%}$
test_tdseq 49.0720μs 19.9053μs 50.2378 KOps/s 51.2352 KOps/s $\color{#d91a1a}-1.95\%$
test_tdseq_dispatch 71.5230μs 42.3573μs 23.6087 KOps/s 24.6007 KOps/s $\color{#d91a1a}-4.03\%$
test_instantiation_functorch 2.0638ms 1.3264ms 753.9160 Ops/s 737.8022 Ops/s $\color{#35bf28}+2.18\%$
test_instantiation_td 2.4848ms 1.0271ms 973.6399 Ops/s 887.9638 Ops/s $\textbf{\color{#35bf28}+9.65\%}$
test_exec_functorch 0.3898ms 0.1718ms 5.8203 KOps/s 6.1583 KOps/s $\textbf{\color{#d91a1a}-5.49\%}$
test_exec_functional_call 0.2894ms 0.1479ms 6.7600 KOps/s 6.4803 KOps/s $\color{#35bf28}+4.32\%$
test_exec_td 0.2791ms 0.1531ms 6.5320 KOps/s 6.7685 KOps/s $\color{#d91a1a}-3.49\%$
test_exec_td_decorator 0.6850ms 0.2356ms 4.2454 KOps/s 4.2881 KOps/s $\color{#d91a1a}-1.00\%$
test_vmap_mlp_speed[True-True] 0.7499ms 0.5006ms 1.9977 KOps/s 2.0141 KOps/s $\color{#d91a1a}-0.81\%$
test_vmap_mlp_speed[True-False] 0.7471ms 0.4962ms 2.0155 KOps/s 2.0369 KOps/s $\color{#d91a1a}-1.05\%$
test_vmap_mlp_speed[False-True] 0.6883ms 0.3987ms 2.5082 KOps/s 2.4655 KOps/s $\color{#35bf28}+1.73\%$
test_vmap_mlp_speed[False-False] 0.6996ms 0.4007ms 2.4956 KOps/s 2.4697 KOps/s $\color{#35bf28}+1.05\%$
test_vmap_mlp_speed_decorator[True-True] 1.2531ms 0.5851ms 1.7092 KOps/s 1.7126 KOps/s $\color{#d91a1a}-0.20\%$
test_vmap_mlp_speed_decorator[True-False] 0.7661ms 0.5817ms 1.7191 KOps/s 1.7084 KOps/s $\color{#35bf28}+0.62\%$
test_vmap_mlp_speed_decorator[False-True] 0.7909ms 0.4775ms 2.0941 KOps/s 2.0883 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_mlp_speed_decorator[False-False] 0.7379ms 0.4742ms 2.1090 KOps/s 2.0743 KOps/s $\color{#35bf28}+1.68\%$
test_to_module_speed[True] 1.9603ms 1.8133ms 551.4772 Ops/s 546.3937 Ops/s $\color{#35bf28}+0.93\%$
test_to_module_speed[False] 2.3258ms 1.7822ms 561.1191 Ops/s 550.6449 Ops/s $\color{#35bf28}+1.90\%$
test_tc_init 0.1134ms 45.7457μs 21.8600 KOps/s 25.8977 KOps/s $\textbf{\color{#d91a1a}-15.59\%}$
test_tc_init_nested 0.1699ms 91.1193μs 10.9746 KOps/s 12.8109 KOps/s $\textbf{\color{#d91a1a}-14.33\%}$
test_tc_first_layer_tensor 58.7290μs 9.2043μs 108.6445 KOps/s 120.8621 KOps/s $\textbf{\color{#d91a1a}-10.11\%}$
test_tc_first_layer_nontensor 32.8110μs 9.1637μs 109.1260 KOps/s 120.4470 KOps/s $\textbf{\color{#d91a1a}-9.40\%}$
test_tc_second_layer_tensor 41.1960μs 2.8368μs 352.5154 KOps/s 391.6604 KOps/s $\textbf{\color{#d91a1a}-9.99\%}$
test_tc_second_layer_nontensor 51.9470μs 10.2739μs 97.3342 KOps/s 106.2061 KOps/s $\textbf{\color{#d91a1a}-8.35\%}$
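
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and the Improved/Worsened totals match the number of bold rows, which all lie beyond roughly ±5%. The sketch below reproduces a few rows under those assumptions. The function names and the 5% threshold are illustrative, inferred from the numbers above, and are not taken from the benchmark bot's source.

```python
# Minimal sketch: recompute the "Change" column from the two Ops columns.
# ASSUMPTIONS (inferred from the table, not from the bot's code):
#   * Change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, in percent
#   * a row counts as Improved/Worsened only beyond a 5% threshold

THRESHOLD_PCT = 5.0  # hypothetical value inferred from which rows are bold


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a row the way the summary line seems to count it."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD); both values of a row use the same unit
rows = [
    ("test_plain_set_nested", 53.7952, 55.4261),
    ("test_keys", 236.1494, 217.2154),
    ("test_membership", 1.1215, 1.3998),
    ("test_setitem_dim[int]", 27.2567, 28.6899),
]

for name, ops_pr, ops_head in rows:
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both throughput values in a row must be converted to the same unit before the division, since a few rows mix Ops/s and KOps/s.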

github-actions bot commented Jul 12, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: 13. Worsened: 8.

Detailed results. Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_plain_set_nested 78.4920μs 12.5118μs 79.9245 KOps/s 75.7568 KOps/s $\textbf{\color{#35bf28}+5.50\%}$
test_plain_set_stack_nested 0.1035ms 12.6947μs 78.7730 KOps/s 76.0204 KOps/s $\color{#35bf28}+3.62\%$
test_plain_set_nested_inplace 42.1010μs 13.6621μs 73.1953 KOps/s 70.3745 KOps/s $\color{#35bf28}+4.01\%$
test_plain_set_stack_nested_inplace 31.1900μs 13.6561μs 73.2273 KOps/s 70.5577 KOps/s $\color{#35bf28}+3.78\%$
test_items 21.7810μs 4.7603μs 210.0729 KOps/s 209.3041 KOps/s $\color{#35bf28}+0.37\%$
test_items_nested 0.5054ms 0.4024ms 2.4848 KOps/s 2.5246 KOps/s $\color{#d91a1a}-1.57\%$
test_items_nested_locked 0.5935ms 0.4034ms 2.4787 KOps/s 2.4946 KOps/s $\color{#d91a1a}-0.64\%$
test_items_nested_leaf 0.2345ms 85.8378μs 11.6499 KOps/s 11.5173 KOps/s $\color{#35bf28}+1.15\%$
test_items_stack_nested 0.4519ms 0.4013ms 2.4917 KOps/s 2.5135 KOps/s $\color{#d91a1a}-0.86\%$
test_items_stack_nested_leaf 0.2714ms 86.2013μs 11.6008 KOps/s 11.4087 KOps/s $\color{#35bf28}+1.68\%$
test_items_stack_nested_locked 0.4715ms 0.4030ms 2.4813 KOps/s 2.4902 KOps/s $\color{#d91a1a}-0.36\%$
test_keys 17.4410μs 4.3696μs 228.8539 KOps/s 227.8169 KOps/s $\color{#35bf28}+0.46\%$
test_keys_nested 0.1047ms 67.3765μs 14.8420 KOps/s 14.4676 KOps/s $\color{#35bf28}+2.59\%$
test_keys_nested_locked 0.7625ms 73.1832μs 13.6643 KOps/s 13.4555 KOps/s $\color{#35bf28}+1.55\%$
test_keys_nested_leaf 87.1720μs 57.5700μs 17.3702 KOps/s 17.3422 KOps/s $\color{#35bf28}+0.16\%$
test_keys_stack_nested 0.1009ms 67.8143μs 14.7462 KOps/s 14.8907 KOps/s $\color{#d91a1a}-0.97\%$
test_keys_stack_nested_leaf 95.5110μs 56.5338μs 17.6885 KOps/s 17.1243 KOps/s $\color{#35bf28}+3.29\%$
test_keys_stack_nested_locked 0.1269ms 71.6087μs 13.9648 KOps/s 13.5747 KOps/s $\color{#35bf28}+2.87\%$
test_values 7.4733μs 1.7767μs 562.8457 KOps/s 560.1742 KOps/s $\color{#35bf28}+0.48\%$
test_values_nested 54.3910μs 33.8827μs 29.5136 KOps/s 28.7828 KOps/s $\color{#35bf28}+2.54\%$
test_values_nested_locked 70.5320μs 35.7615μs 27.9630 KOps/s 27.1477 KOps/s $\color{#35bf28}+3.00\%$
test_values_nested_leaf 0.1370ms 30.0789μs 33.2459 KOps/s 32.1363 KOps/s $\color{#35bf28}+3.45\%$
test_values_stack_nested 0.1301ms 34.0329μs 29.3833 KOps/s 28.1075 KOps/s $\color{#35bf28}+4.54\%$
test_values_stack_nested_leaf 51.9510μs 30.4465μs 32.8445 KOps/s 31.6257 KOps/s $\color{#35bf28}+3.85\%$
test_values_stack_nested_locked 0.1626ms 36.0327μs 27.7526 KOps/s 26.8645 KOps/s $\color{#35bf28}+3.31\%$
test_membership 1.8350μs 0.5373μs 1.8611 MOps/s 1.8816 MOps/s $\color{#d91a1a}-1.09\%$
test_membership_nested 17.0200μs 2.0761μs 481.6732 KOps/s 484.0788 KOps/s $\color{#d91a1a}-0.50\%$
test_membership_nested_leaf 9.5950μs 2.0149μs 496.3148 KOps/s 498.0365 KOps/s $\color{#d91a1a}-0.35\%$
test_membership_stacked_nested 17.9500μs 2.0907μs 478.2996 KOps/s 477.3000 KOps/s $\color{#35bf28}+0.21\%$
test_membership_stacked_nested_leaf 16.0300μs 2.0880μs 478.9353 KOps/s 483.8790 KOps/s $\color{#d91a1a}-1.02\%$
test_membership_nested_last 17.1000μs 3.0155μs 331.6151 KOps/s 334.0297 KOps/s $\color{#d91a1a}-0.72\%$
test_membership_nested_leaf_last 0.2009ms 2.9681μs 336.9135 KOps/s 331.8967 KOps/s $\color{#35bf28}+1.51\%$
test_membership_stacked_nested_last 72.0410μs 3.4183μs 292.5401 KOps/s 291.0557 KOps/s $\color{#35bf28}+0.51\%$
test_membership_stacked_nested_leaf_last 98.9220μs 3.4236μs 292.0919 KOps/s 292.5499 KOps/s $\color{#d91a1a}-0.16\%$
test_nested_getleaf 33.2410μs 8.0239μs 124.6283 KOps/s 123.9154 KOps/s $\color{#35bf28}+0.58\%$
test_nested_get 19.5210μs 7.5773μs 131.9738 KOps/s 131.9285 KOps/s $\color{#35bf28}+0.03\%$
test_stacked_getleaf 33.8000μs 8.0396μs 124.3845 KOps/s 123.8000 KOps/s $\color{#35bf28}+0.47\%$
test_stacked_get 22.1310μs 7.5415μs 132.5992 KOps/s 132.5021 KOps/s $\color{#35bf28}+0.07\%$
test_nested_getitemleaf 22.8100μs 8.1945μs 122.0336 KOps/s 122.1271 KOps/s $\color{#d91a1a}-0.08\%$
test_nested_getitem 21.6100μs 7.6999μs 129.8716 KOps/s 130.0097 KOps/s $\color{#d91a1a}-0.11\%$
test_stacked_getitemleaf 32.6010μs 8.2287μs 121.5251 KOps/s 122.3144 KOps/s $\color{#d91a1a}-0.65\%$
test_stacked_getitem 23.0610μs 7.6868μs 130.0930 KOps/s 130.3581 KOps/s $\color{#d91a1a}-0.20\%$
test_lock_nested 9.6803ms 0.4293ms 2.3294 KOps/s 2.3519 KOps/s $\color{#d91a1a}-0.96\%$
test_lock_stack_nested 0.4621ms 0.3890ms 2.5707 KOps/s 2.5536 KOps/s $\color{#35bf28}+0.67\%$
test_unlock_nested 0.8300ms 0.3428ms 2.9173 KOps/s 2.8960 KOps/s $\color{#35bf28}+0.73\%$
test_unlock_stack_nested 0.3727ms 0.3105ms 3.2208 KOps/s 3.2075 KOps/s $\color{#35bf28}+0.42\%$
test_flatten_speed 0.3933ms 0.1057ms 9.4618 KOps/s 9.4876 KOps/s $\color{#d91a1a}-0.27\%$
test_unflatten_speed 0.5006ms 0.2921ms 3.4229 KOps/s 3.4340 KOps/s $\color{#d91a1a}-0.32\%$
test_common_ops 0.9497ms 0.5658ms 1.7674 KOps/s 1.7029 KOps/s $\color{#35bf28}+3.79\%$
test_creation 35.6610μs 1.9506μs 512.6646 KOps/s 538.7516 KOps/s $\color{#d91a1a}-4.84\%$
test_creation_empty 29.2600μs 8.7541μs 114.2317 KOps/s 102.0831 KOps/s $\textbf{\color{#35bf28}+11.90\%}$
test_creation_nested_1 30.9100μs 10.6353μs 94.0269 KOps/s 84.8531 KOps/s $\textbf{\color{#35bf28}+10.81\%}$
test_creation_nested_2 28.4600μs 13.0172μs 76.8217 KOps/s 70.9614 KOps/s $\textbf{\color{#35bf28}+8.26\%}$
test_clone 91.4620μs 11.0417μs 90.5658 KOps/s 88.6636 KOps/s $\color{#35bf28}+2.15\%$
test_getitem[int] 25.5300μs 10.2578μs 97.4864 KOps/s 97.3175 KOps/s $\color{#35bf28}+0.17\%$
test_getitem[slice_int] 0.1213ms 20.2695μs 49.3351 KOps/s 49.4720 KOps/s $\color{#d91a1a}-0.28\%$
test_getitem[range] 0.2435ms 37.8344μs 26.4310 KOps/s 26.5985 KOps/s $\color{#d91a1a}-0.63\%$
test_getitem[tuple] 36.4500μs 17.7440μs 56.3571 KOps/s 57.0150 KOps/s $\color{#d91a1a}-1.15\%$
test_getitem[list] 0.2655ms 32.3529μs 30.9091 KOps/s 30.7658 KOps/s $\color{#35bf28}+0.47\%$
test_setitem_dim[int] 42.1310μs 23.6833μs 42.2239 KOps/s 38.2694 KOps/s $\textbf{\color{#35bf28}+10.33\%}$
test_setitem_dim[slice_int] 73.6410μs 45.1344μs 22.1560 KOps/s 20.8670 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_setitem_dim[range] 0.1061ms 62.0848μs 16.1070 KOps/s 15.5197 KOps/s $\color{#35bf28}+3.78\%$
test_setitem_dim[tuple] 0.1467ms 39.2671μs 25.4666 KOps/s 24.5486 KOps/s $\color{#35bf28}+3.74\%$
test_setitem 90.1010μs 15.7266μs 63.5866 KOps/s 61.2210 KOps/s $\color{#35bf28}+3.86\%$
test_set 99.3320μs 15.1360μs 66.0676 KOps/s 63.8574 KOps/s $\color{#35bf28}+3.46\%$
test_set_shared 3.0625ms 0.1014ms 9.8581 KOps/s 10.2343 KOps/s $\color{#d91a1a}-3.68\%$
test_update 98.2620μs 17.4419μs 57.3331 KOps/s 52.0450 KOps/s $\textbf{\color{#35bf28}+10.16\%}$
test_update_nested 0.1090ms 23.1195μs 43.2535 KOps/s 41.7386 KOps/s $\color{#35bf28}+3.63\%$
test_update__nested 97.7520μs 20.4971μs 48.7874 KOps/s 46.7755 KOps/s $\color{#35bf28}+4.30\%$
test_set_nested 0.1086ms 15.9571μs 62.6682 KOps/s 59.8245 KOps/s $\color{#35bf28}+4.75\%$
test_set_nested_new 99.1820μs 18.8219μs 53.1297 KOps/s 51.7153 KOps/s $\color{#35bf28}+2.73\%$
test_select 0.1159ms 31.4979μs 31.7481 KOps/s 30.1584 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_select_nested 92.2120μs 52.7568μs 18.9549 KOps/s 19.0010 KOps/s $\color{#d91a1a}-0.24\%$
test_exclude_nested 0.1429ms 72.4181μs 13.8087 KOps/s 13.7605 KOps/s $\color{#35bf28}+0.35\%$
test_empty[True] 0.4063ms 0.2985ms 3.3505 KOps/s 3.3351 KOps/s $\color{#35bf28}+0.46\%$
test_empty[False] 16.9113μs 0.9355μs 1.0690 MOps/s 1.0822 MOps/s $\color{#d91a1a}-1.23\%$
test_to 89.1230μs 59.1288μs 16.9122 KOps/s 17.0539 KOps/s $\color{#d91a1a}-0.83\%$
test_to_nonblocking 0.1840ms 36.8469μs 27.1393 KOps/s 26.4599 KOps/s $\color{#35bf28}+2.57\%$
test_unbind_speed 0.3283ms 0.2638ms 3.7906 KOps/s 3.7153 KOps/s $\color{#35bf28}+2.03\%$
test_unbind_speed_stack0 0.3610ms 0.2634ms 3.7966 KOps/s 3.7670 KOps/s $\color{#35bf28}+0.79\%$
test_unbind_speed_stack1 94.3900ms 0.8021ms 1.2467 KOps/s 1.2368 KOps/s $\color{#35bf28}+0.80\%$
test_split 93.1856ms 1.5563ms 642.5634 Ops/s 627.1538 Ops/s $\color{#35bf28}+2.46\%$
test_chunk 1.5399ms 1.4148ms 706.8164 Ops/s 691.2008 Ops/s $\color{#35bf28}+2.26\%$
test_creation[device0] 0.1896ms 55.4836μs 18.0233 KOps/s 17.2059 KOps/s $\color{#35bf28}+4.75\%$
test_creation_from_tensor 0.1948ms 55.2455μs 18.1010 KOps/s 18.2142 KOps/s $\color{#d91a1a}-0.62\%$
test_add_one[memmap_tensor0] 97.6820μs 6.8050μs 146.9501 KOps/s 145.9794 KOps/s $\color{#35bf28}+0.66\%$
test_contiguous[memmap_tensor0] 11.7800μs 0.6052μs 1.6524 MOps/s 1.7039 MOps/s $\color{#d91a1a}-3.02\%$
test_stack[memmap_tensor0] 32.2910μs 4.3144μs 231.7793 KOps/s 233.3979 KOps/s $\color{#d91a1a}-0.69\%$
test_memmaptd_index 1.0943ms 0.2548ms 3.9250 KOps/s 3.7821 KOps/s $\color{#35bf28}+3.78\%$
test_memmaptd_index_astensor 0.6243ms 0.3188ms 3.1366 KOps/s 3.0705 KOps/s $\color{#35bf28}+2.15\%$
test_memmaptd_index_op 94.1474ms 0.6546ms 1.5277 KOps/s 1.6012 KOps/s $\color{#d91a1a}-4.59\%$
test_serialize_model 94.2990ms 90.2210ms 11.0839 Ops/s 10.3109 Ops/s $\textbf{\color{#35bf28}+7.50\%}$
test_serialize_model_pickle 1.3480s 1.2351s 0.8096 Ops/s 0.7186 Ops/s $\textbf{\color{#35bf28}+12.66\%}$
test_serialize_weights 0.1864s 99.6104ms 10.0391 Ops/s 10.7579 Ops/s $\textbf{\color{#d91a1a}-6.68\%}$
test_serialize_weights_returnearly 0.2958s 79.5005ms 12.5785 Ops/s 13.8962 Ops/s $\textbf{\color{#d91a1a}-9.48\%}$
test_serialize_weights_pickle 1.3523s 1.2487s 0.8008 Ops/s 0.8010 Ops/s $\color{#d91a1a}-0.02\%$
test_reshape_pytree 0.2389ms 25.3708μs 39.4154 KOps/s 39.2822 KOps/s $\color{#35bf28}+0.34\%$
test_reshape_td 99.1620μs 30.7905μs 32.4775 KOps/s 32.8166 KOps/s $\color{#d91a1a}-1.03\%$
test_view_pytree 0.1437ms 25.0124μs 39.9802 KOps/s 39.3006 KOps/s $\color{#35bf28}+1.73\%$
test_view_td 0.2455ms 37.9981μs 26.3171 KOps/s 27.3142 KOps/s $\color{#d91a1a}-3.65\%$
test_unbind_pytree 0.1770ms 32.2356μs 31.0216 KOps/s 32.3281 KOps/s $\color{#d91a1a}-4.04\%$
test_unbind_td 0.6119ms 41.9748μs 23.8238 KOps/s 25.1636 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_split_pytree 63.5910μs 35.4142μs 28.2373 KOps/s 29.3353 KOps/s $\color{#d91a1a}-3.74\%$
test_split_td 0.2516ms 39.8755μs 25.0781 KOps/s 26.6280 KOps/s $\textbf{\color{#d91a1a}-5.82\%}$
test_add_pytree 0.1660ms 37.1620μs 26.9092 KOps/s 26.4577 KOps/s $\color{#35bf28}+1.71\%$
test_add_td 0.2472ms 46.9774μs 21.2868 KOps/s 20.1175 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_distributed 0.2681ms 74.0026μs 13.5130 KOps/s 14.6156 KOps/s $\textbf{\color{#d91a1a}-7.54\%}$
test_tdmodule 59.6210μs 14.2998μs 69.9310 KOps/s 66.7691 KOps/s $\color{#35bf28}+4.74\%$
test_tdmodule_dispatch 45.5110μs 29.6399μs 33.7383 KOps/s 35.3126 KOps/s $\color{#d91a1a}-4.46\%$
test_tdseq 0.1008ms 15.3914μs 64.9715 KOps/s 65.2193 KOps/s $\color{#d91a1a}-0.38\%$
test_tdseq_dispatch 56.8810μs 32.2853μs 30.9739 KOps/s 31.6974 KOps/s $\color{#d91a1a}-2.28\%$
test_instantiation_functorch 1.6262ms 1.3770ms 726.2266 Ops/s 721.3382 Ops/s $\color{#35bf28}+0.68\%$
test_instantiation_td 1.4452ms 0.9739ms 1.0268 KOps/s 914.9997 Ops/s $\textbf{\color{#35bf28}+12.22\%}$
test_exec_functorch 0.2546ms 0.1444ms 6.9269 KOps/s 6.7972 KOps/s $\color{#35bf28}+1.91\%$
test_exec_functional_call 0.3243ms 0.1291ms 7.7443 KOps/s 7.5294 KOps/s $\color{#35bf28}+2.85\%$
test_exec_td 0.1567ms 0.1258ms 7.9484 KOps/s 7.5727 KOps/s $\color{#35bf28}+4.96\%$
test_exec_td_decorator 0.6016ms 0.1965ms 5.0883 KOps/s 4.9476 KOps/s $\color{#35bf28}+2.84\%$
test_vmap_mlp_speed[True-True] 0.7708ms 0.5672ms 1.7631 KOps/s 1.7371 KOps/s $\color{#35bf28}+1.50\%$
test_vmap_mlp_speed[True-False] 0.8013ms 0.5661ms 1.7664 KOps/s 1.7653 KOps/s $\color{#35bf28}+0.06\%$
test_vmap_mlp_speed[False-True] 0.7103ms 0.5149ms 1.9422 KOps/s 1.9887 KOps/s $\color{#d91a1a}-2.34\%$
test_vmap_mlp_speed[False-False] 0.7012ms 0.4990ms 2.0039 KOps/s 1.9940 KOps/s $\color{#35bf28}+0.50\%$
test_vmap_mlp_speed_decorator[True-True] 1.1047ms 0.6408ms 1.5605 KOps/s 1.5450 KOps/s $\color{#35bf28}+1.00\%$
test_vmap_mlp_speed_decorator[True-False] 0.8552ms 0.6410ms 1.5602 KOps/s 1.5576 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_mlp_speed_decorator[False-True] 0.7536ms 0.5583ms 1.7912 KOps/s 1.6886 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7910ms 0.5615ms 1.7808 KOps/s 1.7233 KOps/s $\color{#35bf28}+3.34\%$
test_vmap_transformer_speed[True-True] 8.0670ms 7.5948ms 131.6694 Ops/s 129.3257 Ops/s $\color{#35bf28}+1.81\%$
test_vmap_transformer_speed[True-False] 7.7453ms 7.4968ms 133.3897 Ops/s 129.5591 Ops/s $\color{#35bf28}+2.96\%$
test_vmap_transformer_speed[False-True] 7.6446ms 7.4405ms 134.4004 Ops/s 130.3973 Ops/s $\color{#35bf28}+3.07\%$
test_vmap_transformer_speed[False-False] 8.0195ms 7.5994ms 131.5900 Ops/s 130.5478 Ops/s $\color{#35bf28}+0.80\%$
test_vmap_transformer_speed_decorator[True-True] 18.9022ms 18.5444ms 53.9245 Ops/s 52.3234 Ops/s $\color{#35bf28}+3.06\%$
test_vmap_transformer_speed_decorator[True-False] 19.0490ms 18.6123ms 53.7280 Ops/s 52.4073 Ops/s $\color{#35bf28}+2.52\%$
test_vmap_transformer_speed_decorator[False-True] 18.6452ms 18.3531ms 54.4867 Ops/s 52.9137 Ops/s $\color{#35bf28}+2.97\%$
test_vmap_transformer_speed_decorator[False-False] 19.0367ms 18.3893ms 54.3796 Ops/s 52.9993 Ops/s $\color{#35bf28}+2.60\%$
test_to_module_speed[True] 1.6022ms 1.4842ms 673.7843 Ops/s 655.7122 Ops/s $\color{#35bf28}+2.76\%$
test_to_module_speed[False] 1.5901ms 1.4612ms 684.3782 Ops/s 662.0518 Ops/s $\color{#35bf28}+3.37\%$
test_tc_init 54.0410μs 35.9928μs 27.7833 KOps/s 28.1650 KOps/s $\color{#d91a1a}-1.36\%$
test_tc_init_nested 0.1028ms 71.4880μs 13.9884 KOps/s 14.2762 KOps/s $\color{#d91a1a}-2.02\%$
test_tc_first_layer_tensor 18.8800μs 3.9748μs 251.5840 KOps/s 277.3163 KOps/s $\textbf{\color{#d91a1a}-9.28\%}$
test_tc_first_layer_nontensor 0.1664ms 4.0099μs 249.3857 KOps/s 273.6621 KOps/s $\textbf{\color{#d91a1a}-8.87\%}$
test_tc_second_layer_tensor 45.2413μs 1.2895μs 775.5228 KOps/s 811.1156 KOps/s $\color{#d91a1a}-4.39\%$
test_tc_second_layer_nontensor 21.0410μs 4.5790μs 218.3893 KOps/s 240.6797 KOps/s $\textbf{\color{#d91a1a}-9.26\%}$

vmoens added 4 commits July 12, 2024 10:30
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@vmoens vmoens added the enhancement New feature or request label Jul 15, 2024
[ghstack-poisoned]
@vmoens vmoens merged commit d20b0bd into gh/vmoens/9/base Jul 15, 2024
36 of 37 checks passed
vmoens added a commit that referenced this pull request Jul 15, 2024
ghstack-source-id: ddc0fac60371b3514f4e2e912afabab3c3720bd7
Pull Request resolved: #882
@vmoens vmoens deleted the gh/vmoens/9/head branch July 15, 2024 15:41