[Performance] Faster lock_/unlock_ when sub-tds are already locked #816
Merged
facebook-github-bot added the CLA Signed label on Jun 14, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 35.3160μs | 17.4330μs | 57.3623 KOps/s | 61.6198 KOps/s | |
test_plain_set_stack_nested | 37.2200μs | 17.5799μs | 56.8832 KOps/s | 60.8337 KOps/s | |
test_plain_set_nested_inplace | 54.6220μs | 19.8358μs | 50.4139 KOps/s | 54.4270 KOps/s | |
test_plain_set_stack_nested_inplace | 55.7140μs | 19.9636μs | 50.0912 KOps/s | 54.3579 KOps/s | |
test_items | 27.7520μs | 2.5137μs | 397.8251 KOps/s | 395.4885 KOps/s | |
test_items_nested | 0.3396ms | 0.2706ms | 3.6949 KOps/s | 3.7726 KOps/s | |
test_items_nested_locked | 0.4210ms | 0.2719ms | 3.6783 KOps/s | 3.7643 KOps/s | |
test_items_nested_leaf | 0.1503ms | 77.5085μs | 12.9018 KOps/s | 12.9891 KOps/s | |
test_items_stack_nested | 0.4685ms | 0.2738ms | 3.6518 KOps/s | 3.7836 KOps/s | |
test_items_stack_nested_leaf | 0.1286ms | 75.3538μs | 13.2707 KOps/s | 13.3503 KOps/s | |
test_items_stack_nested_locked | 0.4712ms | 0.2713ms | 3.6863 KOps/s | 3.7803 KOps/s | |
test_keys | 18.3650μs | 3.8874μs | 257.2419 KOps/s | 261.5125 KOps/s | |
test_keys_nested | 0.2251ms | 0.1379ms | 7.2493 KOps/s | 7.2499 KOps/s | |
test_keys_nested_locked | 0.7349ms | 0.1423ms | 7.0252 KOps/s | 7.0793 KOps/s | |
test_keys_nested_leaf | 0.2067ms | 0.1161ms | 8.6142 KOps/s | 8.5689 KOps/s | |
test_keys_stack_nested | 0.2194ms | 0.1343ms | 7.4468 KOps/s | 7.4386 KOps/s | |
test_keys_stack_nested_leaf | 0.1996ms | 0.1147ms | 8.7173 KOps/s | 8.7319 KOps/s | |
test_keys_stack_nested_locked | 0.2378ms | 0.1390ms | 7.1963 KOps/s | 7.1654 KOps/s | |
test_values | 5.3260μs | 1.2566μs | 795.8101 KOps/s | 850.5704 KOps/s | |
test_values_nested | 0.1006ms | 50.5153μs | 19.7960 KOps/s | 19.6023 KOps/s | |
test_values_nested_locked | 92.1120μs | 50.6023μs | 19.7619 KOps/s | 19.4821 KOps/s | |
test_values_nested_leaf | 98.4540μs | 46.0780μs | 21.7023 KOps/s | 21.6415 KOps/s | |
test_values_stack_nested | 93.0140μs | 51.7862μs | 19.3102 KOps/s | 19.1361 KOps/s | |
test_values_stack_nested_leaf | 86.4120μs | 45.2400μs | 22.1043 KOps/s | 22.1051 KOps/s | |
test_values_stack_nested_locked | 96.0400μs | 51.5428μs | 19.4014 KOps/s | 19.2844 KOps/s | |
test_membership | 23.2940μs | 1.3223μs | 756.2375 KOps/s | 738.4065 KOps/s | |
test_membership_nested | 19.6570μs | 3.3602μs | 297.5987 KOps/s | 293.0335 KOps/s | |
test_membership_nested_leaf | 21.0100μs | 3.3734μs | 296.4370 KOps/s | 292.2870 KOps/s | |
test_membership_stacked_nested | 60.7030μs | 3.3780μs | 296.0351 KOps/s | 277.0504 KOps/s | |
test_membership_stacked_nested_leaf | 57.6280μs | 3.3752μs | 296.2819 KOps/s | 293.2297 KOps/s | |
test_membership_nested_last | 0.1334ms | 4.2683μs | 234.2850 KOps/s | 238.1806 KOps/s | |
test_membership_nested_leaf_last | 27.7720μs | 4.1608μs | 240.3378 KOps/s | 237.9310 KOps/s | |
test_membership_stacked_nested_last | 52.8890μs | 13.3188μs | 75.0816 KOps/s | 75.4707 KOps/s | |
test_membership_stacked_nested_leaf_last | 34.9450μs | 13.3325μs | 75.0049 KOps/s | 74.9834 KOps/s | |
test_nested_getleaf | 33.6730μs | 10.6599μs | 93.8099 KOps/s | 93.2884 KOps/s | |
test_nested_get | 55.4640μs | 10.0283μs | 99.7176 KOps/s | 98.5474 KOps/s | |
test_stacked_getleaf | 28.7940μs | 10.6528μs | 93.8724 KOps/s | 94.1523 KOps/s | |
test_stacked_get | 63.0390μs | 9.9272μs | 100.7329 KOps/s | 99.4292 KOps/s | |
test_nested_getitemleaf | 58.8200μs | 11.2367μs | 88.9943 KOps/s | 86.4533 KOps/s | |
test_nested_getitem | 30.1370μs | 10.4422μs | 95.7653 KOps/s | 96.3435 KOps/s | |
test_stacked_getitemleaf | 58.9900μs | 11.3336μs | 88.2333 KOps/s | 89.6939 KOps/s | |
test_stacked_getitem | 33.4830μs | 10.3090μs | 97.0025 KOps/s | 97.0327 KOps/s | |
test_lock_nested | 52.2525ms | 0.3924ms | 2.5486 KOps/s | 2.9604 KOps/s | |
test_lock_stack_nested | 0.4424ms | 0.2966ms | 3.3715 KOps/s | 3.3495 KOps/s | |
test_unlock_nested | 0.7324ms | 0.3460ms | 2.8904 KOps/s | 2.8835 KOps/s | |
test_unlock_stack_nested | 0.5460ms | 0.3058ms | 3.2705 KOps/s | 3.2947 KOps/s | |
test_flatten_speed | 0.2028ms | 96.0730μs | 10.4088 KOps/s | 10.5659 KOps/s | |
test_unflatten_speed | 0.6260ms | 0.4102ms | 2.4380 KOps/s | 2.4509 KOps/s | |
test_common_ops | 3.0303ms | 0.7249ms | 1.3795 KOps/s | 1.4787 KOps/s | |
test_creation | 74.7600μs | 1.9589μs | 510.4936 KOps/s | 523.2389 KOps/s | |
test_creation_empty | 47.4890μs | 10.8284μs | 92.3497 KOps/s | 107.0183 KOps/s | |
test_creation_nested_1 | 45.5850μs | 13.6531μs | 73.2436 KOps/s | 82.2006 KOps/s | |
test_creation_nested_2 | 60.0520μs | 16.8020μs | 59.5168 KOps/s | 64.2481 KOps/s | |
test_clone | 75.1810μs | 13.5040μs | 74.0523 KOps/s | 74.7536 KOps/s | |
test_getitem[int] | 50.6460μs | 11.6746μs | 85.6562 KOps/s | 86.5093 KOps/s | |
test_getitem[slice_int] | 54.1410μs | 22.3201μs | 44.8027 KOps/s | 43.9481 KOps/s | |
test_getitem[range] | 96.1500μs | 60.7378μs | 16.4642 KOps/s | 17.4516 KOps/s | |
test_getitem[tuple] | 58.5200μs | 18.9484μs | 52.7749 KOps/s | 52.4264 KOps/s | |
test_getitem[list] | 92.9140μs | 40.4336μs | 24.7319 KOps/s | 24.7398 KOps/s | |
test_setitem_dim[int] | 56.1450μs | 33.5091μs | 29.8427 KOps/s | 30.7110 KOps/s | |
test_setitem_dim[slice_int] | 90.7000μs | 59.8631μs | 16.7048 KOps/s | 16.9218 KOps/s | |
test_setitem_dim[range] | 0.1621ms | 81.8928μs | 12.2111 KOps/s | 12.4585 KOps/s | |
test_setitem_dim[tuple] | 95.4390μs | 49.1396μs | 20.3502 KOps/s | 21.2055 KOps/s | |
test_setitem | 65.5220μs | 20.5454μs | 48.6727 KOps/s | 52.0733 KOps/s | |
test_set | 50.4740μs | 20.1996μs | 49.5060 KOps/s | 53.1369 KOps/s | |
test_set_shared | 1.6814ms | 0.1436ms | 6.9657 KOps/s | 7.1733 KOps/s | |
test_update | 0.1504ms | 22.4034μs | 44.6361 KOps/s | 48.7359 KOps/s | |
test_update_nested | 85.7380μs | 31.1997μs | 32.0516 KOps/s | 35.0010 KOps/s | |
test_update__nested | 88.2060μs | 25.1750μs | 39.7219 KOps/s | 39.6393 KOps/s | |
test_set_nested | 0.1058ms | 21.9124μs | 45.6363 KOps/s | 48.4150 KOps/s | |
test_set_nested_new | 72.0050μs | 26.2882μs | 38.0398 KOps/s | 40.5501 KOps/s | |
test_select | 86.4620μs | 42.0242μs | 23.7958 KOps/s | 25.3316 KOps/s | |
test_select_nested | 0.8926ms | 59.8100μs | 16.7196 KOps/s | 16.7139 KOps/s | |
test_exclude_nested | 0.2283ms | 0.1186ms | 8.4328 KOps/s | 8.4258 KOps/s | |
test_empty[True] | 0.6023ms | 0.3861ms | 2.5901 KOps/s | 2.5397 KOps/s | |
test_empty[False] | 9.7350μs | 1.1309μs | 884.2665 KOps/s | 871.4110 KOps/s | |
test_unbind_speed | 1.6078ms | 0.2555ms | 3.9145 KOps/s | 3.9334 KOps/s | |
test_unbind_speed_stack0 | 0.4358ms | 0.2441ms | 4.0962 KOps/s | 4.0782 KOps/s | |
test_unbind_speed_stack1 | 64.2182ms | 0.7050ms | 1.4184 KOps/s | 1.4020 KOps/s | |
test_split | 66.4568ms | 1.6222ms | 616.4485 Ops/s | 620.4103 Ops/s | |
test_chunk | 67.6416ms | 1.6131ms | 619.9430 Ops/s | 616.6956 Ops/s | |
test_creation[device0] | 4.2427ms | 85.5463μs | 11.6896 KOps/s | 11.9321 KOps/s | |
test_creation_from_tensor | 0.2108ms | 83.7878μs | 11.9349 KOps/s | 11.1795 KOps/s | |
test_add_one[memmap_tensor0] | 55.9850μs | 5.3146μs | 188.1617 KOps/s | 178.9318 KOps/s | |
test_contiguous[memmap_tensor0] | 17.9440μs | 0.6400μs | 1.5625 MOps/s | 1.5561 MOps/s | |
test_stack[memmap_tensor0] | 28.1130μs | 3.6374μs | 274.9238 KOps/s | 269.0608 KOps/s | |
test_memmaptd_index | 1.0878ms | 0.2548ms | 3.9243 KOps/s | 3.6397 KOps/s | |
test_memmaptd_index_astensor | 1.0891ms | 0.3403ms | 2.9382 KOps/s | 2.8662 KOps/s | |
test_memmaptd_index_op | 0.8573ms | 0.6033ms | 1.6576 KOps/s | 1.6171 KOps/s | |
test_serialize_model | 0.1739s | 0.1136s | 8.8022 Ops/s | 8.3622 Ops/s | |
test_serialize_model_pickle | 0.4504s | 0.3793s | 2.6364 Ops/s | 2.5959 Ops/s | |
test_serialize_weights | 0.1635s | 0.1108s | 9.0250 Ops/s | 9.4866 Ops/s | |
test_serialize_weights_returnearly | 0.1337s | 0.1263s | 7.9164 Ops/s | 7.8051 Ops/s | |
test_serialize_weights_pickle | 0.7864s | 0.4999s | 2.0003 Ops/s | 2.3435 Ops/s | |
test_serialize_weights_filesystem | 0.1042s | 94.5626ms | 10.5750 Ops/s | 9.6310 Ops/s | |
test_serialize_model_filesystem | 0.1057s | 95.0779ms | 10.5177 Ops/s | 9.8712 Ops/s | |
test_reshape_pytree | 64.1810μs | 25.6557μs | 38.9777 KOps/s | 39.7740 KOps/s | |
test_reshape_td | 72.9960μs | 34.5494μs | 28.9441 KOps/s | 29.3242 KOps/s | |
test_view_pytree | 73.5680μs | 25.7074μs | 38.8993 KOps/s | 38.9279 KOps/s | |
test_view_td | 84.3480μs | 39.0933μs | 25.5798 KOps/s | 26.1358 KOps/s | |
test_unbind_pytree | 74.7510μs | 29.2836μs | 34.1489 KOps/s | 34.0258 KOps/s | |
test_unbind_td | 0.4406ms | 38.0778μs | 26.2620 KOps/s | 26.2910 KOps/s | |
test_split_pytree | 74.3800μs | 29.2301μs | 34.2113 KOps/s | 34.1685 KOps/s | |
test_split_td | 0.5446ms | 41.6443μs | 24.0129 KOps/s | 24.3745 KOps/s | |
test_add_pytree | 74.1690μs | 34.1372μs | 29.2935 KOps/s | 28.7738 KOps/s | |
test_add_td | 0.1694ms | 54.3237μs | 18.4082 KOps/s | 19.2176 KOps/s | |
test_distributed | 0.2686ms | 0.1028ms | 9.7316 KOps/s | 9.5817 KOps/s | |
test_tdmodule | 89.1180μs | 18.0129μs | 55.5156 KOps/s | 58.2319 KOps/s | |
test_tdmodule_dispatch | 59.3210μs | 36.1304μs | 27.6775 KOps/s | 29.8448 KOps/s | |
test_tdseq | 35.7470μs | 20.9245μs | 47.7910 KOps/s | 49.4033 KOps/s | |
test_tdseq_dispatch | 67.3360μs | 40.7489μs | 24.5406 KOps/s | 25.5801 KOps/s | |
test_instantiation_functorch | 1.5976ms | 1.3037ms | 767.0422 Ops/s | 767.1832 Ops/s | |
test_instantiation_td | 2.3243ms | 1.0273ms | 973.4655 Ops/s | 1.0042 KOps/s | |
test_exec_functorch | 0.2779ms | 0.1622ms | 6.1645 KOps/s | 6.3162 KOps/s | |
test_exec_functional_call | 0.4981ms | 0.1554ms | 6.4338 KOps/s | 6.7745 KOps/s | |
test_exec_td | 0.2373ms | 0.1482ms | 6.7482 KOps/s | 6.9466 KOps/s | |
test_exec_td_decorator | 0.5520ms | 0.2221ms | 4.5015 KOps/s | 4.5861 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7659ms | 0.4850ms | 2.0620 KOps/s | 2.0964 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7231ms | 0.4807ms | 2.0802 KOps/s | 2.0796 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.9332ms | 0.3980ms | 2.5127 KOps/s | 2.5547 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6311ms | 0.3932ms | 2.5433 KOps/s | 2.5647 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.1630ms | 0.5559ms | 1.7988 KOps/s | 1.8217 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0417ms | 0.5561ms | 1.7983 KOps/s | 1.8131 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 73.6829ms | 0.4917ms | 2.0336 KOps/s | 2.2181 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8196ms | 0.4584ms | 2.1816 KOps/s | 2.2163 KOps/s | |
test_to_module_speed[True] | 2.3247ms | 1.6853ms | 593.3570 Ops/s | 596.8026 Ops/s | |
test_to_module_speed[False] | 2.5466ms | 1.6524ms | 605.1683 Ops/s | 615.9915 Ops/s | |
test_tc_init | 60.7440μs | 29.0693μs | 34.4006 KOps/s | 38.5730 KOps/s | |
test_tc_init_nested | 0.1081ms | 60.4431μs | 16.5445 KOps/s | 18.0232 KOps/s | |
test_tc_first_layer_tensor | 5.7980μs | 0.6875μs | 1.4546 MOps/s | 1.4915 MOps/s | |
test_tc_first_layer_nontensor | 3.7500μs | 0.6625μs | 1.5095 MOps/s | 1.5271 MOps/s | |
test_tc_second_layer_tensor | 27.0210μs | 1.8430μs | 542.6058 KOps/s | 531.4231 KOps/s | |
test_tc_second_layer_nontensor | 11.5147μs | 1.4926μs | 669.9748 KOps/s | 655.0214 KOps/s | |
test_unbind | 84.0940ms | 7.1903ms | 139.0756 Ops/s | 138.9569 Ops/s | |
test_full_like | 15.0641ms | 11.0905ms | 90.1669 Ops/s | 82.4350 Ops/s | |
test_zeros_like | 13.6548ms | 6.2271ms | 160.5880 Ops/s | 158.0792 Ops/s | |
test_ones_like | 12.5541ms | 6.4154ms | 155.8758 Ops/s | 156.6897 Ops/s | |
test_clone | 12.9213ms | 7.8849ms | 126.8255 Ops/s | 122.1047 Ops/s | |
test_squeeze | 59.2410μs | 14.1720μs | 70.5616 KOps/s | 69.4176 KOps/s | |
test_unsqueeze | 0.1111ms | 59.2101μs | 16.8890 KOps/s | 16.3427 KOps/s | |
test_split | 0.1895ms | 0.1120ms | 8.9266 KOps/s | 8.8106 KOps/s | |
test_permute | 0.2098ms | 0.1265ms | 7.9081 KOps/s | 7.9199 KOps/s | |
test_stack | 30.7092ms | 22.3760ms | 44.6908 Ops/s | 43.2088 Ops/s | |
test_cat | 27.0150ms | 21.8183ms | 45.8331 Ops/s | 43.2972 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 36.4810μs | 13.9177μs | 71.8511 KOps/s | 78.4397 KOps/s | |
test_plain_set_stack_nested | 27.0200μs | 14.0896μs | 70.9741 KOps/s | 77.4145 KOps/s | |
test_plain_set_nested_inplace | 41.5510μs | 15.2855μs | 65.4214 KOps/s | 71.1812 KOps/s | |
test_plain_set_stack_nested_inplace | 45.0310μs | 15.3874μs | 64.9884 KOps/s | 70.6457 KOps/s | |
test_items | 20.2800μs | 4.6680μs | 214.2257 KOps/s | 213.3319 KOps/s | |
test_items_nested | 0.3849ms | 0.3386ms | 2.9533 KOps/s | 2.9640 KOps/s | |
test_items_nested_locked | 0.4134ms | 0.3580ms | 2.7934 KOps/s | 2.9068 KOps/s | |
test_items_nested_leaf | 0.1029ms | 83.5317μs | 11.9715 KOps/s | 12.1701 KOps/s | |
test_items_stack_nested | 0.4015ms | 0.3444ms | 2.9036 KOps/s | 2.9231 KOps/s | |
test_items_stack_nested_leaf | 0.1046ms | 83.9269μs | 11.9151 KOps/s | 12.0454 KOps/s | |
test_items_stack_nested_locked | 0.3985ms | 0.3479ms | 2.8745 KOps/s | 2.9457 KOps/s | |
test_keys | 27.1110μs | 4.3281μs | 231.0463 KOps/s | 231.6679 KOps/s | |
test_keys_nested | 97.5120μs | 67.5173μs | 14.8110 KOps/s | 15.0321 KOps/s | |
test_keys_nested_locked | 2.1444ms | 72.8003μs | 13.7362 KOps/s | 13.9657 KOps/s | |
test_keys_nested_leaf | 88.3020μs | 57.9070μs | 17.2691 KOps/s | 17.5004 KOps/s | |
test_keys_stack_nested | 98.3230μs | 66.9849μs | 14.9287 KOps/s | 15.1000 KOps/s | |
test_keys_stack_nested_leaf | 80.9520μs | 57.7424μs | 17.3183 KOps/s | 17.4902 KOps/s | |
test_keys_stack_nested_locked | 96.6020μs | 71.2927μs | 14.0267 KOps/s | 14.0269 KOps/s | |
test_values | 8.4537μs | 1.8081μs | 553.0555 KOps/s | 551.6554 KOps/s | |
test_values_nested | 76.6520μs | 35.2021μs | 28.4074 KOps/s | 28.8201 KOps/s | |
test_values_nested_locked | 60.0520μs | 37.1427μs | 26.9232 KOps/s | 27.2735 KOps/s | |
test_values_nested_leaf | 52.7510μs | 31.2251μs | 32.0256 KOps/s | 32.4609 KOps/s | |
test_values_stack_nested | 65.2420μs | 35.3294μs | 28.3051 KOps/s | 28.1513 KOps/s | |
test_values_stack_nested_leaf | 51.0810μs | 31.3850μs | 31.8624 KOps/s | 31.7536 KOps/s | |
test_values_stack_nested_locked | 62.8410μs | 37.1482μs | 26.9192 KOps/s | 26.6749 KOps/s | |
test_membership | 35.6500μs | 0.8329μs | 1.2007 MOps/s | 1.1891 MOps/s | |
test_membership_nested | 23.4800μs | 2.5616μs | 390.3767 KOps/s | 392.1265 KOps/s | |
test_membership_nested_leaf | 35.0910μs | 2.5352μs | 394.4439 KOps/s | 387.8286 KOps/s | |
test_membership_stacked_nested | 21.4600μs | 2.5638μs | 390.0434 KOps/s | 389.0416 KOps/s | |
test_membership_stacked_nested_leaf | 15.1510μs | 2.5608μs | 390.5057 KOps/s | 391.0215 KOps/s | |
test_membership_nested_last | 36.6910μs | 3.0622μs | 326.5669 KOps/s | 325.8502 KOps/s | |
test_membership_nested_leaf_last | 20.6900μs | 3.0836μs | 324.2915 KOps/s | 325.7912 KOps/s | |
test_membership_stacked_nested_last | 44.8910μs | 3.1276μs | 319.7312 KOps/s | 256.2564 KOps/s | |
test_membership_stacked_nested_leaf_last | 21.4100μs | 3.1125μs | 321.2847 KOps/s | 258.1818 KOps/s | |
test_nested_getleaf | 40.8510μs | 8.4285μs | 118.6453 KOps/s | 119.7160 KOps/s | |
test_nested_get | 23.1610μs | 7.9351μs | 126.0226 KOps/s | 127.5925 KOps/s | |
test_stacked_getleaf | 26.9010μs | 8.4274μs | 118.6603 KOps/s | 118.9262 KOps/s | |
test_stacked_get | 38.8610μs | 7.9332μs | 126.0520 KOps/s | 126.6363 KOps/s | |
test_nested_getitemleaf | 24.9310μs | 8.5799μs | 116.5510 KOps/s | 117.3601 KOps/s | |
test_nested_getitem | 30.3310μs | 8.1041μs | 123.3950 KOps/s | 124.4060 KOps/s | |
test_stacked_getitemleaf | 35.2920μs | 8.5714μs | 116.6673 KOps/s | 116.5532 KOps/s | |
test_stacked_getitem | 25.7700μs | 8.0702μs | 123.9121 KOps/s | 123.6761 KOps/s | |
test_lock_nested | 58.9086ms | 0.4060ms | 2.4630 KOps/s | 2.4927 KOps/s | |
test_lock_stack_nested | 0.3550ms | 0.3034ms | 3.2963 KOps/s | 3.3432 KOps/s | |
test_unlock_nested | 60.9140ms | 0.4109ms | 2.4338 KOps/s | 2.4542 KOps/s | |
test_unlock_stack_nested | 0.3505ms | 0.3114ms | 3.2110 KOps/s | 3.2322 KOps/s | |
test_flatten_speed | 0.2958ms | 0.1007ms | 9.9261 KOps/s | 9.9233 KOps/s | |
test_unflatten_speed | 0.3222ms | 0.2917ms | 3.4283 KOps/s | 3.4224 KOps/s | |
test_common_ops | 1.0883ms | 0.6218ms | 1.6083 KOps/s | 1.6906 KOps/s | |
test_creation | 27.0310μs | 1.6426μs | 608.7851 KOps/s | 613.3372 KOps/s | |
test_creation_empty | 24.3210μs | 10.7467μs | 93.0516 KOps/s | 118.1474 KOps/s | |
test_creation_nested_1 | 40.4210μs | 12.4170μs | 80.5349 KOps/s | 98.0529 KOps/s | |
test_creation_nested_2 | 29.8110μs | 14.5464μs | 68.7456 KOps/s | 79.3599 KOps/s | |
test_clone | 69.5910μs | 11.8425μs | 84.4414 KOps/s | 84.1937 KOps/s | |
test_getitem[int] | 30.5110μs | 11.3660μs | 87.9813 KOps/s | 87.8079 KOps/s | |
test_getitem[slice_int] | 55.5320μs | 21.4819μs | 46.5508 KOps/s | 42.4434 KOps/s | |
test_getitem[range] | 69.2020μs | 49.9852μs | 20.0059 KOps/s | 20.1339 KOps/s | |
test_getitem[tuple] | 43.1510μs | 19.4222μs | 51.4875 KOps/s | 51.4988 KOps/s | |
test_getitem[list] | 0.1272ms | 34.8218μs | 28.7176 KOps/s | 28.0672 KOps/s | |
test_setitem_dim[int] | 49.1510μs | 32.5975μs | 30.6772 KOps/s | 31.6787 KOps/s | |
test_setitem_dim[slice_int] | 77.4010μs | 52.6640μs | 18.9883 KOps/s | 18.4840 KOps/s | |
test_setitem_dim[range] | 94.5820μs | 72.5850μs | 13.7770 KOps/s | 13.5893 KOps/s | |
test_setitem_dim[tuple] | 63.7110μs | 46.0098μs | 21.7345 KOps/s | 20.9183 KOps/s | |
test_setitem | 42.9010μs | 18.2973μs | 54.6529 KOps/s | 56.4646 KOps/s | |
test_set | 59.9520μs | 17.5518μs | 56.9742 KOps/s | 58.8458 KOps/s | |
test_set_shared | 1.3331ms | 99.2628μs | 10.0743 KOps/s | 9.9754 KOps/s | |
test_update | 86.0220μs | 21.1315μs | 47.3228 KOps/s | 52.9944 KOps/s | |
test_update_nested | 74.3520μs | 25.8479μs | 38.6879 KOps/s | 39.9466 KOps/s | |
test_update__nested | 60.7120μs | 22.7838μs | 43.8909 KOps/s | 44.1024 KOps/s | |
test_set_nested | 68.3210μs | 18.3292μs | 54.5577 KOps/s | 54.5015 KOps/s | |
test_set_nested_new | 61.7810μs | 21.8255μs | 45.8180 KOps/s | 47.1293 KOps/s | |
test_select | 68.2720μs | 35.2609μs | 28.3600 KOps/s | 27.9896 KOps/s | |
test_select_nested | 0.4820ms | 53.7361μs | 18.6094 KOps/s | 18.2997 KOps/s | |
test_exclude_nested | 0.1352ms | 0.1083ms | 9.2298 KOps/s | 8.9532 KOps/s | |
test_empty[True] | 0.3751ms | 0.3480ms | 2.8733 KOps/s | 2.8761 KOps/s | |
test_empty[False] | 2.8740μs | 0.9148μs | 1.0932 MOps/s | 1.0714 MOps/s | |
test_to | 0.1048ms | 77.0819μs | 12.9732 KOps/s | 12.4816 KOps/s | |
test_to_nonblocking | 98.1120μs | 61.7724μs | 16.1885 KOps/s | 15.1592 KOps/s | |
test_unbind_speed | 1.3929ms | 0.2644ms | 3.7821 KOps/s | 3.7912 KOps/s | |
test_unbind_speed_stack0 | 0.3475ms | 0.2632ms | 3.7999 KOps/s | 3.8089 KOps/s | |
test_unbind_speed_stack1 | 76.4737ms | 0.8030ms | 1.2454 KOps/s | 1.2364 KOps/s | |
test_split | 76.7957ms | 1.7247ms | 579.8050 Ops/s | 560.2051 Ops/s | |
test_chunk | 76.7287ms | 1.7192ms | 581.6600 Ops/s | 601.8766 Ops/s | |
test_creation[device0] | 0.1188ms | 59.8376μs | 16.7119 KOps/s | 16.1544 KOps/s | |
test_creation_from_tensor | 0.1292ms | 53.9506μs | 18.5355 KOps/s | 17.2421 KOps/s | |
test_add_one[memmap_tensor0] | 83.9920μs | 7.0275μs | 142.2991 KOps/s | 143.6064 KOps/s | |
test_contiguous[memmap_tensor0] | 24.5300μs | 0.7306μs | 1.3687 MOps/s | 1.4819 MOps/s | |
test_stack[memmap_tensor0] | 25.7110μs | 5.2897μs | 189.0464 KOps/s | 196.2193 KOps/s | |
test_memmaptd_index | 1.1726ms | 0.3081ms | 3.2459 KOps/s | 3.3410 KOps/s | |
test_memmaptd_index_astensor | 0.6411ms | 0.3783ms | 2.6432 KOps/s | 2.6938 KOps/s | |
test_memmaptd_index_op | 1.2339ms | 0.7168ms | 1.3951 KOps/s | 1.4882 KOps/s | |
test_serialize_model | 0.1829s | 0.1111s | 8.9977 Ops/s | 9.4719 Ops/s | |
test_serialize_model_pickle | 1.3492s | 1.2352s | 0.8096 Ops/s | 0.8063 Ops/s | |
test_serialize_weights | 0.1805s | 0.1087s | 9.2024 Ops/s | 8.7308 Ops/s | |
test_serialize_weights_returnearly | 0.2893s | 0.1054s | 9.4904 Ops/s | 10.3213 Ops/s | |
test_serialize_weights_pickle | 1.3540s | 1.2481s | 0.8012 Ops/s | 0.8090 Ops/s | |
test_reshape_pytree | 62.4510μs | 26.3339μs | 37.9738 KOps/s | 37.8911 KOps/s | |
test_reshape_td | 55.7210μs | 31.4901μs | 31.7560 KOps/s | 30.8431 KOps/s | |
test_view_pytree | 0.2226ms | 26.0499μs | 38.3879 KOps/s | 38.3877 KOps/s | |
test_view_td | 62.9510μs | 35.6965μs | 28.0139 KOps/s | 27.1950 KOps/s | |
test_unbind_pytree | 0.2277ms | 32.1874μs | 31.0681 KOps/s | 31.5692 KOps/s | |
test_unbind_td | 0.4042ms | 40.2922μs | 24.8187 KOps/s | 24.1202 KOps/s | |
test_split_pytree | 0.2568ms | 35.5976μs | 28.0918 KOps/s | 28.0195 KOps/s | |
test_split_td | 0.5216ms | 42.4498μs | 23.5572 KOps/s | 24.8017 KOps/s | |
test_add_pytree | 0.2639ms | 38.5482μs | 25.9416 KOps/s | 26.0794 KOps/s | |
test_add_td | 86.3220μs | 54.1325μs | 18.4732 KOps/s | 19.2682 KOps/s | |
test_distributed | 0.2494ms | 66.9170μs | 14.9439 KOps/s | 15.2714 KOps/s | |
test_tdmodule | 0.1267ms | 16.5538μs | 60.4091 KOps/s | 67.8243 KOps/s | |
test_tdmodule_dispatch | 46.8910μs | 32.0426μs | 31.2084 KOps/s | 34.4132 KOps/s | |
test_tdseq | 44.7210μs | 17.9228μs | 55.7950 KOps/s | 59.8802 KOps/s | |
test_tdseq_dispatch | 49.9510μs | 34.4276μs | 29.0464 KOps/s | 30.8078 KOps/s | |
test_instantiation_functorch | 1.8306ms | 1.5600ms | 641.0135 Ops/s | 654.4163 Ops/s | |
test_instantiation_td | 1.5695ms | 1.0448ms | 957.1328 Ops/s | 871.5047 Ops/s | |
test_exec_functorch | 0.2040ms | 0.1501ms | 6.6633 KOps/s | 6.6879 KOps/s | |
test_exec_functional_call | 0.3543ms | 0.1368ms | 7.3083 KOps/s | 7.3586 KOps/s | |
test_exec_td | 0.1829ms | 0.1352ms | 7.3959 KOps/s | 7.4248 KOps/s | |
test_exec_td_decorator | 0.7681ms | 0.2083ms | 4.7998 KOps/s | 4.7627 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.2742ms | 0.5850ms | 1.7093 KOps/s | 1.7369 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.8287ms | 0.5903ms | 1.6941 KOps/s | 1.7436 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.0304ms | 0.5239ms | 1.9087 KOps/s | 1.9680 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.9874ms | 0.5252ms | 1.9039 KOps/s | 1.9795 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.0547ms | 0.6602ms | 1.5147 KOps/s | 1.5697 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.7272ms | 0.6415ms | 1.5588 KOps/s | 1.5768 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7458ms | 0.5643ms | 1.7722 KOps/s | 1.7742 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.6746ms | 0.5634ms | 1.7750 KOps/s | 1.7690 KOps/s | |
test_vmap_transformer_speed[True-True] | 7.5780ms | 7.4922ms | 133.4719 Ops/s | 133.6101 Ops/s | |
test_vmap_transformer_speed[True-False] | 7.5148ms | 7.4610ms | 134.0300 Ops/s | 133.9863 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.3322ms | 7.7318ms | 129.3363 Ops/s | 135.1195 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.2238ms | 7.5665ms | 132.1613 Ops/s | 135.0375 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.2067ms | 18.5875ms | 53.7997 Ops/s | 55.0891 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.1910ms | 18.5641ms | 53.8674 Ops/s | 55.2064 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.0537ms | 18.5361ms | 53.9487 Ops/s | 55.5535 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.3136ms | 18.5350ms | 53.9520 Ops/s | 55.5200 Ops/s | |
test_to_module_speed[True] | 3.0124ms | 1.5801ms | 632.8640 Ops/s | 649.9024 Ops/s | |
test_to_module_speed[False] | 2.0327ms | 1.5369ms | 650.6762 Ops/s | 650.4422 Ops/s | |
test_tc_init | 0.1712ms | 29.6085μs | 33.7741 KOps/s | 39.9654 KOps/s | |
test_tc_init_nested | 0.1947ms | 64.5511μs | 15.4916 KOps/s | 18.0664 KOps/s | |
test_tc_first_layer_tensor | 3.2818μs | 0.3619μs | 2.7632 MOps/s | 2.7878 MOps/s | |
test_tc_first_layer_nontensor | 10.6518μs | 0.3921μs | 2.5506 MOps/s | 2.5242 MOps/s | |
test_tc_second_layer_tensor | 25.5326μs | 0.9722μs | 1.0286 MOps/s | 938.7142 KOps/s | |
test_tc_second_layer_nontensor | 21.7538μs | 0.8411μs | 1.1890 MOps/s | 1.2252 MOps/s | |
test_unbind | 0.1126s | 6.9687ms | 143.4989 Ops/s | 157.2436 Ops/s | |
test_full_like | 11.7878ms | 11.1462ms | 89.7165 Ops/s | 76.4800 Ops/s | |
test_zeros_like | 8.1528ms | 7.8247ms | 127.8011 Ops/s | 127.3165 Ops/s | |
test_ones_like | 8.4440ms | 7.8740ms | 127.0007 Ops/s | 126.3744 Ops/s | |
test_clone | 9.3933ms | 9.2096ms | 108.5826 Ops/s | 108.2504 Ops/s | |
test_squeeze | 69.0610μs | 10.9372μs | 91.4310 KOps/s | 91.2922 KOps/s | |
test_unsqueeze | 97.9430μs | 52.8971μs | 18.9046 KOps/s | 18.8525 KOps/s | |
test_split | 0.1563ms | 99.2463μs | 10.0759 KOps/s | 10.1357 KOps/s | |
test_permute | 0.1737ms | 0.1099ms | 9.0990 KOps/s | 8.9970 KOps/s | |
test_stack | 27.2443ms | 26.6603ms | 37.5089 Ops/s | 37.3433 Ops/s | |
test_cat | 26.8743ms | 26.5956ms | 37.6002 Ops/s | 37.5476 Ops/s |
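
The Change column came through empty in both tables. A minimal sketch for recomputing it from the Ops and Ops on Repo HEAD columns, assuming the values are formatted exactly as printed above. The helper names `parse_ops` and `relative_change` are illustrative and not part of the benchmark tooling:

```python
# Hypothetical helpers for reading the benchmark tables above; these are
# not part of the repository or of the benchmark tooling.

# Multipliers for the throughput units used in the Ops columns.
_OPS_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text):
    """Convert a cell such as '57.3623 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * _OPS_UNITS[unit]


def relative_change(ops, ops_head):
    """Percent change of the PR relative to Repo HEAD.

    Positive means the PR branch ran more operations per second than HEAD,
    negative means fewer.
    """
    pr = parse_ops(ops)
    head = parse_ops(ops_head)
    return (pr - head) / head * 100.0


if __name__ == "__main__":
    # test_lock_nested rows copied from the first and second table.
    rows = [
        ("test_lock_nested (table 1)", "2.5486 KOps/s", "2.9604 KOps/s"),
        ("test_lock_nested (table 2)", "2.4630 KOps/s", "2.4927 KOps/s"),
    ]
    for name, ops, ops_head in rows:
        print(f"{name}: {relative_change(ops, ops_head):+.2f}%")
```

Single-run differences of a few percent are within the spread suggested by the Max column, so a recomputed change is best read alongside Max and Mean rather than on its own.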
Labels: CLA Signed, Performance
Benchmark:
Before:
After: