
[Feature] map_iter #847

Merged
@vmoens merged 7 commits into main from iterable-map on Jul 3, 2024


vmoens
Contributor

@vmoens commented Jul 2, 2024

Introduces map_iter, which can be used to iterate over a large dataset in a dataloader-like fashion.
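
The PR description gives no API details, so the following is only a minimal sketch of the general idea of dataloader-like iteration: split a large dataset into chunks, apply a function to each chunk in a worker pool, and yield results lazily with a bounded number of chunks in flight. It is not tensordict's `map_iter`. The name `map_iter_sketch` and its parameters (`chunksize`, `num_workers`, `prefetch`, `shuffle`, `seed`) are hypothetical, and a thread pool stands in for whatever pool the real implementation uses so that the example runs with the standard library alone.

```python
import random
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_iter_sketch(
    fn: Callable[[List[T]], R],
    data: Sequence[T],
    chunksize: int,
    num_workers: int = 2,
    prefetch: int = 4,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[R]:
    """Hypothetical stand-in for map_iter: lazily yield fn(chunk) for each chunk."""
    indices = list(range(len(data)))
    if shuffle:
        # Shuffle which items land in which chunk (and therefore the chunk order).
        random.Random(seed).shuffle(indices)
    chunks = (indices[i:i + chunksize] for i in range(0, len(indices), chunksize))
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for idx in chunks:
            pending.append(pool.submit(fn, [data[i] for i in idx]))
            # Keep at most `prefetch` chunks in flight; yield the oldest first.
            if len(pending) >= prefetch:
                yield pending.popleft().result()
        # Drain the chunks still running once everything has been submitted.
        while pending:
            yield pending.popleft().result()


# Usage: square 10 numbers in chunks of 4.
for batch in map_iter_sketch(lambda xs: [x * x for x in xs], list(range(10)), chunksize=4):
    print(batch)
```

Capping the number of pending futures with `prefetch` keeps memory roughly proportional to `prefetch * chunksize` rather than to the dataset size, and popping futures in submission order gives ordered iteration even when chunks finish out of order.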

TODO:

  • Shuffled and ordered iteration
  • Infinite iterable? Possibly a follow-up PR; it should be achievable on the user side
  • Tests
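
The TODO suggests infinite iteration could be handled on the user side. One way to do that, sketched here under the assumption that a finite pass over the data can be re-created for every epoch, is to wrap it in a generator that restarts it forever and reseeds the shuffle each time. `repeat_epochs` and `epoch_batches` are hypothetical helpers, not part of tensordict.

```python
import itertools
import random
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def repeat_epochs(make_epoch: Callable[[int], Iterable[T]]) -> Iterator[T]:
    """Turn a finite, re-creatable iterator into an infinite one.

    make_epoch(epoch) is called once per pass, so each epoch can use its own seed.
    """
    for epoch in itertools.count():
        yield from make_epoch(epoch)


def epoch_batches(epoch: int, data=range(6), batch_size: int = 4) -> Iterator[list]:
    # Hypothetical finite pass: reshuffle with a per-epoch seed, then chunk.
    order = list(data)
    random.Random(epoch).shuffle(order)
    for i in range(0, len(order), batch_size):
        yield order[i:i + batch_size]


# Take 5 batches from the infinite stream (2 batches per epoch, so 3 epochs are touched).
for batch in itertools.islice(repeat_epochs(epoch_batches), 5):
    print(batch)
```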

cc @shagunsodhani

@facebook-github-bot added the CLA Signed label Jul 2, 2024

github-actions bot commented Jul 2, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}2$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 63.7520μs | 12.5146μs | 79.9064 KOps/s | 76.0669 KOps/s | $\textbf{\color{#35bf28}+5.05\%}$ |
| test_plain_set_stack_nested | 25.3610μs | 12.6179μs | 79.2527 KOps/s | 75.1401 KOps/s | $\textbf{\color{#35bf28}+5.47\%}$ |
| test_plain_set_nested_inplace | 36.1410μs | 14.0451μs | 71.1993 KOps/s | 68.9441 KOps/s | $\color{#35bf28}+3.27\%$ |
| test_plain_set_stack_nested_inplace | 37.2010μs | 13.8213μs | 72.3522 KOps/s | 69.0457 KOps/s | $\color{#35bf28}+4.79\%$ |
| test_items | 19.1300μs | 4.6989μs | 212.8159 KOps/s | 210.6254 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_items_nested | 0.4062ms | 0.3410ms | 2.9328 KOps/s | 2.9058 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_items_nested_locked | 0.4149ms | 0.3494ms | 2.8617 KOps/s | 2.9241 KOps/s | $\color{#d91a1a}-2.13\%$ |
| test_items_nested_leaf | 0.1046ms | 82.9549μs | 12.0547 KOps/s | 12.0077 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_items_stack_nested | 0.4094ms | 0.3491ms | 2.8643 KOps/s | 2.9052 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_items_stack_nested_leaf | 0.1158ms | 85.6735μs | 11.6722 KOps/s | 11.8382 KOps/s | $\color{#d91a1a}-1.40\%$ |
| test_items_stack_nested_locked | 0.4097ms | 0.3424ms | 2.9205 KOps/s | 2.9109 KOps/s | $\color{#35bf28}+0.33\%$ |
| test_keys | 25.3310μs | 4.4011μs | 227.2155 KOps/s | 226.7923 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_keys_nested | 0.1000ms | 71.6102μs | 13.9645 KOps/s | 14.2691 KOps/s | $\color{#d91a1a}-2.13\%$ |
| test_keys_nested_locked | 2.6041ms | 77.3463μs | 12.9289 KOps/s | 13.0961 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_keys_nested_leaf | 89.7320μs | 62.5237μs | 15.9939 KOps/s | 16.4926 KOps/s | $\color{#d91a1a}-3.02\%$ |
| test_keys_stack_nested | 0.1000ms | 71.4796μs | 13.9900 KOps/s | 14.2101 KOps/s | $\color{#d91a1a}-1.55\%$ |
| test_keys_stack_nested_leaf | 87.7510μs | 62.4593μs | 16.0104 KOps/s | 16.8350 KOps/s | $\color{#d91a1a}-4.90\%$ |
| test_keys_stack_nested_locked | 0.1036ms | 76.8457μs | 13.0131 KOps/s | 13.1908 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_values | 14.3837μs | 1.8092μs | 552.7361 KOps/s | 552.4149 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_values_nested | 61.7810μs | 35.9679μs | 27.8026 KOps/s | 28.2553 KOps/s | $\color{#d91a1a}-1.60\%$ |
| test_values_nested_locked | 67.7010μs | 37.7581μs | 26.4844 KOps/s | 26.8193 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_values_nested_leaf | 47.6710μs | 32.0375μs | 31.2134 KOps/s | 31.6756 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_values_stack_nested | 62.3400μs | 36.6881μs | 27.2568 KOps/s | 27.6861 KOps/s | $\color{#d91a1a}-1.55\%$ |
| test_values_stack_nested_leaf | 63.8910μs | 32.8471μs | 30.4441 KOps/s | 31.1335 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_values_stack_nested_locked | 64.2020μs | 38.7178μs | 25.8279 KOps/s | 26.3333 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_membership | 3.7786μs | 0.7393μs | 1.3526 MOps/s | 1.4235 MOps/s | $\color{#d91a1a}-4.98\%$ |
| test_membership_nested | 20.6800μs | 2.6773μs | 373.5057 KOps/s | 382.4920 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_membership_nested_leaf | 29.9300μs | 2.6577μs | 376.2599 KOps/s | 383.0281 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_membership_stacked_nested | 24.3110μs | 2.6088μs | 383.3165 KOps/s | 380.8919 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_membership_stacked_nested_leaf | 22.9410μs | 2.6455μs | 377.9947 KOps/s | 385.4738 KOps/s | $\color{#d91a1a}-1.94\%$ |
| test_membership_nested_last | 46.7610μs | 3.2324μs | 309.3696 KOps/s | 320.6686 KOps/s | $\color{#d91a1a}-3.52\%$ |
| test_membership_nested_leaf_last | 33.2000μs | 3.1878μs | 313.6922 KOps/s | 319.3595 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_membership_stacked_nested_last | 24.3210μs | 3.1711μs | 315.3503 KOps/s | 316.4929 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_membership_stacked_nested_leaf_last | 26.8110μs | 3.1912μs | 313.3631 KOps/s | 318.4042 KOps/s | $\color{#d91a1a}-1.58\%$ |
| test_nested_getleaf | 37.6710μs | 8.4815μs | 117.9041 KOps/s | 119.8530 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_nested_get | 29.0000μs | 7.9026μs | 126.5405 KOps/s | 127.4950 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_stacked_getleaf | 25.3710μs | 8.4217μs | 118.7415 KOps/s | 119.7226 KOps/s | $\color{#d91a1a}-0.82\%$ |
| test_stacked_get | 37.0810μs | 7.9080μs | 126.4542 KOps/s | 126.8240 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_nested_getitemleaf | 23.8790μs | 8.6456μs | 115.6664 KOps/s | 116.7450 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_nested_getitem | 28.8110μs | 8.1548μs | 122.6272 KOps/s | 124.4115 KOps/s | $\color{#d91a1a}-1.43\%$ |
| test_stacked_getitemleaf | 34.2800μs | 8.5805μs | 116.5433 KOps/s | 116.7925 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_stacked_getitem | 24.4310μs | 8.0669μs | 123.9638 KOps/s | 123.6757 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_lock_nested | 58.7218ms | 0.4012ms | 2.4927 KOps/s | 2.4865 KOps/s | $\color{#35bf28}+0.25\%$ |
| test_lock_stack_nested | 0.3480ms | 0.2957ms | 3.3817 KOps/s | 3.2867 KOps/s | $\color{#35bf28}+2.89\%$ |
| test_unlock_nested | 60.9149ms | 0.4033ms | 2.4794 KOps/s | 2.4558 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_unlock_stack_nested | 0.3592ms | 0.3047ms | 3.2816 KOps/s | 3.2033 KOps/s | $\color{#35bf28}+2.45\%$ |
| test_flatten_speed | 0.3762ms | 0.1029ms | 9.7200 KOps/s | 9.8773 KOps/s | $\color{#d91a1a}-1.59\%$ |
| test_unflatten_speed | 0.3486ms | 0.2945ms | 3.3960 KOps/s | 3.4166 KOps/s | $\color{#d91a1a}-0.60\%$ |
| test_common_ops | 1.0437ms | 0.5784ms | 1.7289 KOps/s | 1.6458 KOps/s | $\textbf{\color{#35bf28}+5.05\%}$ |
| test_creation | 27.0500μs | 1.6613μs | 601.9400 KOps/s | 616.1116 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_creation_empty | 26.3800μs | 8.0658μs | 123.9803 KOps/s | 106.8967 KOps/s | $\textbf{\color{#35bf28}+15.98\%}$ |
| test_creation_nested_1 | 26.6410μs | 9.8336μs | 101.6925 KOps/s | 89.8939 KOps/s | $\textbf{\color{#35bf28}+13.13\%}$ |
| test_creation_nested_2 | 40.2510μs | 12.0232μs | 83.1725 KOps/s | 75.2748 KOps/s | $\textbf{\color{#35bf28}+10.49\%}$ |
| test_clone | 66.9010μs | 11.6848μs | 85.5810 KOps/s | 83.6803 KOps/s | $\color{#35bf28}+2.27\%$ |
| test_getitem[int] | 26.7310μs | 10.7408μs | 93.1033 KOps/s | 92.8184 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_getitem[slice_int] | 65.9020μs | 20.8210μs | 48.0283 KOps/s | 46.7376 KOps/s | $\color{#35bf28}+2.76\%$ |
| test_getitem[range] | 67.6010μs | 48.1168μs | 20.7828 KOps/s | 19.1824 KOps/s | $\textbf{\color{#35bf28}+8.34\%}$ |
| test_getitem[tuple] | 41.5910μs | 18.5966μs | 53.7732 KOps/s | 52.4425 KOps/s | $\color{#35bf28}+2.54\%$ |
| test_getitem[list] | 0.1542ms | 33.6336μs | 29.7322 KOps/s | 28.9395 KOps/s | $\color{#35bf28}+2.74\%$ |
| test_setitem_dim[int] | 65.3010μs | 26.7173μs | 37.4289 KOps/s | 35.2625 KOps/s | $\textbf{\color{#35bf28}+6.14\%}$ |
| test_setitem_dim[slice_int] | 78.5110μs | 47.2672μs | 21.1563 KOps/s | 19.7210 KOps/s | $\textbf{\color{#35bf28}+7.28\%}$ |
| test_setitem_dim[range] | 0.1026ms | 65.8386μs | 15.1886 KOps/s | 14.5121 KOps/s | $\color{#35bf28}+4.66\%$ |
| test_setitem_dim[tuple] | 71.3520μs | 42.0789μs | 23.7649 KOps/s | 22.6364 KOps/s | $\color{#35bf28}+4.99\%$ |
| test_setitem | 44.8310μs | 16.0038μs | 62.4850 KOps/s | 58.1084 KOps/s | $\textbf{\color{#35bf28}+7.53\%}$ |
| test_set | 54.5010μs | 15.2234μs | 65.6882 KOps/s | 60.5788 KOps/s | $\textbf{\color{#35bf28}+8.43\%}$ |
| test_set_shared | 1.6930ms | 99.7999μs | 10.0200 KOps/s | 9.8766 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_update | 62.9710μs | 17.9667μs | 55.6584 KOps/s | 51.8068 KOps/s | $\textbf{\color{#35bf28}+7.43\%}$ |
| test_update_nested | 62.8110μs | 23.4359μs | 42.6695 KOps/s | 40.1349 KOps/s | $\textbf{\color{#35bf28}+6.32\%}$ |
| test_update__nested | 62.2610μs | 22.1748μs | 45.0961 KOps/s | 43.7365 KOps/s | $\color{#35bf28}+3.11\%$ |
| test_set_nested | 49.1110μs | 16.2183μs | 61.6588 KOps/s | 57.8267 KOps/s | $\textbf{\color{#35bf28}+6.63\%}$ |
| test_set_nested_new | 71.3020μs | 19.1057μs | 52.3403 KOps/s | 49.4257 KOps/s | $\textbf{\color{#35bf28}+5.90\%}$ |
| test_select | 68.0020μs | 31.6155μs | 31.6300 KOps/s | 29.8521 KOps/s | $\textbf{\color{#35bf28}+5.96\%}$ |
| test_select_nested | 0.7850ms | 53.4337μs | 18.7148 KOps/s | 19.4124 KOps/s | $\color{#d91a1a}-3.59\%$ |
| test_exclude_nested | 0.1577ms | 0.1113ms | 8.9881 KOps/s | 9.3611 KOps/s | $\color{#d91a1a}-3.99\%$ |
| test_empty[True] | 0.4157ms | 0.3486ms | 2.8682 KOps/s | 2.9202 KOps/s | $\color{#d91a1a}-1.78\%$ |
| test_empty[False] | 2.6141μs | 0.8355μs | 1.1969 MOps/s | 1.2449 MOps/s | $\color{#d91a1a}-3.86\%$ |
| test_to | 91.3010μs | 58.9700μs | 16.9578 KOps/s | 16.5755 KOps/s | $\color{#35bf28}+2.31\%$ |
| test_to_nonblocking | 87.5420μs | 35.4702μs | 28.1927 KOps/s | 26.7602 KOps/s | $\textbf{\color{#35bf28}+5.35\%}$ |
| test_unbind_speed | 0.9104ms | 0.2555ms | 3.9138 KOps/s | 3.7691 KOps/s | $\color{#35bf28}+3.84\%$ |
| test_unbind_speed_stack0 | 0.3190ms | 0.2574ms | 3.8848 KOps/s | 3.8064 KOps/s | $\color{#35bf28}+2.06\%$ |
| test_unbind_speed_stack1 | 77.3689ms | 0.7834ms | 1.2764 KOps/s | 1.2655 KOps/s | $\color{#35bf28}+0.86\%$ |
| test_split | 76.2189ms | 1.6348ms | 611.6772 Ops/s | 577.2157 Ops/s | $\textbf{\color{#35bf28}+5.97\%}$ |
| test_chunk | 76.2292ms | 1.6420ms | 609.0240 Ops/s | 630.0154 Ops/s | $\color{#d91a1a}-3.33\%$ |
| test_creation[device0] | 0.1248ms | 58.8131μs | 17.0030 KOps/s | 16.9950 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_creation_from_tensor | 0.1236ms | 54.5333μs | 18.3374 KOps/s | 18.4876 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_add_one[memmap_tensor0] | 97.5120μs | 7.0912μs | 141.0196 KOps/s | 130.0235 KOps/s | $\textbf{\color{#35bf28}+8.46\%}$ |
| test_contiguous[memmap_tensor0] | 9.4800μs | 0.6684μs | 1.4960 MOps/s | 1.4660 MOps/s | $\color{#35bf28}+2.05\%$ |
| test_stack[memmap_tensor0] | 34.8010μs | 4.7464μs | 210.6863 KOps/s | 194.1663 KOps/s | $\textbf{\color{#35bf28}+8.51\%}$ |
| test_memmaptd_index | 1.1570ms | 0.2799ms | 3.5725 KOps/s | 3.4527 KOps/s | $\color{#35bf28}+3.47\%$ |
| test_memmaptd_index_astensor | 0.6843ms | 0.3381ms | 2.9577 KOps/s | 2.6250 KOps/s | $\textbf{\color{#35bf28}+12.68\%}$ |
| test_memmaptd_index_op | 1.0566ms | 0.6248ms | 1.6005 KOps/s | 1.4541 KOps/s | $\textbf{\color{#35bf28}+10.07\%}$ |
| test_serialize_model | 93.0669ms | 90.0852ms | 11.1006 Ops/s | 10.4733 Ops/s | $\textbf{\color{#35bf28}+5.99\%}$ |
| test_serialize_model_pickle | 1.3670s | 1.2384s | 0.8075 Ops/s | 0.8062 Ops/s | $\color{#35bf28}+0.16\%$ |
| test_serialize_weights | 93.8568ms | 89.1809ms | 11.2132 Ops/s | 10.6053 Ops/s | $\textbf{\color{#35bf28}+5.73\%}$ |
| test_serialize_weights_returnearly | 0.2654s | 77.1207ms | 12.9667 Ops/s | 13.1958 Ops/s | $\color{#d91a1a}-1.74\%$ |
| test_serialize_weights_pickle | 1.3515s | 1.2488s | 0.8008 Ops/s | 0.8090 Ops/s | $\color{#d91a1a}-1.02\%$ |
| test_reshape_pytree | 57.2410μs | 26.1475μs | 38.2445 KOps/s | 37.1275 KOps/s | $\color{#35bf28}+3.01\%$ |
| test_reshape_td | 62.3410μs | 33.3695μs | 29.9675 KOps/s | 31.0355 KOps/s | $\color{#d91a1a}-3.44\%$ |
| test_view_pytree | 0.2124ms | 26.0340μs | 38.4113 KOps/s | 37.9105 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_view_td | 62.8210μs | 36.1330μs | 27.6755 KOps/s | 26.4218 KOps/s | $\color{#35bf28}+4.75\%$ |
| test_unbind_pytree | 0.2374ms | 32.2071μs | 31.0490 KOps/s | 30.4506 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_unbind_td | 0.4481ms | 39.7887μs | 25.1328 KOps/s | 24.7673 KOps/s | $\color{#35bf28}+1.48\%$ |
| test_split_pytree | 60.1420μs | 34.8682μs | 28.6795 KOps/s | 27.5881 KOps/s | $\color{#35bf28}+3.96\%$ |
| test_split_td | 0.5010ms | 40.6634μs | 24.5922 KOps/s | 25.0180 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_add_pytree | 0.2449ms | 38.4132μs | 26.0327 KOps/s | 24.5716 KOps/s | $\textbf{\color{#35bf28}+5.95\%}$ |
| test_add_td | 90.8210μs | 54.2046μs | 18.4486 KOps/s | 17.7732 KOps/s | $\color{#35bf28}+3.80\%$ |
| test_distributed | 0.2280ms | 71.0428μs | 14.0760 KOps/s | 14.4410 KOps/s | $\color{#d91a1a}-2.53\%$ |
| test_tdmodule | 83.7320μs | 15.1024μs | 66.2148 KOps/s | 64.0319 KOps/s | $\color{#35bf28}+3.41\%$ |
| test_tdmodule_dispatch | 47.4810μs | 28.5548μs | 35.0204 KOps/s | 32.7936 KOps/s | $\textbf{\color{#35bf28}+6.79\%}$ |
| test_tdseq | 31.8110μs | 16.4018μs | 60.9691 KOps/s | 57.7889 KOps/s | $\textbf{\color{#35bf28}+5.50\%}$ |
| test_tdseq_dispatch | 52.3810μs | 31.6241μs | 31.6215 KOps/s | 30.1897 KOps/s | $\color{#35bf28}+4.74\%$ |
| test_instantiation_functorch | 1.5532ms | 1.4108ms | 708.8370 Ops/s | 706.5913 Ops/s | $\color{#35bf28}+0.32\%$ |
| test_instantiation_td | 80.5962ms | 1.0808ms | 925.2020 Ops/s | 925.9221 Ops/s | $\color{#d91a1a}-0.08\%$ |
| test_exec_functorch | 0.1824ms | 0.1483ms | 6.7424 KOps/s | 6.5813 KOps/s | $\color{#35bf28}+2.45\%$ |
| test_exec_functional_call | 0.1922ms | 0.1398ms | 7.1513 KOps/s | 6.9485 KOps/s | $\color{#35bf28}+2.92\%$ |
| test_exec_td | 0.1973ms | 0.1360ms | 7.3531 KOps/s | 7.0525 KOps/s | $\color{#35bf28}+4.26\%$ |
| test_exec_td_decorator | 0.3205ms | 0.2104ms | 4.7525 KOps/s | 4.7069 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_vmap_mlp_speed[True-True] | 0.6787ms | 0.5740ms | 1.7422 KOps/s | 1.7021 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_vmap_mlp_speed[True-False] | 0.6681ms | 0.5754ms | 1.7380 KOps/s | 1.7045 KOps/s | $\color{#35bf28}+1.96\%$ |
| test_vmap_mlp_speed[False-True] | 0.5621ms | 0.5079ms | 1.9689 KOps/s | 1.9554 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_vmap_mlp_speed[False-False] | 0.5996ms | 0.5069ms | 1.9726 KOps/s | 1.8713 KOps/s | $\textbf{\color{#35bf28}+5.42\%}$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.3753ms | 0.6432ms | 1.5547 KOps/s | 1.2481 KOps/s | $\textbf{\color{#35bf28}+24.56\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.7683ms | 0.6340ms | 1.5772 KOps/s | 1.5501 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6829ms | 0.5658ms | 1.7673 KOps/s | 1.7553 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7020ms | 0.5674ms | 1.7624 KOps/s | 1.7550 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_vmap_transformer_speed[True-True] | 8.2454ms | 7.7152ms | 129.6142 Ops/s | 129.9454 Ops/s | $\color{#d91a1a}-0.25\%$ |
| test_vmap_transformer_speed[True-False] | 7.9952ms | 7.6593ms | 130.5600 Ops/s | 130.2056 Ops/s | $\color{#35bf28}+0.27\%$ |
| test_vmap_transformer_speed[False-True] | 7.7380ms | 7.5495ms | 132.4589 Ops/s | 131.5313 Ops/s | $\color{#35bf28}+0.71\%$ |
| test_vmap_transformer_speed[False-False] | 7.6524ms | 7.5419ms | 132.5922 Ops/s | 131.5047 Ops/s | $\color{#35bf28}+0.83\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 19.6062ms | 18.5241ms | 53.9836 Ops/s | 53.5856 Ops/s | $\color{#35bf28}+0.74\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 18.9552ms | 18.5910ms | 53.7895 Ops/s | 53.7730 Ops/s | $\color{#35bf28}+0.03\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 19.7911ms | 18.5309ms | 53.9640 Ops/s | 54.0735 Ops/s | $\color{#d91a1a}-0.20\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 18.6737ms | 18.4334ms | 54.2494 Ops/s | 54.0269 Ops/s | $\color{#35bf28}+0.41\%$ |
| test_to_module_speed[True] | 2.1611ms | 1.5077ms | 663.2655 Ops/s | 668.8355 Ops/s | $\color{#d91a1a}-0.83\%$ |
| test_to_module_speed[False] | 1.5749ms | 1.4662ms | 682.0562 Ops/s | 673.3323 Ops/s | $\color{#35bf28}+1.30\%$ |
| test_tc_init | 90.1920μs | 50.4082μs | 19.8380 KOps/s | 18.7394 KOps/s | $\textbf{\color{#35bf28}+5.86\%}$ |
| test_tc_init_nested | 0.1590ms | 99.5660μs | 10.0436 KOps/s | 9.8759 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_tc_first_layer_tensor | 18.1300μs | 3.8648μs | 258.7436 KOps/s | 261.6362 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_tc_first_layer_nontensor | 19.3200μs | 3.9135μs | 255.5275 KOps/s | 270.3918 KOps/s | $\textbf{\color{#d91a1a}-5.50\%}$ |
| test_tc_second_layer_tensor | 5.5050μs | 1.2480μs | 801.2590 KOps/s | 778.7677 KOps/s | $\color{#35bf28}+2.89\%$ |
| test_tc_second_layer_nontensor | 21.4710μs | 4.4635μs | 224.0371 KOps/s | 230.6451 KOps/s | $\color{#d91a1a}-2.87\%$ |
| test_unbind | 0.1094s | 13.7768ms | 72.5858 Ops/s | 74.5284 Ops/s | $\color{#d91a1a}-2.61\%$ |
| test_full_like | 13.9661ms | 13.6029ms | 73.5138 Ops/s | 105.2127 Ops/s | $\textbf{\color{#d91a1a}-30.13\%}$ |
| test_zeros_like | 8.2825ms | 8.0008ms | 124.9878 Ops/s | 123.7499 Ops/s | $\color{#35bf28}+1.00\%$ |
| test_ones_like | 8.1518ms | 7.9803ms | 125.3089 Ops/s | 124.4342 Ops/s | $\color{#35bf28}+0.70\%$ |
| test_clone | 9.7665ms | 9.6049ms | 104.1132 Ops/s | 102.6436 Ops/s | $\color{#35bf28}+1.43\%$ |
| test_squeeze | 62.1710μs | 10.7217μs | 93.2688 KOps/s | 93.7287 KOps/s | $\color{#d91a1a}-0.49\%$ |
| test_unsqueeze | 0.1603ms | 86.4200μs | 11.5714 KOps/s | 11.5332 KOps/s | $\color{#35bf28}+0.33\%$ |
| test_split | 3.5038ms | 3.1830ms | 314.1713 Ops/s | 318.4357 Ops/s | $\color{#d91a1a}-1.34\%$ |
| test_permute | 0.2993ms | 0.2031ms | 4.9237 KOps/s | 4.8584 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_stack | 27.9183ms | 27.5362ms | 36.3158 Ops/s | 35.8208 Ops/s | $\color{#35bf28}+1.38\%$ |
| test_cat | 27.3764ms | 27.2497ms | 36.6977 Ops/s | 35.9471 Ops/s | $\color{#35bf28}+2.09\%$ |
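
The Change column is consistent with the relative throughput difference between this PR (Ops) and the repository HEAD (Ops on Repo HEAD), and the bot appears to bold any change of at least 5% in magnitude (for example +5.05% is bold while +4.99% and -4.98% are not). Both rules are inferred from the numbers in the table rather than documented by the bot, and the helper names below are hypothetical. Because the table prints rounded Ops values, a recomputed change can differ from the reported one in the last digit.

```python
from typing import List, Tuple


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


def is_highlighted(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Assumed bolding rule inferred from the tables: |change| >= 5%."""
    return abs(change_pct) >= threshold_pct


# (name, Ops, Ops on Repo HEAD) copied from the GPU table.
# Units cancel in the ratio, so KOps/s, MOps/s and Ops/s can be used as printed.
rows: List[Tuple[str, float, float]] = [
    ("test_plain_set_nested", 79.9064, 76.0669),  # KOps/s
    ("test_membership", 1.3526, 1.4235),          # MOps/s
    ("test_full_like", 73.5138, 105.2127),        # Ops/s
]

for name, ops, ops_head in rows:
    change = percent_change(ops, ops_head)
    flag = "bold" if is_highlighted(change) else "plain"
    print(f"{name}: {change:+.2f}% ({flag})")
```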


github-actions bot commented Jul 2, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}25$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 58.9000μs | 17.1057μs | 58.4602 KOps/s | 63.7008 KOps/s | $\textbf{\color{#d91a1a}-8.23\%}$ |
| test_plain_set_stack_nested | 42.3190μs | 17.2721μs | 57.8969 KOps/s | 63.3677 KOps/s | $\textbf{\color{#d91a1a}-8.63\%}$ |
| test_plain_set_nested_inplace | 65.8840μs | 19.6232μs | 50.9602 KOps/s | 55.7525 KOps/s | $\textbf{\color{#d91a1a}-8.60\%}$ |
| test_plain_set_stack_nested_inplace | 59.3010μs | 19.4781μs | 51.3396 KOps/s | 55.7285 KOps/s | $\textbf{\color{#d91a1a}-7.88\%}$ |
| test_items | 43.9020μs | 2.6470μs | 377.7803 KOps/s | 390.3512 KOps/s | $\color{#d91a1a}-3.22\%$ |
| test_items_nested | 0.5119ms | 0.2749ms | 3.6373 KOps/s | 3.6032 KOps/s | $\color{#35bf28}+0.95\%$ |
| test_items_nested_locked | 0.4711ms | 0.2754ms | 3.6307 KOps/s | 3.5192 KOps/s | $\color{#35bf28}+3.17\%$ |
| test_items_nested_leaf | 0.1624ms | 79.9692μs | 12.5048 KOps/s | 12.5520 KOps/s | $\color{#d91a1a}-0.38\%$ |
| test_items_stack_nested | 0.3789ms | 0.2756ms | 3.6290 KOps/s | 3.5227 KOps/s | $\color{#35bf28}+3.02\%$ |
| test_items_stack_nested_leaf | 0.1542ms | 78.2781μs | 12.7750 KOps/s | 12.3324 KOps/s | $\color{#35bf28}+3.59\%$ |
| test_items_stack_nested_locked | 0.5139ms | 0.2746ms | 3.6413 KOps/s | 3.5433 KOps/s | $\color{#35bf28}+2.77\%$ |
| test_keys | 25.3370μs | 3.8345μs | 260.7872 KOps/s | 248.4669 KOps/s | $\color{#35bf28}+4.96\%$ |
| test_keys_nested | 0.3004ms | 0.1407ms | 7.1057 KOps/s | 7.2455 KOps/s | $\color{#d91a1a}-1.93\%$ |
| test_keys_nested_locked | 0.7808ms | 0.1458ms | 6.8606 KOps/s | 6.9597 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_keys_nested_leaf | 0.3089ms | 0.1211ms | 8.2577 KOps/s | 8.5205 KOps/s | $\color{#d91a1a}-3.08\%$ |
| test_keys_stack_nested | 0.2735ms | 0.1370ms | 7.2975 KOps/s | 7.1705 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_keys_stack_nested_leaf | 0.2340ms | 0.1164ms | 8.5893 KOps/s | 8.6371 KOps/s | $\color{#d91a1a}-0.55\%$ |
| test_keys_stack_nested_locked | 0.2638ms | 0.1414ms | 7.0702 KOps/s | 6.9973 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_values | 12.5185μs | 1.1494μs | 870.0091 KOps/s | 855.3076 KOps/s | $\color{#35bf28}+1.72\%$ |
| test_values_nested | 0.1039ms | 50.9384μs | 19.6316 KOps/s | 19.4091 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_values_nested_locked | 0.1044ms | 51.2236μs | 19.5223 KOps/s | 19.3444 KOps/s | $\color{#35bf28}+0.92\%$ |
| test_values_nested_leaf | 93.3450μs | 46.2199μs | 21.6357 KOps/s | 21.5739 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_values_stack_nested | 0.1515ms | 51.6446μs | 19.3631 KOps/s | 19.3680 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_values_stack_nested_leaf | 93.9460μs | 45.4326μs | 22.0106 KOps/s | 21.6113 KOps/s | $\color{#35bf28}+1.85\%$ |
| test_values_stack_nested_locked | 0.1319ms | 51.9403μs | 19.2529 KOps/s | 19.4385 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_membership | 51.4570μs | 1.3857μs | 721.6734 KOps/s | 741.3625 KOps/s | $\color{#d91a1a}-2.66\%$ |
| test_membership_nested | 44.6330μs | 3.4380μs | 290.8688 KOps/s | 286.3512 KOps/s | $\color{#35bf28}+1.58\%$ |
| test_membership_nested_leaf | 46.1070μs | 3.4248μs | 291.9857 KOps/s | 286.7711 KOps/s | $\color{#35bf28}+1.82\%$ |
| test_membership_stacked_nested | 35.3670μs | 3.4536μs | 289.5520 KOps/s | 290.9552 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_membership_stacked_nested_leaf | 34.5550μs | 3.4367μs | 290.9773 KOps/s | 287.6806 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_membership_nested_last | 52.4280μs | 4.3133μs | 231.8413 KOps/s | 234.3451 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_membership_nested_leaf_last | 36.7800μs | 4.2433μs | 235.6636 KOps/s | 233.8542 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_membership_stacked_nested_last | 32.7010μs | 13.5675μs | 73.7057 KOps/s | 236.9840 KOps/s | $\textbf{\color{#d91a1a}-68.90\%}$ |
| test_membership_stacked_nested_leaf_last | 36.0580μs | 13.5298μs | 73.9109 KOps/s | 233.7861 KOps/s | $\textbf{\color{#d91a1a}-68.39\%}$ |
| test_nested_getleaf | 49.3730μs | 10.9402μs | 91.4059 KOps/s | 94.6463 KOps/s | $\color{#d91a1a}-3.42\%$ |
| test_nested_get | 44.1540μs | 10.3342μs | 96.7664 KOps/s | 99.6234 KOps/s | $\color{#d91a1a}-2.87\%$ |
| test_stacked_getleaf | 51.6370μs | 10.7039μs | 93.4237 KOps/s | 94.2719 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_stacked_get | 37.2700μs | 10.0764μs | 99.2416 KOps/s | 100.1364 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_nested_getitemleaf | 53.8010μs | 11.4173μs | 87.5862 KOps/s | 88.4572 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_nested_getitem | 49.8840μs | 10.5097μs | 95.1506 KOps/s | 95.8483 KOps/s | $\color{#d91a1a}-0.73\%$ |
| test_stacked_getitemleaf | 36.8090μs | 11.1374μs | 89.7879 KOps/s | 89.2341 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_stacked_getitem | 42.4590μs | 10.3654μs | 96.4746 KOps/s | 96.1323 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_lock_nested | 0.9839ms | 0.3430ms | 2.9153 KOps/s | 2.8147 KOps/s | $\color{#35bf28}+3.57\%$ |
| test_lock_stack_nested | 0.5683ms | 0.2968ms | 3.3697 KOps/s | 3.1561 KOps/s | $\textbf{\color{#35bf28}+6.77\%}$ |
| test_unlock_nested | 0.8595ms | 0.3542ms | 2.8229 KOps/s | 2.7743 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_unlock_stack_nested | 0.4546ms | 0.3064ms | 3.2642 KOps/s | 3.1444 KOps/s | $\color{#35bf28}+3.81\%$ |
| test_flatten_speed | 0.6043ms | 98.8052μs | 10.1209 KOps/s | 10.1087 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_unflatten_speed | 0.7208ms | 0.4144ms | 2.4132 KOps/s | 2.4077 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_common_ops | 3.7448ms | 0.7221ms | 1.3848 KOps/s | 1.4417 KOps/s | $\color{#d91a1a}-3.95\%$ |
| test_creation | 19.2570μs | 1.9473μs | 513.5198 KOps/s | 518.7527 KOps/s | $\color{#d91a1a}-1.01\%$ |
| test_creation_empty | 42.6400μs | 10.7213μs | 93.2725 KOps/s | 123.3467 KOps/s | $\textbf{\color{#d91a1a}-24.38\%}$ |
| test_creation_nested_1 | 61.6460μs | 13.2966μs | 75.2073 KOps/s | 93.8562 KOps/s | $\textbf{\color{#d91a1a}-19.87\%}$ |
| test_creation_nested_2 | 66.1550μs | 16.6806μs | 59.9499 KOps/s | 71.0710 KOps/s | $\textbf{\color{#d91a1a}-15.65\%}$ |
| test_clone | 1.2750ms | 13.1228μs | 76.2035 KOps/s | 74.9810 KOps/s | $\color{#35bf28}+1.63\%$ |
| test_getitem[int] | 65.3830μs | 11.5989μs | 86.2154 KOps/s | 86.4798 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_getitem[slice_int] | 70.9440μs | 22.2206μs | 45.0032 KOps/s | 43.1796 KOps/s | $\color{#35bf28}+4.22\%$ |
| test_getitem[range] | 83.8380μs | 60.6558μs | 16.4865 KOps/s | 16.8829 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_getitem[tuple] | 73.5080μs | 18.6124μs | 53.7275 KOps/s | 53.8367 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_getitem[list] | 0.1726ms | 41.6085μs | 24.0335 KOps/s | 24.3868 KOps/s | $\color{#d91a1a}-1.45\%$ |
| test_setitem_dim[int] | 72.4960μs | 34.0027μs | 29.4094 KOps/s | 32.8092 KOps/s | $\textbf{\color{#d91a1a}-10.36\%}$ |
| test_setitem_dim[slice_int] | 94.0070μs | 61.5389μs | 16.2499 KOps/s | 17.0330 KOps/s | $\color{#d91a1a}-4.60\%$ |
| test_setitem_dim[range] | 0.1498ms | 85.8370μs | 11.6500 KOps/s | 12.2866 KOps/s | $\textbf{\color{#d91a1a}-5.18\%}$ |
| test_setitem_dim[tuple] | 0.1148ms | 51.2995μs | 19.4934 KOps/s | 21.6858 KOps/s | $\textbf{\color{#d91a1a}-10.11\%}$ |
| test_setitem | 65.8740μs | 19.5713μs | 51.0952 KOps/s | 52.9776 KOps/s | $\color{#d91a1a}-3.55\%$ |
| test_set | 65.8530μs | 19.0443μs | 52.5091 KOps/s | 55.9312 KOps/s | $\textbf{\color{#d91a1a}-6.12\%}$ |
| test_set_shared | 76.8877ms | 0.1693ms | 5.9082 KOps/s | 6.7713 KOps/s | $\textbf{\color{#d91a1a}-12.75\%}$ |
| test_update | 0.1691ms | 22.4156μs | 44.6118 KOps/s | 52.1519 KOps/s | $\textbf{\color{#d91a1a}-14.46\%}$ |
| test_update_nested | 0.1161ms | 30.5285μs | 32.7562 KOps/s | 36.1973 KOps/s | $\textbf{\color{#d91a1a}-9.51\%}$ |
| test_update__nested | 58.4800μs | 24.6076μs | 40.6378 KOps/s | 39.6609 KOps/s | $\color{#35bf28}+2.46\%$ |
| test_set_nested | 88.5560μs | 21.0675μs | 47.4664 KOps/s | 50.6438 KOps/s | $\textbf{\color{#d91a1a}-6.27\%}$ |
| test_set_nested_new | 84.8590μs | 25.0078μs | 39.9876 KOps/s | 41.5468 KOps/s | $\color{#d91a1a}-3.75\%$ |
| test_select | 0.1180ms | 40.3079μs | 24.8090 KOps/s | 25.1883 KOps/s | $\color{#d91a1a}-1.51\%$ |
| test_select_nested | 0.1382ms | 58.6807μs | 17.0414 KOps/s | 17.4770 KOps/s | $\color{#d91a1a}-2.49\%$ |
| test_exclude_nested | 0.2280ms | 0.1211ms | 8.2573 KOps/s | 8.4930 KOps/s | $\color{#d91a1a}-2.77\%$ |
| test_empty[True] | 0.5151ms | 0.3992ms | 2.5051 KOps/s | 2.5413 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_empty[False] | 6.7226μs | 1.0631μs | 940.6130 KOps/s | 915.4568 KOps/s | $\color{#35bf28}+2.75\%$ |
| test_unbind_speed | 1.7892ms | 0.2558ms | 3.9091 KOps/s | 4.0649 KOps/s | $\color{#d91a1a}-3.83\%$ |
| test_unbind_speed_stack0 | 0.3262ms | 0.2426ms | 4.1223 KOps/s | 4.0950 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_unbind_speed_stack1 | 80.1716ms | 0.7213ms | 1.3864 KOps/s | 1.3522 KOps/s | $\color{#35bf28}+2.53\%$ |
| test_split | 78.6365ms | 1.6275ms | 614.4520 Ops/s | 611.8575 Ops/s | $\color{#35bf28}+0.42\%$ |
| test_chunk | 79.8825ms | 1.6682ms | 599.4636 Ops/s | 591.5005 Ops/s | $\color{#35bf28}+1.35\%$ |
| test_creation[device0] | 3.9867ms | 86.9954μs | 11.4949 KOps/s | 11.6291 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_creation_from_tensor | 0.4218ms | 86.5545μs | 11.5534 KOps/s | 11.4282 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_add_one[memmap_tensor0] | 0.1142ms | 5.2540μs | 190.3308 KOps/s | 186.8112 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_contiguous[memmap_tensor0] | 14.0960μs | 0.6360μs | 1.5724 MOps/s | 1.5830 MOps/s | $\color{#d91a1a}-0.67\%$ |
| test_stack[memmap_tensor0] | 29.3250μs | 3.5296μs | 283.3182 KOps/s | 280.4247 KOps/s | $\color{#35bf28}+1.03\%$ |
| test_memmaptd_index | 1.3858ms | 0.2600ms | 3.8468 KOps/s | 3.9232 KOps/s | $\color{#d91a1a}-1.95\%$ |
| test_memmaptd_index_astensor | 0.7701ms | 0.3310ms | 3.0211 KOps/s | 3.0065 KOps/s | $\color{#35bf28}+0.48\%$ |
| test_memmaptd_index_op | 1.2096ms | 0.6180ms | 1.6181 KOps/s | 1.7360 KOps/s | $\textbf{\color{#d91a1a}-6.79\%}$ |
| test_serialize_model | 0.1815s | 0.1086s | 9.2047 Ops/s | 8.6872 Ops/s | $\textbf{\color{#35bf28}+5.96\%}$ |
| test_serialize_model_pickle | 0.4510s | 0.3825s | 2.6143 Ops/s | 2.6671 Ops/s | $\color{#d91a1a}-1.98\%$ |
| test_serialize_weights | 0.1053s | 0.1007s | 9.9332 Ops/s | 10.0519 Ops/s | $\color{#d91a1a}-1.18\%$ |
| test_serialize_weights_returnearly | 0.1997s | 0.1304s | 7.6664 Ops/s | 8.0284 Ops/s | $\color{#d91a1a}-4.51\%$ |
| test_serialize_weights_pickle | 0.8967s | 0.6095s | 1.6407 Ops/s | 2.4708 Ops/s | $\textbf{\color{#d91a1a}-33.60\%}$ |
| test_serialize_weights_filesystem | 98.2693ms | 93.9948ms | 10.6389 Ops/s | 10.0972 Ops/s | $\textbf{\color{#35bf28}+5.37\%}$ |
| test_serialize_model_filesystem | 0.1033s | 95.2308ms | 10.5008 Ops/s | 9.1142 Ops/s | $\textbf{\color{#35bf28}+15.21\%}$ |
| test_reshape_pytree | 62.4770μs | 25.8629μs | 38.6654 KOps/s | 38.7562 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_reshape_td | 0.1022ms | 33.6953μs | 29.6777 KOps/s | 29.5964 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_view_pytree | 66.9160μs | 25.7784μs | 38.7921 KOps/s | 39.2944 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_view_td | 87.5840μs | 38.7494μs | 25.8069 KOps/s | 25.5258 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_unbind_pytree | 60.9850μs | 29.6387μs | 33.7397 KOps/s | 34.2231 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_unbind_td | 0.4460ms | 36.5309μs | 27.3741 KOps/s | 27.7361 KOps/s | $\color{#d91a1a}-1.31\%$ |
| test_split_pytree | 81.1020μs | 30.0356μs | 33.2939 KOps/s | 33.6495 KOps/s | $\color{#d91a1a}-1.06\%$ |
| test_split_td | 0.1259ms | 39.6551μs | 25.2174 KOps/s | 25.2330 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_add_pytree | 73.7380μs | 35.1559μs | 28.4447 KOps/s | 28.7098 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_add_td | 0.1254ms | 54.7245μs | 18.2734 KOps/s | 20.1343 KOps/s | $\textbf{\color{#d91a1a}-9.24\%}$ |
| test_distributed | 0.2143ms | 0.1033ms | 9.6764 KOps/s | 9.5254 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_tdmodule | 87.6640μs | 18.1105μs | 55.2166 KOps/s | 58.7234 KOps/s | $\textbf{\color{#d91a1a}-5.97\%}$ |
| test_tdmodule_dispatch | 64.6110μs | 35.8488μs | 27.8950 KOps/s | 30.1762 KOps/s | $\textbf{\color{#d91a1a}-7.56\%}$ |
| test_tdseq | 36.0470μs | 21.3289μs | 46.8847 KOps/s | 51.4446 KOps/s | $\textbf{\color{#d91a1a}-8.86\%}$ |
| test_tdseq_dispatch | 84.7890μs | 42.7518μs | 23.3908 KOps/s | 26.4961 KOps/s | $\textbf{\color{#d91a1a}-11.72\%}$ |
| test_instantiation_functorch | 1.8233ms | 1.3153ms | 760.2891 Ops/s | 757.9602 Ops/s | $\color{#35bf28}+0.31\%$ |
| test_instantiation_td | 2.0263ms | 1.0211ms | 979.3751 Ops/s | 985.2984 Ops/s | $\color{#d91a1a}-0.60\%$ |
| test_exec_functorch | 0.2928ms | 0.1628ms | 6.1412 KOps/s | 6.0225 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_exec_functional_call | 0.8753ms | 0.1582ms | 6.3230 KOps/s | 6.6014 KOps/s | $\color{#d91a1a}-4.22\%$ |
| test_exec_td | 0.2794ms | 0.1486ms | 6.7308 KOps/s | 6.6925 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_exec_td_decorator | 0.9285ms | 0.2263ms | 4.4191 KOps/s | 4.3949 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_vmap_mlp_speed[True-True] | 0.6446ms | 0.4839ms | 2.0664 KOps/s | 2.0336 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_vmap_mlp_speed[True-False] | 0.6228ms | 0.4824ms | 2.0728 KOps/s | 2.0686 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_vmap_mlp_speed[False-True] | 0.6401ms | 0.3936ms | 2.5405 KOps/s | 2.4980 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_vmap_mlp_speed[False-False] | 0.5367ms | 0.3926ms | 2.5474 KOps/s | 2.5189 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1908ms | 0.5644ms | 1.7719 KOps/s | 1.7933 KOps/s | $\color{#d91a1a}-1.19\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8452ms | 0.5569ms | 1.7956 KOps/s | 1.8118 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.8087ms | 0.4593ms | 2.1771 KOps/s | 2.1819 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.8107ms | 0.4602ms | 2.1730 KOps/s | 2.1446 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_to_module_speed[True] | 3.4995ms | 1.7264ms | 579.2266 Ops/s | 587.6549 Ops/s | $\color{#d91a1a}-1.43\%$ |
| test_to_module_speed[False] | 1.9225ms | 1.7021ms | 587.5222 Ops/s | 597.5226 Ops/s | $\color{#d91a1a}-1.67\%$ |
| test_tc_init | 0.1140ms | 54.5429μs | 18.3342 KOps/s | 19.2815 KOps/s | $\color{#d91a1a}-4.91\%$ |
| test_tc_init_nested | 0.2116ms | 0.1115ms | 8.9664 KOps/s | 10.0286 KOps/s | $\textbf{\color{#d91a1a}-10.59\%}$ |
| test_tc_first_layer_tensor | 34.5850μs | 8.5122μs | 117.4789 KOps/s | 120.2152 KOps/s | $\color{#d91a1a}-2.28\%$ |
| test_tc_first_layer_nontensor | 35.8280μs | 8.5780μs | 116.5771 KOps/s | 121.4015 KOps/s | $\color{#d91a1a}-3.97\%$ |
| test_tc_second_layer_tensor | 29.7050μs | 2.5887μs | 386.2896 KOps/s | 397.7623 KOps/s | $\color{#d91a1a}-2.88\%$ |
| test_tc_second_layer_nontensor | 41.6580μs | 9.5620μs | 104.5809 KOps/s | 107.1160 KOps/s | $\color{#d91a1a}-2.37\%$ |
| test_unbind | 89.3256ms | 14.9400ms | 66.9342 Ops/s | 61.0002 Ops/s | $\textbf{\color{#35bf28}+9.73\%}$ |
| test_full_like | 15.9554ms | 12.6801ms | 78.8640 Ops/s | 70.4421 Ops/s | $\textbf{\color{#35bf28}+11.96\%}$ |
| test_zeros_like | 13.0824ms | 6.6688ms | 149.9524 Ops/s | 149.5659 Ops/s | $\color{#35bf28}+0.26\%$ |
| test_ones_like | 13.0072ms | 7.1384ms | 140.0881 Ops/s | 137.2055 Ops/s | $\color{#35bf28}+2.10\%$ |
| test_clone | 12.6697ms | 9.3429ms | 107.0330 Ops/s | 102.9261 Ops/s | $\color{#35bf28}+3.99\%$ |
| test_squeeze | 76.5230μs | 12.8636μs | 77.7385 KOps/s | 79.4915 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_unsqueeze | 0.1934ms | 99.5863μs | 10.0415 KOps/s | 10.1390 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_split | 0.5608ms | 0.2771ms | 3.6083 KOps/s | 3.4945 KOps/s | $\color{#35bf28}+3.26\%$ |
| test_permute | 0.4537ms | 0.2236ms | 4.4719 KOps/s | 4.4105 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_stack | 35.5322ms | 24.5878ms | 40.6706 Ops/s | 37.7817 Ops/s | $\textbf{\color{#35bf28}+7.65\%}$ |
| test_cat | 26.4019ms | 23.9361ms | 41.7779 Ops/s | 39.4699 Ops/s | $\textbf{\color{#35bf28}+5.85\%}$ |
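
The Ops column also appears to be the reciprocal of the Mean time per call; this is inferred from the numbers rather than documented by the bot. As a worked check on the largest CPU regression, `test_membership_stacked_nested_last`:

$$
\text{Ops} = \frac{1}{\text{Mean}} = \frac{1}{13.5675\,\mu\text{s}} \approx 73.71\ \text{KOps/s},
\qquad
\text{Mean}_{\text{HEAD}} = \frac{1}{236.9840\ \text{KOps/s}} \approx 4.22\,\mu\text{s}
$$

So, on CPU, the mean time per membership check rose from roughly 4.22 μs on HEAD to 13.57 μs with this PR, consistent with the reported -68.90% change in throughput.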

@vmoens added the enhancement label Jul 3, 2024
@vmoens merged commit 6165e77 into main Jul 3, 2024
36 of 43 checks passed
@vmoens deleted the iterable-map branch Jul 3, 2024, 16:01
Labels
CLA Signed, enhancement

2 participants