[Performance] Faster lock_/unlock_ when sub-tds are already locked #816

Merged: 1 commit, Jun 14, 2024

@vmoens (Contributor) commented on Jun 14, 2024

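Benchmark: time `lock_()`, and `lock_().unlock_()`, on the outermost TensorDict of a chain of `n` nested TensorDicts whose sub-TensorDicts are already locked, for `n` from 2 to 199.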

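```python
from tensordict import TensorDict
import time
from matplotlib import pyplot as plt

def makelockunlock(n, lock_only=True):
    # Build a chain of n nested TensorDicts. Each inner TensorDict is locked
    # before it is nested, so only the outermost one is still unlocked.
    prev_td = None
    for _ in range(n):
        td = TensorDict()
        if prev_td is not None:
            td["nested"] = prev_td.lock_()
        prev_td = td
    # Time only the call(s) on the outermost TensorDict, not the construction.
    # perf_counter() is a monotonic, high-resolution clock suited to benchmarks.
    if lock_only:
        t0 = time.perf_counter()
        td.lock_()
        return time.perf_counter() - t0
    else:
        t0 = time.perf_counter()
        td.lock_().unlock_()
        return time.perf_counter() - t0

ns = range(2, 200)

# Keep the faster of two runs per n to reduce timing noise.
ts = [min(makelockunlock(i), makelockunlock(i)) for i in ns]
plt.figure()
plt.plot(ns, ts)
plt.xlabel("n (number of nested TensorDicts)")
plt.ylabel("lock_() time (s)")

ts = [min(makelockunlock(i, False), makelockunlock(i, False)) for i in ns]
plt.figure()
plt.plot(ns, ts)
plt.xlabel("n (number of nested TensorDicts)")
plt.ylabel("lock_().unlock_() time (s)")
plt.show()
```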
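The benchmark makes two measurement choices: building the nested structure is kept out of the timed region, and the fastest of several runs is reported. A small, stdlib-only helper capturing both could look like the sketch below. `min_time`, `make` and `op` are hypothetical names, not part of tensordict; with tensordict, `make` would build the chain from the benchmark and `op` would call `lock_()` (or `lock_().unlock_()`) on the outermost TensorDict.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def min_time(make: Callable[[], T], op: Callable[[T], object], repeats: int = 5) -> float:
    """Return the fastest of `repeats` timings of op(state), where state = make().

    make() runs outside the timed region, so only op() is measured.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        state = make()  # fresh, untimed setup for every run
        t0 = time.perf_counter()
        op(state)
        best = min(best, time.perf_counter() - t0)
    return best


# Example with a stand-in workload: time sorting a freshly built reversed list.
print(min_time(lambda: list(range(100_000, 0, -1)), sorted, repeats=3))
```

Keeping the minimum rather than the mean follows the advice in the Python `timeit` documentation: slower runs are usually caused by interference from other processes rather than by variability in the code being measured.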

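Before: two plots (not reproduced here) of `lock_()` and `lock_().unlock_()` time versus `n`.

After: the same two plots for this PR (not reproduced here).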
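The PR title states the idea: when a sub-TensorDict is already locked, `lock_()` does not need to walk into it again. The toy model below sketches that idea under an assumed invariant (a locked node's descendants are all locked). `Node`, `lock_naive`, `lock_skip_locked` and `make_chain` are hypothetical names; this is not tensordict's implementation, and it only models `lock_()`, not `unlock_()`.

```python
class Node:
    """Hypothetical stand-in for a nested container (not tensordict.TensorDict)."""

    def __init__(self, children=()):
        self.children = list(children)
        self.locked = False


def lock_naive(node):
    """Lock node and recurse into every child; return the number of nodes visited."""
    node.locked = True
    return 1 + sum(lock_naive(child) for child in node.children)


def lock_skip_locked(node):
    """Lock node but skip children that are already locked.

    Assumes the invariant that a locked node's descendants are all locked,
    so an already-locked child needs no further work.
    """
    node.locked = True
    return 1 + sum(
        lock_skip_locked(child) for child in node.children if not child.locked
    )


def make_chain(n):
    """Mirror the benchmark: n nested nodes, each inner node locked before nesting."""
    node = None
    for _ in range(n):
        if node is not None:
            lock_skip_locked(node)
        node = Node([node] if node is not None else [])
    return node


for n in (2, 10, 100):
    print(n, lock_naive(make_chain(n)), lock_skip_locked(make_chain(n)))
# 2 2 1
# 10 10 1
# 100 100 1
```

In this toy model, locking the outermost node of the benchmark's chain visits all `n` nodes with the naive walk but only one node when already-locked children are skipped.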

@facebook-github-bot added the CLA Signed label on Jun 14, 2024 (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed).

⚠ Warning: Result of CPU Benchmark Tests

Total benchmarks: 144. Improved: 9. Worsened: 24.

Detailed results. Max and Mean are times per call; Ops is the throughput on this PR (1/Mean) and Ops on Repo HEAD is the throughput on the repository HEAD; Change is the relative change in Ops. Bold marks the changes of roughly 5% or more, which are the ones counted as improved or worsened above.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 35.3160μs | 17.4330μs | 57.3623 KOps/s | 61.6198 KOps/s | **-6.91%** |
| test_plain_set_stack_nested | 37.2200μs | 17.5799μs | 56.8832 KOps/s | 60.8337 KOps/s | **-6.49%** |
| test_plain_set_nested_inplace | 54.6220μs | 19.8358μs | 50.4139 KOps/s | 54.4270 KOps/s | **-7.37%** |
| test_plain_set_stack_nested_inplace | 55.7140μs | 19.9636μs | 50.0912 KOps/s | 54.3579 KOps/s | **-7.85%** |
| test_items | 27.7520μs | 2.5137μs | 397.8251 KOps/s | 395.4885 KOps/s | +0.59% |
| test_items_nested | 0.3396ms | 0.2706ms | 3.6949 KOps/s | 3.7726 KOps/s | -2.06% |
| test_items_nested_locked | 0.4210ms | 0.2719ms | 3.6783 KOps/s | 3.7643 KOps/s | -2.28% |
| test_items_nested_leaf | 0.1503ms | 77.5085μs | 12.9018 KOps/s | 12.9891 KOps/s | -0.67% |
| test_items_stack_nested | 0.4685ms | 0.2738ms | 3.6518 KOps/s | 3.7836 KOps/s | -3.49% |
| test_items_stack_nested_leaf | 0.1286ms | 75.3538μs | 13.2707 KOps/s | 13.3503 KOps/s | -0.60% |
| test_items_stack_nested_locked | 0.4712ms | 0.2713ms | 3.6863 KOps/s | 3.7803 KOps/s | -2.49% |
| test_keys | 18.3650μs | 3.8874μs | 257.2419 KOps/s | 261.5125 KOps/s | -1.63% |
| test_keys_nested | 0.2251ms | 0.1379ms | 7.2493 KOps/s | 7.2499 KOps/s | -0.01% |
| test_keys_nested_locked | 0.7349ms | 0.1423ms | 7.0252 KOps/s | 7.0793 KOps/s | -0.76% |
| test_keys_nested_leaf | 0.2067ms | 0.1161ms | 8.6142 KOps/s | 8.5689 KOps/s | +0.53% |
| test_keys_stack_nested | 0.2194ms | 0.1343ms | 7.4468 KOps/s | 7.4386 KOps/s | +0.11% |
| test_keys_stack_nested_leaf | 0.1996ms | 0.1147ms | 8.7173 KOps/s | 8.7319 KOps/s | -0.17% |
| test_keys_stack_nested_locked | 0.2378ms | 0.1390ms | 7.1963 KOps/s | 7.1654 KOps/s | +0.43% |
| test_values | 5.3260μs | 1.2566μs | 795.8101 KOps/s | 850.5704 KOps/s | **-6.44%** |
| test_values_nested | 0.1006ms | 50.5153μs | 19.7960 KOps/s | 19.6023 KOps/s | +0.99% |
| test_values_nested_locked | 92.1120μs | 50.6023μs | 19.7619 KOps/s | 19.4821 KOps/s | +1.44% |
| test_values_nested_leaf | 98.4540μs | 46.0780μs | 21.7023 KOps/s | 21.6415 KOps/s | +0.28% |
| test_values_stack_nested | 93.0140μs | 51.7862μs | 19.3102 KOps/s | 19.1361 KOps/s | +0.91% |
| test_values_stack_nested_leaf | 86.4120μs | 45.2400μs | 22.1043 KOps/s | 22.1051 KOps/s | -0.00% |
| test_values_stack_nested_locked | 96.0400μs | 51.5428μs | 19.4014 KOps/s | 19.2844 KOps/s | +0.61% |
| test_membership | 23.2940μs | 1.3223μs | 756.2375 KOps/s | 738.4065 KOps/s | +2.41% |
| test_membership_nested | 19.6570μs | 3.3602μs | 297.5987 KOps/s | 293.0335 KOps/s | +1.56% |
| test_membership_nested_leaf | 21.0100μs | 3.3734μs | 296.4370 KOps/s | 292.2870 KOps/s | +1.42% |
| test_membership_stacked_nested | 60.7030μs | 3.3780μs | 296.0351 KOps/s | 277.0504 KOps/s | **+6.85%** |
| test_membership_stacked_nested_leaf | 57.6280μs | 3.3752μs | 296.2819 KOps/s | 293.2297 KOps/s | +1.04% |
| test_membership_nested_last | 0.1334ms | 4.2683μs | 234.2850 KOps/s | 238.1806 KOps/s | -1.64% |
| test_membership_nested_leaf_last | 27.7720μs | 4.1608μs | 240.3378 KOps/s | 237.9310 KOps/s | +1.01% |
| test_membership_stacked_nested_last | 52.8890μs | 13.3188μs | 75.0816 KOps/s | 75.4707 KOps/s | -0.52% |
| test_membership_stacked_nested_leaf_last | 34.9450μs | 13.3325μs | 75.0049 KOps/s | 74.9834 KOps/s | +0.03% |
| test_nested_getleaf | 33.6730μs | 10.6599μs | 93.8099 KOps/s | 93.2884 KOps/s | +0.56% |
| test_nested_get | 55.4640μs | 10.0283μs | 99.7176 KOps/s | 98.5474 KOps/s | +1.19% |
| test_stacked_getleaf | 28.7940μs | 10.6528μs | 93.8724 KOps/s | 94.1523 KOps/s | -0.30% |
| test_stacked_get | 63.0390μs | 9.9272μs | 100.7329 KOps/s | 99.4292 KOps/s | +1.31% |
| test_nested_getitemleaf | 58.8200μs | 11.2367μs | 88.9943 KOps/s | 86.4533 KOps/s | +2.94% |
| test_nested_getitem | 30.1370μs | 10.4422μs | 95.7653 KOps/s | 96.3435 KOps/s | -0.60% |
| test_stacked_getitemleaf | 58.9900μs | 11.3336μs | 88.2333 KOps/s | 89.6939 KOps/s | -1.63% |
| test_stacked_getitem | 33.4830μs | 10.3090μs | 97.0025 KOps/s | 97.0327 KOps/s | -0.03% |
| test_lock_nested | 52.2525ms | 0.3924ms | 2.5486 KOps/s | 2.9604 KOps/s | **-13.91%** |
| test_lock_stack_nested | 0.4424ms | 0.2966ms | 3.3715 KOps/s | 3.3495 KOps/s | +0.66% |
| test_unlock_nested | 0.7324ms | 0.3460ms | 2.8904 KOps/s | 2.8835 KOps/s | +0.24% |
| test_unlock_stack_nested | 0.5460ms | 0.3058ms | 3.2705 KOps/s | 3.2947 KOps/s | -0.74% |
| test_flatten_speed | 0.2028ms | 96.0730μs | 10.4088 KOps/s | 10.5659 KOps/s | -1.49% |
| test_unflatten_speed | 0.6260ms | 0.4102ms | 2.4380 KOps/s | 2.4509 KOps/s | -0.52% |
| test_common_ops | 3.0303ms | 0.7249ms | 1.3795 KOps/s | 1.4787 KOps/s | **-6.71%** |
| test_creation | 74.7600μs | 1.9589μs | 510.4936 KOps/s | 523.2389 KOps/s | -2.44% |
| test_creation_empty | 47.4890μs | 10.8284μs | 92.3497 KOps/s | 107.0183 KOps/s | **-13.71%** |
| test_creation_nested_1 | 45.5850μs | 13.6531μs | 73.2436 KOps/s | 82.2006 KOps/s | **-10.90%** |
| test_creation_nested_2 | 60.0520μs | 16.8020μs | 59.5168 KOps/s | 64.2481 KOps/s | **-7.36%** |
| test_clone | 75.1810μs | 13.5040μs | 74.0523 KOps/s | 74.7536 KOps/s | -0.94% |
| test_getitem[int] | 50.6460μs | 11.6746μs | 85.6562 KOps/s | 86.5093 KOps/s | -0.99% |
| test_getitem[slice_int] | 54.1410μs | 22.3201μs | 44.8027 KOps/s | 43.9481 KOps/s | +1.94% |
| test_getitem[range] | 96.1500μs | 60.7378μs | 16.4642 KOps/s | 17.4516 KOps/s | **-5.66%** |
| test_getitem[tuple] | 58.5200μs | 18.9484μs | 52.7749 KOps/s | 52.4264 KOps/s | +0.66% |
| test_getitem[list] | 92.9140μs | 40.4336μs | 24.7319 KOps/s | 24.7398 KOps/s | -0.03% |
| test_setitem_dim[int] | 56.1450μs | 33.5091μs | 29.8427 KOps/s | 30.7110 KOps/s | -2.83% |
| test_setitem_dim[slice_int] | 90.7000μs | 59.8631μs | 16.7048 KOps/s | 16.9218 KOps/s | -1.28% |
| test_setitem_dim[range] | 0.1621ms | 81.8928μs | 12.2111 KOps/s | 12.4585 KOps/s | -1.99% |
| test_setitem_dim[tuple] | 95.4390μs | 49.1396μs | 20.3502 KOps/s | 21.2055 KOps/s | -4.03% |
| test_setitem | 65.5220μs | 20.5454μs | 48.6727 KOps/s | 52.0733 KOps/s | **-6.53%** |
| test_set | 50.4740μs | 20.1996μs | 49.5060 KOps/s | 53.1369 KOps/s | **-6.83%** |
| test_set_shared | 1.6814ms | 0.1436ms | 6.9657 KOps/s | 7.1733 KOps/s | -2.89% |
| test_update | 0.1504ms | 22.4034μs | 44.6361 KOps/s | 48.7359 KOps/s | **-8.41%** |
| test_update_nested | 85.7380μs | 31.1997μs | 32.0516 KOps/s | 35.0010 KOps/s | **-8.43%** |
| test_update__nested | 88.2060μs | 25.1750μs | 39.7219 KOps/s | 39.6393 KOps/s | +0.21% |
| test_set_nested | 0.1058ms | 21.9124μs | 45.6363 KOps/s | 48.4150 KOps/s | **-5.74%** |
| test_set_nested_new | 72.0050μs | 26.2882μs | 38.0398 KOps/s | 40.5501 KOps/s | **-6.19%** |
| test_select | 86.4620μs | 42.0242μs | 23.7958 KOps/s | 25.3316 KOps/s | **-6.06%** |
| test_select_nested | 0.8926ms | 59.8100μs | 16.7196 KOps/s | 16.7139 KOps/s | +0.03% |
| test_exclude_nested | 0.2283ms | 0.1186ms | 8.4328 KOps/s | 8.4258 KOps/s | +0.08% |
| test_empty[True] | 0.6023ms | 0.3861ms | 2.5901 KOps/s | 2.5397 KOps/s | +1.98% |
| test_empty[False] | 9.7350μs | 1.1309μs | 884.2665 KOps/s | 871.4110 KOps/s | +1.48% |
| test_unbind_speed | 1.6078ms | 0.2555ms | 3.9145 KOps/s | 3.9334 KOps/s | -0.48% |
| test_unbind_speed_stack0 | 0.4358ms | 0.2441ms | 4.0962 KOps/s | 4.0782 KOps/s | +0.44% |
| test_unbind_speed_stack1 | 64.2182ms | 0.7050ms | 1.4184 KOps/s | 1.4020 KOps/s | +1.17% |
| test_split | 66.4568ms | 1.6222ms | 616.4485 Ops/s | 620.4103 Ops/s | -0.64% |
| test_chunk | 67.6416ms | 1.6131ms | 619.9430 Ops/s | 616.6956 Ops/s | +0.53% |
| test_creation[device0] | 4.2427ms | 85.5463μs | 11.6896 KOps/s | 11.9321 KOps/s | -2.03% |
| test_creation_from_tensor | 0.2108ms | 83.7878μs | 11.9349 KOps/s | 11.1795 KOps/s | **+6.76%** |
| test_add_one[memmap_tensor0] | 55.9850μs | 5.3146μs | 188.1617 KOps/s | 178.9318 KOps/s | **+5.16%** |
| test_contiguous[memmap_tensor0] | 17.9440μs | 0.6400μs | 1.5625 MOps/s | 1.5561 MOps/s | +0.41% |
| test_stack[memmap_tensor0] | 28.1130μs | 3.6374μs | 274.9238 KOps/s | 269.0608 KOps/s | +2.18% |
| test_memmaptd_index | 1.0878ms | 0.2548ms | 3.9243 KOps/s | 3.6397 KOps/s | **+7.82%** |
| test_memmaptd_index_astensor | 1.0891ms | 0.3403ms | 2.9382 KOps/s | 2.8662 KOps/s | +2.51% |
| test_memmaptd_index_op | 0.8573ms | 0.6033ms | 1.6576 KOps/s | 1.6171 KOps/s | +2.50% |
| test_serialize_model | 0.1739s | 0.1136s | 8.8022 Ops/s | 8.3622 Ops/s | **+5.26%** |
| test_serialize_model_pickle | 0.4504s | 0.3793s | 2.6364 Ops/s | 2.5959 Ops/s | +1.56% |
| test_serialize_weights | 0.1635s | 0.1108s | 9.0250 Ops/s | 9.4866 Ops/s | -4.87% |
| test_serialize_weights_returnearly | 0.1337s | 0.1263s | 7.9164 Ops/s | 7.8051 Ops/s | +1.43% |
| test_serialize_weights_pickle | 0.7864s | 0.4999s | 2.0003 Ops/s | 2.3435 Ops/s | **-14.65%** |
| test_serialize_weights_filesystem | 0.1042s | 94.5626ms | 10.5750 Ops/s | 9.6310 Ops/s | **+9.80%** |
| test_serialize_model_filesystem | 0.1057s | 95.0779ms | 10.5177 Ops/s | 9.8712 Ops/s | **+6.55%** |
| test_reshape_pytree | 64.1810μs | 25.6557μs | 38.9777 KOps/s | 39.7740 KOps/s | -2.00% |
| test_reshape_td | 72.9960μs | 34.5494μs | 28.9441 KOps/s | 29.3242 KOps/s | -1.30% |
| test_view_pytree | 73.5680μs | 25.7074μs | 38.8993 KOps/s | 38.9279 KOps/s | -0.07% |
| test_view_td | 84.3480μs | 39.0933μs | 25.5798 KOps/s | 26.1358 KOps/s | -2.13% |
| test_unbind_pytree | 74.7510μs | 29.2836μs | 34.1489 KOps/s | 34.0258 KOps/s | +0.36% |
| test_unbind_td | 0.4406ms | 38.0778μs | 26.2620 KOps/s | 26.2910 KOps/s | -0.11% |
| test_split_pytree | 74.3800μs | 29.2301μs | 34.2113 KOps/s | 34.1685 KOps/s | +0.13% |
| test_split_td | 0.5446ms | 41.6443μs | 24.0129 KOps/s | 24.3745 KOps/s | -1.48% |
| test_add_pytree | 74.1690μs | 34.1372μs | 29.2935 KOps/s | 28.7738 KOps/s | +1.81% |
| test_add_td | 0.1694ms | 54.3237μs | 18.4082 KOps/s | 19.2176 KOps/s | -4.21% |
| test_distributed | 0.2686ms | 0.1028ms | 9.7316 KOps/s | 9.5817 KOps/s | +1.56% |
| test_tdmodule | 89.1180μs | 18.0129μs | 55.5156 KOps/s | 58.2319 KOps/s | -4.66% |
| test_tdmodule_dispatch | 59.3210μs | 36.1304μs | 27.6775 KOps/s | 29.8448 KOps/s | **-7.26%** |
| test_tdseq | 35.7470μs | 20.9245μs | 47.7910 KOps/s | 49.4033 KOps/s | -3.26% |
| test_tdseq_dispatch | 67.3360μs | 40.7489μs | 24.5406 KOps/s | 25.5801 KOps/s | -4.06% |
| test_instantiation_functorch | 1.5976ms | 1.3037ms | 767.0422 Ops/s | 767.1832 Ops/s | -0.02% |
| test_instantiation_td | 2.3243ms | 1.0273ms | 973.4655 Ops/s | 1.0042 KOps/s | -3.06% |
| test_exec_functorch | 0.2779ms | 0.1622ms | 6.1645 KOps/s | 6.3162 KOps/s | -2.40% |
| test_exec_functional_call | 0.4981ms | 0.1554ms | 6.4338 KOps/s | 6.7745 KOps/s | **-5.03%** |
| test_exec_td | 0.2373ms | 0.1482ms | 6.7482 KOps/s | 6.9466 KOps/s | -2.86% |
| test_exec_td_decorator | 0.5520ms | 0.2221ms | 4.5015 KOps/s | 4.5861 KOps/s | -1.85% |
| test_vmap_mlp_speed[True-True] | 0.7659ms | 0.4850ms | 2.0620 KOps/s | 2.0964 KOps/s | -1.64% |
| test_vmap_mlp_speed[True-False] | 0.7231ms | 0.4807ms | 2.0802 KOps/s | 2.0796 KOps/s | +0.03% |
| test_vmap_mlp_speed[False-True] | 0.9332ms | 0.3980ms | 2.5127 KOps/s | 2.5547 KOps/s | -1.64% |
| test_vmap_mlp_speed[False-False] | 0.6311ms | 0.3932ms | 2.5433 KOps/s | 2.5647 KOps/s | -0.83% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1630ms | 0.5559ms | 1.7988 KOps/s | 1.8217 KOps/s | -1.26% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0417ms | 0.5561ms | 1.7983 KOps/s | 1.8131 KOps/s | -0.82% |
| test_vmap_mlp_speed_decorator[False-True] | 73.6829ms | 0.4917ms | 2.0336 KOps/s | 2.2181 KOps/s | **-8.32%** |
| test_vmap_mlp_speed_decorator[False-False] | 0.8196ms | 0.4584ms | 2.1816 KOps/s | 2.2163 KOps/s | -1.57% |
| test_to_module_speed[True] | 2.3247ms | 1.6853ms | 593.3570 Ops/s | 596.8026 Ops/s | -0.58% |
| test_to_module_speed[False] | 2.5466ms | 1.6524ms | 605.1683 Ops/s | 615.9915 Ops/s | -1.76% |
| test_tc_init | 60.7440μs | 29.0693μs | 34.4006 KOps/s | 38.5730 KOps/s | **-10.82%** |
| test_tc_init_nested | 0.1081ms | 60.4431μs | 16.5445 KOps/s | 18.0232 KOps/s | **-8.20%** |
| test_tc_first_layer_tensor | 5.7980μs | 0.6875μs | 1.4546 MOps/s | 1.4915 MOps/s | -2.48% |
| test_tc_first_layer_nontensor | 3.7500μs | 0.6625μs | 1.5095 MOps/s | 1.5271 MOps/s | -1.16% |
| test_tc_second_layer_tensor | 27.0210μs | 1.8430μs | 542.6058 KOps/s | 531.4231 KOps/s | +2.10% |
| test_tc_second_layer_nontensor | 11.5147μs | 1.4926μs | 669.9748 KOps/s | 655.0214 KOps/s | +2.28% |
| test_unbind | 84.0940ms | 7.1903ms | 139.0756 Ops/s | 138.9569 Ops/s | +0.09% |
| test_full_like | 15.0641ms | 11.0905ms | 90.1669 Ops/s | 82.4350 Ops/s | **+9.38%** |
| test_zeros_like | 13.6548ms | 6.2271ms | 160.5880 Ops/s | 158.0792 Ops/s | +1.59% |
| test_ones_like | 12.5541ms | 6.4154ms | 155.8758 Ops/s | 156.6897 Ops/s | -0.52% |
| test_clone | 12.9213ms | 7.8849ms | 126.8255 Ops/s | 122.1047 Ops/s | +3.87% |
| test_squeeze | 59.2410μs | 14.1720μs | 70.5616 KOps/s | 69.4176 KOps/s | +1.65% |
| test_unsqueeze | 0.1111ms | 59.2101μs | 16.8890 KOps/s | 16.3427 KOps/s | +3.34% |
| test_split | 0.1895ms | 0.1120ms | 8.9266 KOps/s | 8.8106 KOps/s | +1.32% |
| test_permute | 0.2098ms | 0.1265ms | 7.9081 KOps/s | 7.9199 KOps/s | -0.15% |
| test_stack | 30.7092ms | 22.3760ms | 44.6908 Ops/s | 43.2088 Ops/s | +3.43% |
| test_cat | 27.0150ms | 21.8183ms | 45.8331 Ops/s | 43.2972 Ops/s | **+5.86%** |
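The columns of these tables are related by simple arithmetic, which the rows bear out: Ops is the reciprocal of Mean, and Change is Ops relative to Ops on Repo HEAD. The sketch below reproduces the first CPU row (`test_plain_set_nested`). The function names are hypothetical, and the 5% cut-off in `classify` is an assumption inferred from which entries are bold and counted in the totals, not a documented setting of the benchmark bot.

```python
def ops_per_second(mean_seconds):
    """Throughput implied by the mean time per call."""
    return 1.0 / mean_seconds


def change_percent(ops, ops_head):
    """Relative change in throughput versus repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


def classify(change, threshold=5.0):
    """Assumed rule: only changes of at least `threshold` percent are counted."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested (CPU): Mean 17.4330μs, Ops 57.3623 KOps/s, HEAD 61.6198 KOps/s
ops = ops_per_second(17.4330e-6)
change = change_percent(57.3623e3, 61.6198e3)
print(f"{ops / 1e3:.2f} KOps/s, {change:+.2f}%, {classify(change)}")
# 57.36 KOps/s, -6.91%, worsened
```

Because Mean is itself rounded in the table, recomputing Ops from it matches the reported Ops only to about three decimal places.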


⚠ Warning: Result of GPU Benchmark Tests

Total benchmarks: 152. Improved: 9. Worsened: 20.

Detailed results (same columns as the CPU table; bold marks the changes of roughly 5% or more, which are the ones counted as improved or worsened above):

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 36.4810μs | 13.9177μs | 71.8511 KOps/s | 78.4397 KOps/s | **-8.40%** |
| test_plain_set_stack_nested | 27.0200μs | 14.0896μs | 70.9741 KOps/s | 77.4145 KOps/s | **-8.32%** |
| test_plain_set_nested_inplace | 41.5510μs | 15.2855μs | 65.4214 KOps/s | 71.1812 KOps/s | **-8.09%** |
| test_plain_set_stack_nested_inplace | 45.0310μs | 15.3874μs | 64.9884 KOps/s | 70.6457 KOps/s | **-8.01%** |
| test_items | 20.2800μs | 4.6680μs | 214.2257 KOps/s | 213.3319 KOps/s | +0.42% |
| test_items_nested | 0.3849ms | 0.3386ms | 2.9533 KOps/s | 2.9640 KOps/s | -0.36% |
| test_items_nested_locked | 0.4134ms | 0.3580ms | 2.7934 KOps/s | 2.9068 KOps/s | -3.90% |
| test_items_nested_leaf | 0.1029ms | 83.5317μs | 11.9715 KOps/s | 12.1701 KOps/s | -1.63% |
| test_items_stack_nested | 0.4015ms | 0.3444ms | 2.9036 KOps/s | 2.9231 KOps/s | -0.67% |
| test_items_stack_nested_leaf | 0.1046ms | 83.9269μs | 11.9151 KOps/s | 12.0454 KOps/s | -1.08% |
| test_items_stack_nested_locked | 0.3985ms | 0.3479ms | 2.8745 KOps/s | 2.9457 KOps/s | -2.42% |
| test_keys | 27.1110μs | 4.3281μs | 231.0463 KOps/s | 231.6679 KOps/s | -0.27% |
| test_keys_nested | 97.5120μs | 67.5173μs | 14.8110 KOps/s | 15.0321 KOps/s | -1.47% |
| test_keys_nested_locked | 2.1444ms | 72.8003μs | 13.7362 KOps/s | 13.9657 KOps/s | -1.64% |
| test_keys_nested_leaf | 88.3020μs | 57.9070μs | 17.2691 KOps/s | 17.5004 KOps/s | -1.32% |
| test_keys_stack_nested | 98.3230μs | 66.9849μs | 14.9287 KOps/s | 15.1000 KOps/s | -1.13% |
| test_keys_stack_nested_leaf | 80.9520μs | 57.7424μs | 17.3183 KOps/s | 17.4902 KOps/s | -0.98% |
| test_keys_stack_nested_locked | 96.6020μs | 71.2927μs | 14.0267 KOps/s | 14.0269 KOps/s | -0.00% |
| test_values | 8.4537μs | 1.8081μs | 553.0555 KOps/s | 551.6554 KOps/s | +0.25% |
| test_values_nested | 76.6520μs | 35.2021μs | 28.4074 KOps/s | 28.8201 KOps/s | -1.43% |
| test_values_nested_locked | 60.0520μs | 37.1427μs | 26.9232 KOps/s | 27.2735 KOps/s | -1.28% |
| test_values_nested_leaf | 52.7510μs | 31.2251μs | 32.0256 KOps/s | 32.4609 KOps/s | -1.34% |
| test_values_stack_nested | 65.2420μs | 35.3294μs | 28.3051 KOps/s | 28.1513 KOps/s | +0.55% |
| test_values_stack_nested_leaf | 51.0810μs | 31.3850μs | 31.8624 KOps/s | 31.7536 KOps/s | +0.34% |
| test_values_stack_nested_locked | 62.8410μs | 37.1482μs | 26.9192 KOps/s | 26.6749 KOps/s | +0.92% |
| test_membership | 35.6500μs | 0.8329μs | 1.2007 MOps/s | 1.1891 MOps/s | +0.98% |
| test_membership_nested | 23.4800μs | 2.5616μs | 390.3767 KOps/s | 392.1265 KOps/s | -0.45% |
| test_membership_nested_leaf | 35.0910μs | 2.5352μs | 394.4439 KOps/s | 387.8286 KOps/s | +1.71% |
| test_membership_stacked_nested | 21.4600μs | 2.5638μs | 390.0434 KOps/s | 389.0416 KOps/s | +0.26% |
| test_membership_stacked_nested_leaf | 15.1510μs | 2.5608μs | 390.5057 KOps/s | 391.0215 KOps/s | -0.13% |
| test_membership_nested_last | 36.6910μs | 3.0622μs | 326.5669 KOps/s | 325.8502 KOps/s | +0.22% |
| test_membership_nested_leaf_last | 20.6900μs | 3.0836μs | 324.2915 KOps/s | 325.7912 KOps/s | -0.46% |
| test_membership_stacked_nested_last | 44.8910μs | 3.1276μs | 319.7312 KOps/s | 256.2564 KOps/s | **+24.77%** |
| test_membership_stacked_nested_leaf_last | 21.4100μs | 3.1125μs | 321.2847 KOps/s | 258.1818 KOps/s | **+24.44%** |
| test_nested_getleaf | 40.8510μs | 8.4285μs | 118.6453 KOps/s | 119.7160 KOps/s | -0.89% |
| test_nested_get | 23.1610μs | 7.9351μs | 126.0226 KOps/s | 127.5925 KOps/s | -1.23% |
| test_stacked_getleaf | 26.9010μs | 8.4274μs | 118.6603 KOps/s | 118.9262 KOps/s | -0.22% |
| test_stacked_get | 38.8610μs | 7.9332μs | 126.0520 KOps/s | 126.6363 KOps/s | -0.46% |
| test_nested_getitemleaf | 24.9310μs | 8.5799μs | 116.5510 KOps/s | 117.3601 KOps/s | -0.69% |
| test_nested_getitem | 30.3310μs | 8.1041μs | 123.3950 KOps/s | 124.4060 KOps/s | -0.81% |
| test_stacked_getitemleaf | 35.2920μs | 8.5714μs | 116.6673 KOps/s | 116.5532 KOps/s | +0.10% |
| test_stacked_getitem | 25.7700μs | 8.0702μs | 123.9121 KOps/s | 123.6761 KOps/s | +0.19% |
| test_lock_nested | 58.9086ms | 0.4060ms | 2.4630 KOps/s | 2.4927 KOps/s | -1.19% |
| test_lock_stack_nested | 0.3550ms | 0.3034ms | 3.2963 KOps/s | 3.3432 KOps/s | -1.40% |
| test_unlock_nested | 60.9140ms | 0.4109ms | 2.4338 KOps/s | 2.4542 KOps/s | -0.83% |
| test_unlock_stack_nested | 0.3505ms | 0.3114ms | 3.2110 KOps/s | 3.2322 KOps/s | -0.65% |
| test_flatten_speed | 0.2958ms | 0.1007ms | 9.9261 KOps/s | 9.9233 KOps/s | +0.03% |
| test_unflatten_speed | 0.3222ms | 0.2917ms | 3.4283 KOps/s | 3.4224 KOps/s | +0.17% |
| test_common_ops | 1.0883ms | 0.6218ms | 1.6083 KOps/s | 1.6906 KOps/s | -4.87% |
| test_creation | 27.0310μs | 1.6426μs | 608.7851 KOps/s | 613.3372 KOps/s | -0.74% |
| test_creation_empty | 24.3210μs | 10.7467μs | 93.0516 KOps/s | 118.1474 KOps/s | **-21.24%** |
| test_creation_nested_1 | 40.4210μs | 12.4170μs | 80.5349 KOps/s | 98.0529 KOps/s | **-17.87%** |
| test_creation_nested_2 | 29.8110μs | 14.5464μs | 68.7456 KOps/s | 79.3599 KOps/s | **-13.37%** |
| test_clone | 69.5910μs | 11.8425μs | 84.4414 KOps/s | 84.1937 KOps/s | +0.29% |
| test_getitem[int] | 30.5110μs | 11.3660μs | 87.9813 KOps/s | 87.8079 KOps/s | +0.20% |
| test_getitem[slice_int] | 55.5320μs | 21.4819μs | 46.5508 KOps/s | 42.4434 KOps/s | **+9.68%** |
| test_getitem[range] | 69.2020μs | 49.9852μs | 20.0059 KOps/s | 20.1339 KOps/s | -0.64% |
| test_getitem[tuple] | 43.1510μs | 19.4222μs | 51.4875 KOps/s | 51.4988 KOps/s | -0.02% |
| test_getitem[list] | 0.1272ms | 34.8218μs | 28.7176 KOps/s | 28.0672 KOps/s | +2.32% |
| test_setitem_dim[int] | 49.1510μs | 32.5975μs | 30.6772 KOps/s | 31.6787 KOps/s | -3.16% |
| test_setitem_dim[slice_int] | 77.4010μs | 52.6640μs | 18.9883 KOps/s | 18.4840 KOps/s | +2.73% |
| test_setitem_dim[range] | 94.5820μs | 72.5850μs | 13.7770 KOps/s | 13.5893 KOps/s | +1.38% |
| test_setitem_dim[tuple] | 63.7110μs | 46.0098μs | 21.7345 KOps/s | 20.9183 KOps/s | +3.90% |
| test_setitem | 42.9010μs | 18.2973μs | 54.6529 KOps/s | 56.4646 KOps/s | -3.21% |
| test_set | 59.9520μs | 17.5518μs | 56.9742 KOps/s | 58.8458 KOps/s | -3.18% |
| test_set_shared | 1.3331ms | 99.2628μs | 10.0743 KOps/s | 9.9754 KOps/s | +0.99% |
| test_update | 86.0220μs | 21.1315μs | 47.3228 KOps/s | 52.9944 KOps/s | **-10.70%** |
| test_update_nested | 74.3520μs | 25.8479μs | 38.6879 KOps/s | 39.9466 KOps/s | -3.15% |
| test_update__nested | 60.7120μs | 22.7838μs | 43.8909 KOps/s | 44.1024 KOps/s | -0.48% |
| test_set_nested | 68.3210μs | 18.3292μs | 54.5577 KOps/s | 54.5015 KOps/s | +0.10% |
| test_set_nested_new | 61.7810μs | 21.8255μs | 45.8180 KOps/s | 47.1293 KOps/s | -2.78% |
| test_select | 68.2720μs | 35.2609μs | 28.3600 KOps/s | 27.9896 KOps/s | +1.32% |
| test_select_nested | 0.4820ms | 53.7361μs | 18.6094 KOps/s | 18.2997 KOps/s | +1.69% |
| test_exclude_nested | 0.1352ms | 0.1083ms | 9.2298 KOps/s | 8.9532 KOps/s | +3.09% |
| test_empty[True] | 0.3751ms | 0.3480ms | 2.8733 KOps/s | 2.8761 KOps/s | -0.10% |
| test_empty[False] | 2.8740μs | 0.9148μs | 1.0932 MOps/s | 1.0714 MOps/s | +2.03% |
| test_to | 0.1048ms | 77.0819μs | 12.9732 KOps/s | 12.4816 KOps/s | +3.94% |
| test_to_nonblocking | 98.1120μs | 61.7724μs | 16.1885 KOps/s | 15.1592 KOps/s | **+6.79%** |
| test_unbind_speed | 1.3929ms | 0.2644ms | 3.7821 KOps/s | 3.7912 KOps/s | -0.24% |
| test_unbind_speed_stack0 | 0.3475ms | 0.2632ms | 3.7999 KOps/s | 3.8089 KOps/s | -0.24% |
| test_unbind_speed_stack1 | 76.4737ms | 0.8030ms | 1.2454 KOps/s | 1.2364 KOps/s | +0.73% |
| test_split | 76.7957ms | 1.7247ms | 579.8050 Ops/s | 560.2051 Ops/s | +3.50% |
| test_chunk | 76.7287ms | 1.7192ms | 581.6600 Ops/s | 601.8766 Ops/s | -3.36% |
| test_creation[device0] | 0.1188ms | 59.8376μs | 16.7119 KOps/s | 16.1544 KOps/s | +3.45% |
| test_creation_from_tensor | 0.1292ms | 53.9506μs | 18.5355 KOps/s | 17.2421 KOps/s | **+7.50%** |
| test_add_one[memmap_tensor0] | 83.9920μs | 7.0275μs | 142.2991 KOps/s | 143.6064 KOps/s | -0.91% |
| test_contiguous[memmap_tensor0] | 24.5300μs | 0.7306μs | 1.3687 MOps/s | 1.4819 MOps/s | **-7.64%** |
| test_stack[memmap_tensor0] | 25.7110μs | 5.2897μs | 189.0464 KOps/s | 196.2193 KOps/s | -3.66% |
| test_memmaptd_index | 1.1726ms | 0.3081ms | 3.2459 KOps/s | 3.3410 KOps/s | -2.85% |
| test_memmaptd_index_astensor | 0.6411ms | 0.3783ms | 2.6432 KOps/s | 2.6938 KOps/s | -1.88% |
| test_memmaptd_index_op | 1.2339ms | 0.7168ms | 1.3951 KOps/s | 1.4882 KOps/s | **-6.26%** |
| test_serialize_model | 0.1829s | 0.1111s | 8.9977 Ops/s | 9.4719 Ops/s | **-5.01%** |
| test_serialize_model_pickle | 1.3492s | 1.2352s | 0.8096 Ops/s | 0.8063 Ops/s | +0.41% |
| test_serialize_weights | 0.1805s | 0.1087s | 9.2024 Ops/s | 8.7308 Ops/s | **+5.40%** |
| test_serialize_weights_returnearly | 0.2893s | 0.1054s | 9.4904 Ops/s | 10.3213 Ops/s | **-8.05%** |
| test_serialize_weights_pickle | 1.3540s | 1.2481s | 0.8012 Ops/s | 0.8090 Ops/s | -0.95% |
| test_reshape_pytree | 62.4510μs | 26.3339μs | 37.9738 KOps/s | 37.8911 KOps/s | +0.22% |
| test_reshape_td | 55.7210μs | 31.4901μs | 31.7560 KOps/s | 30.8431 KOps/s | +2.96% |
| test_view_pytree | 0.2226ms | 26.0499μs | 38.3879 KOps/s | 38.3877 KOps/s | +0.00% |
| test_view_td | 62.9510μs | 35.6965μs | 28.0139 KOps/s | 27.1950 KOps/s | +3.01% |
| test_unbind_pytree | 0.2277ms | 32.1874μs | 31.0681 KOps/s | 31.5692 KOps/s | -1.59% |
| test_unbind_td | 0.4042ms | 40.2922μs | 24.8187 KOps/s | 24.1202 KOps/s | +2.90% |
| test_split_pytree | 0.2568ms | 35.5976μs | 28.0918 KOps/s | 28.0195 KOps/s | +0.26% |
| test_split_td | 0.5216ms | 42.4498μs | 23.5572 KOps/s | 24.8017 KOps/s | **-5.02%** |
| test_add_pytree | 0.2639ms | 38.5482μs | 25.9416 KOps/s | 26.0794 KOps/s | -0.53% |
| test_add_td | 86.3220μs | 54.1325μs | 18.4732 KOps/s | 19.2682 KOps/s | -4.13% |
| test_distributed | 0.2494ms | 66.9170μs | 14.9439 KOps/s | 15.2714 KOps/s | -2.14% |
| test_tdmodule | 0.1267ms | 16.5538μs | 60.4091 KOps/s | 67.8243 KOps/s | **-10.93%** |
| test_tdmodule_dispatch | 46.8910μs | 32.0426μs | 31.2084 KOps/s | 34.4132 KOps/s | **-9.31%** |
| test_tdseq | 44.7210μs | 17.9228μs | 55.7950 KOps/s | 59.8802 KOps/s | **-6.82%** |
| test_tdseq_dispatch | 49.9510μs | 34.4276μs | 29.0464 KOps/s | 30.8078 KOps/s | **-5.72%** |
| test_instantiation_functorch | 1.8306ms | 1.5600ms | 641.0135 Ops/s | 654.4163 Ops/s | -2.05% |
| test_instantiation_td | 1.5695ms | 1.0448ms | 957.1328 Ops/s | 871.5047 Ops/s | **+9.83%** |
| test_exec_functorch | 0.2040ms | 0.1501ms | 6.6633 KOps/s | 6.6879 KOps/s | -0.37% |
| test_exec_functional_call | 0.3543ms | 0.1368ms | 7.3083 KOps/s | 7.3586 KOps/s | -0.68% |
| test_exec_td | 0.1829ms | 0.1352ms | 7.3959 KOps/s | 7.4248 KOps/s | -0.39% |
| test_exec_td_decorator | 0.7681ms | 0.2083ms | 4.7998 KOps/s | 4.7627 KOps/s | +0.78% |
| test_vmap_mlp_speed[True-True] | 1.2742ms | 0.5850ms | 1.7093 KOps/s | 1.7369 KOps/s | -1.59% |
| test_vmap_mlp_speed[True-False] | 0.8287ms | 0.5903ms | 1.6941 KOps/s | 1.7436 KOps/s | -2.84% |
| test_vmap_mlp_speed[False-True] | 1.0304ms | 0.5239ms | 1.9087 KOps/s | 1.9680 KOps/s | -3.01% |
| test_vmap_mlp_speed[False-False] | 0.9874ms | 0.5252ms | 1.9039 KOps/s | 1.9795 KOps/s | -3.82% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0547ms | 0.6602ms | 1.5147 KOps/s | 1.5697 KOps/s | -3.51% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7272ms | 0.6415ms | 1.5588 KOps/s | 1.5768 KOps/s | -1.14% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7458ms | 0.5643ms | 1.7722 KOps/s | 1.7742 KOps/s | -0.11% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6746ms | 0.5634ms | 1.7750 KOps/s | 1.7690 KOps/s | +0.34% |
| test_vmap_transformer_speed[True-True] | 7.5780ms | 7.4922ms | 133.4719 Ops/s | 133.6101 Ops/s | -0.10% |
| test_vmap_transformer_speed[True-False] | 7.5148ms | 7.4610ms | 134.0300 Ops/s | 133.9863 Ops/s | +0.03% |
| test_vmap_transformer_speed[False-True] | 8.3322ms | 7.7318ms | 129.3363 Ops/s | 135.1195 Ops/s | -4.28% |
| test_vmap_transformer_speed[False-False] | 8.2238ms | 7.5665ms | 132.1613 Ops/s | 135.0375 Ops/s | -2.13% |
| test_vmap_transformer_speed_decorator[True-True] | 19.2067ms | 18.5875ms | 53.7997 Ops/s | 55.0891 Ops/s | -2.34% |
| test_vmap_transformer_speed_decorator[True-False] | 19.1910ms | 18.5641ms | 53.8674 Ops/s | 55.2064 Ops/s | -2.43% |
| test_vmap_transformer_speed_decorator[False-True] | 19.0537ms | 18.5361ms | 53.9487 Ops/s | 55.5535 Ops/s | -2.89% |
| test_vmap_transformer_speed_decorator[False-False] | 19.3136ms | 18.5350ms | 53.9520 Ops/s | 55.5200 Ops/s | -2.82% |
| test_to_module_speed[True] | 3.0124ms | 1.5801ms | 632.8640 Ops/s | 649.9024 Ops/s | -2.62% |
| test_to_module_speed[False] | 2.0327ms | 1.5369ms | 650.6762 Ops/s | 650.4422 Ops/s | +0.04% |
| test_tc_init | 0.1712ms | 29.6085μs | 33.7741 KOps/s | 39.9654 KOps/s | **-15.49%** |
| test_tc_init_nested | 0.1947ms | 64.5511μs | 15.4916 KOps/s | 18.0664 KOps/s | **-14.25%** |
| test_tc_first_layer_tensor | 3.2818μs | 0.3619μs | 2.7632 MOps/s | 2.7878 MOps/s | -0.88% |
| test_tc_first_layer_nontensor | 10.6518μs | 0.3921μs | 2.5506 MOps/s | 2.5242 MOps/s | +1.05% |
| test_tc_second_layer_tensor | 25.5326μs | 0.9722μs | 1.0286 MOps/s | 938.7142 KOps/s | **+9.57%** |
| test_tc_second_layer_nontensor | 21.7538μs | 0.8411μs | 1.1890 MOps/s | 1.2252 MOps/s | -2.96% |
| test_unbind | 0.1126s | 6.9687ms | 143.4989 Ops/s | 157.2436 Ops/s | **-8.74%** |
| test_full_like | 11.7878ms | 11.1462ms | 89.7165 Ops/s | 76.4800 Ops/s | **+17.31%** |
| test_zeros_like | 8.1528ms | 7.8247ms | 127.8011 Ops/s | 127.3165 Ops/s | +0.38% |
| test_ones_like | 8.4440ms | 7.8740ms | 127.0007 Ops/s | 126.3744 Ops/s | +0.50% |
| test_clone | 9.3933ms | 9.2096ms | 108.5826 Ops/s | 108.2504 Ops/s | +0.31% |
| test_squeeze | 69.0610μs | 10.9372μs | 91.4310 KOps/s | 91.2922 KOps/s | +0.15% |
| test_unsqueeze | 97.9430μs | 52.8971μs | 18.9046 KOps/s | 18.8525 KOps/s | +0.28% |
| test_split | 0.1563ms | 99.2463μs | 10.0759 KOps/s | 10.1357 KOps/s | -0.59% |
| test_permute | 0.1737ms | 0.1099ms | 9.0990 KOps/s | 8.9970 KOps/s | +1.13% |
| test_stack | 27.2443ms | 26.6603ms | 37.5089 Ops/s | 37.3433 Ops/s | +0.44% |
| test_cat | 26.8743ms | 26.5956ms | 37.6002 Ops/s | 37.5476 Ops/s | +0.14% |

@vmoens merged commit 1de6fb6 into main on Jun 14, 2024 (36 of 38 checks passed).
@vmoens deleted the lock-only-once branch on October 21, 2024, 14:02.
Labels: CLA Signed, Performance