
[BugFix] Loading phantom state-dicts #650

Merged: 7 commits, Feb 2, 2024



vmoens commented Feb 1, 2024

facebook-github-bot added the CLA Signed label Feb 1, 2024

github-actions bot commented Feb 1, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: 26. Worsened: 6.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.1070μs 15.3997μs 64.9364 KOps/s 57.6540 KOps/s $\textbf{\color{#35bf28}+12.63\%}$
test_plain_set_stack_nested 0.2083ms 0.1422ms 7.0327 KOps/s 6.7963 KOps/s $\color{#35bf28}+3.48\%$
test_plain_set_nested_inplace 66.2530μs 17.7386μs 56.3741 KOps/s 50.7977 KOps/s $\textbf{\color{#35bf28}+10.98\%}$
test_plain_set_stack_nested_inplace 0.3381ms 0.1733ms 5.7714 KOps/s 5.5452 KOps/s $\color{#35bf28}+4.08\%$
test_items 17.8240μs 2.3730μs 421.4021 KOps/s 406.2989 KOps/s $\color{#35bf28}+3.72\%$
test_items_nested 0.4529ms 0.2712ms 3.6874 KOps/s 3.6569 KOps/s $\color{#35bf28}+0.83\%$
test_items_nested_locked 0.7273ms 0.2723ms 3.6728 KOps/s 3.6363 KOps/s $\color{#35bf28}+1.00\%$
test_items_nested_leaf 0.5446ms 0.1713ms 5.8388 KOps/s 5.9317 KOps/s $\color{#d91a1a}-1.57\%$
test_items_stack_nested 1.7556ms 1.3464ms 742.7070 Ops/s 745.0091 Ops/s $\color{#d91a1a}-0.31\%$
test_items_stack_nested_leaf 4.8543ms 1.4589ms 685.4553 Ops/s 834.2038 Ops/s $\textbf{\color{#d91a1a}-17.83\%}$
test_items_stack_nested_locked 1.5384ms 0.8874ms 1.1268 KOps/s 1.1414 KOps/s $\color{#d91a1a}-1.27\%$
test_keys 20.5380μs 3.8601μs 259.0595 KOps/s 257.4334 KOps/s $\color{#35bf28}+0.63\%$
test_keys_nested 1.7948ms 0.1463ms 6.8349 KOps/s 6.7470 KOps/s $\color{#35bf28}+1.30\%$
test_keys_nested_locked 0.3382ms 0.1498ms 6.6767 KOps/s 6.5557 KOps/s $\color{#35bf28}+1.85\%$
test_keys_nested_leaf 0.2541ms 0.1295ms 7.7201 KOps/s 7.6854 KOps/s $\color{#35bf28}+0.45\%$
test_keys_stack_nested 3.5659ms 1.2946ms 772.4101 Ops/s 781.8222 Ops/s $\color{#d91a1a}-1.20\%$
test_keys_stack_nested_leaf 1.8824ms 1.2720ms 786.1940 Ops/s 784.9054 Ops/s $\color{#35bf28}+0.16\%$
test_keys_stack_nested_locked 1.0817ms 0.8127ms 1.2304 KOps/s 1.2316 KOps/s $\color{#d91a1a}-0.10\%$
test_values 4.8050μs 1.1354μs 880.7628 KOps/s 852.8253 KOps/s $\color{#35bf28}+3.28\%$
test_values_nested 0.1240ms 51.6695μs 19.3538 KOps/s 19.3629 KOps/s $\color{#d91a1a}-0.05\%$
test_values_nested_locked 0.1161ms 51.7638μs 19.3185 KOps/s 19.2132 KOps/s $\color{#35bf28}+0.55\%$
test_values_nested_leaf 0.1544ms 46.6591μs 21.4320 KOps/s 21.6396 KOps/s $\color{#d91a1a}-0.96\%$
test_values_stack_nested 1.2131ms 1.0279ms 972.8242 Ops/s 959.5536 Ops/s $\color{#35bf28}+1.38\%$
test_values_stack_nested_leaf 1.2596ms 1.0278ms 972.9877 Ops/s 966.7833 Ops/s $\color{#35bf28}+0.64\%$
test_values_stack_nested_locked 1.0555ms 0.6094ms 1.6409 KOps/s 1.6045 KOps/s $\color{#35bf28}+2.26\%$
test_membership 20.2080μs 1.3121μs 762.1369 KOps/s 741.2052 KOps/s $\color{#35bf28}+2.82\%$
test_membership_nested 60.7800μs 3.7483μs 266.7843 KOps/s 293.4326 KOps/s $\textbf{\color{#d91a1a}-9.08\%}$
test_membership_nested_leaf 51.9760μs 3.3970μs 294.3757 KOps/s 290.9951 KOps/s $\color{#35bf28}+1.16\%$
test_membership_stacked_nested 42.4990μs 11.6339μs 85.9555 KOps/s 84.1867 KOps/s $\color{#35bf28}+2.10\%$
test_membership_stacked_nested_leaf 39.0630μs 11.6119μs 86.1187 KOps/s 84.7754 KOps/s $\color{#35bf28}+1.58\%$
test_membership_nested_last 28.0220μs 6.5082μs 153.6522 KOps/s 148.1465 KOps/s $\color{#35bf28}+3.72\%$
test_membership_nested_leaf_last 30.9180μs 6.6481μs 150.4183 KOps/s 144.2840 KOps/s $\color{#35bf28}+4.25\%$
test_membership_stacked_nested_last 0.3385ms 0.1810ms 5.5245 KOps/s 5.7653 KOps/s $\color{#d91a1a}-4.18\%$
test_membership_stacked_nested_leaf_last 42.0680μs 13.8527μs 72.1883 KOps/s 72.0026 KOps/s $\color{#35bf28}+0.26\%$
test_nested_getleaf 30.6770μs 10.3187μs 96.9112 KOps/s 93.1747 KOps/s $\color{#35bf28}+4.01\%$
test_nested_get 30.8580μs 9.7941μs 102.1027 KOps/s 98.8427 KOps/s $\color{#35bf28}+3.30\%$
test_stacked_getleaf 0.8139ms 0.3965ms 2.5224 KOps/s 2.5158 KOps/s $\color{#35bf28}+0.26\%$
test_stacked_get 0.5061ms 0.3652ms 2.7382 KOps/s 2.6963 KOps/s $\color{#35bf28}+1.56\%$
test_nested_getitemleaf 0.1486ms 12.1658μs 82.1979 KOps/s 82.7341 KOps/s $\color{#d91a1a}-0.65\%$
test_nested_getitem 41.2370μs 11.3120μs 88.4018 KOps/s 86.1680 KOps/s $\color{#35bf28}+2.59\%$
test_stacked_getitemleaf 0.5950ms 0.3993ms 2.5043 KOps/s 2.4330 KOps/s $\color{#35bf28}+2.93\%$
test_stacked_getitem 0.6079ms 0.3697ms 2.7047 KOps/s 2.6347 KOps/s $\color{#35bf28}+2.66\%$
test_lock_nested 2.9715ms 0.3319ms 3.0128 KOps/s 2.9213 KOps/s $\color{#35bf28}+3.13\%$
test_lock_stack_nested 90.6330ms 6.1245ms 163.2792 Ops/s 165.9809 Ops/s $\color{#d91a1a}-1.63\%$
test_unlock_nested 84.5801ms 0.4166ms 2.4005 KOps/s 2.9199 KOps/s $\textbf{\color{#d91a1a}-17.79\%}$
test_unlock_stack_nested 84.3921ms 6.1999ms 161.2941 Ops/s 163.2436 Ops/s $\color{#d91a1a}-1.19\%$
test_flatten_speed 0.6857ms 0.3615ms 2.7659 KOps/s 2.7223 KOps/s $\color{#35bf28}+1.60\%$
test_unflatten_speed 0.6637ms 0.4525ms 2.2099 KOps/s 2.1697 KOps/s $\color{#35bf28}+1.85\%$
test_common_ops 6.0850ms 0.6288ms 1.5902 KOps/s 1.4248 KOps/s $\textbf{\color{#35bf28}+11.61\%}$
test_creation 18.4440μs 1.8072μs 553.3365 KOps/s 549.3214 KOps/s $\color{#35bf28}+0.73\%$
test_creation_empty 25.7080μs 7.7127μs 129.6564 KOps/s 94.3398 KOps/s $\textbf{\color{#35bf28}+37.44\%}$
test_creation_nested_1 31.2190μs 10.2332μs 97.7207 KOps/s 77.2626 KOps/s $\textbf{\color{#35bf28}+26.48\%}$
test_creation_nested_2 44.3320μs 13.6766μs 73.1176 KOps/s 60.7279 KOps/s $\textbf{\color{#35bf28}+20.40\%}$
test_clone 85.2890μs 13.1123μs 76.2642 KOps/s 76.6256 KOps/s $\color{#d91a1a}-0.47\%$
test_getitem[int] 34.8950μs 11.1518μs 89.6717 KOps/s 89.0931 KOps/s $\color{#35bf28}+0.65\%$
test_getitem[slice_int] 54.2910μs 22.2255μs 44.9934 KOps/s 44.6109 KOps/s $\color{#35bf28}+0.86\%$
test_getitem[range] 0.1078ms 42.3492μs 23.6132 KOps/s 23.0707 KOps/s $\color{#35bf28}+2.35\%$
test_getitem[tuple] 50.3040μs 18.3394μs 54.5274 KOps/s 53.6873 KOps/s $\color{#35bf28}+1.56\%$
test_getitem[list] 0.1423ms 36.6505μs 27.2848 KOps/s 26.3104 KOps/s $\color{#35bf28}+3.70\%$
test_setitem_dim[int] 72.4950μs 28.0286μs 35.6779 KOps/s 33.3805 KOps/s $\textbf{\color{#35bf28}+6.88\%}$
test_setitem_dim[slice_int] 0.1228ms 54.5094μs 18.3455 KOps/s 17.7621 KOps/s $\color{#35bf28}+3.28\%$
test_setitem_dim[range] 0.1227ms 70.2837μs 14.2280 KOps/s 13.1414 KOps/s $\textbf{\color{#35bf28}+8.27\%}$
test_setitem_dim[tuple] 79.8990μs 42.6598μs 23.4413 KOps/s 21.6096 KOps/s $\textbf{\color{#35bf28}+8.48\%}$
test_setitem 58.2890μs 18.3657μs 54.4493 KOps/s 50.5215 KOps/s $\textbf{\color{#35bf28}+7.77\%}$
test_set 53.7000μs 17.6572μs 56.6342 KOps/s 51.5048 KOps/s $\textbf{\color{#35bf28}+9.96\%}$
test_set_shared 3.6681ms 0.1424ms 7.0204 KOps/s 7.0450 KOps/s $\color{#d91a1a}-0.35\%$
test_update 81.8420μs 19.3685μs 51.6301 KOps/s 44.3961 KOps/s $\textbf{\color{#35bf28}+16.29\%}$
test_update_nested 0.1280ms 27.0327μs 36.9922 KOps/s 32.8842 KOps/s $\textbf{\color{#35bf28}+12.49\%}$
test_set_nested 49.6530μs 19.7317μs 50.6799 KOps/s 47.7561 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_set_nested_new 69.3290μs 23.5702μs 42.4264 KOps/s 40.5853 KOps/s $\color{#35bf28}+4.54\%$
test_select 80.0190μs 36.3393μs 27.5184 KOps/s 26.2569 KOps/s $\color{#35bf28}+4.80\%$
test_select_nested 0.1072ms 56.5996μs 17.6680 KOps/s 17.2677 KOps/s $\color{#35bf28}+2.32\%$
test_exclude_nested 0.2284ms 0.1175ms 8.5131 KOps/s 8.5138 KOps/s $-0.01\%$
test_empty[True] 0.5044ms 0.4157ms 2.4058 KOps/s 2.4282 KOps/s $\color{#d91a1a}-0.92\%$
test_empty[False] 8.5118μs 1.0315μs 969.4412 KOps/s 979.7722 KOps/s $\color{#d91a1a}-1.05\%$
test_unbind_speed 0.2942ms 0.2456ms 4.0718 KOps/s 3.9615 KOps/s $\color{#35bf28}+2.78\%$
test_unbind_speed_stack0 82.8331ms 3.4176ms 292.6052 Ops/s 320.3773 Ops/s $\textbf{\color{#d91a1a}-8.67\%}$
test_unbind_speed_stack1 18.3940μs 1.9672μs 508.3297 KOps/s 497.7649 KOps/s $\color{#35bf28}+2.12\%$
test_split 76.2227ms 1.6267ms 614.7272 Ops/s 603.4240 Ops/s $\color{#35bf28}+1.87\%$
test_chunk 74.7274ms 1.5545ms 643.2773 Ops/s 636.8281 Ops/s $\color{#35bf28}+1.01\%$
test_creation[device0] 0.1879ms 0.1010ms 9.8981 KOps/s 9.5856 KOps/s $\color{#35bf28}+3.26\%$
test_creation_from_tensor 3.1671ms 81.9858μs 12.1972 KOps/s 12.2658 KOps/s $\color{#d91a1a}-0.56\%$
test_add_one[memmap_tensor0] 0.1676ms 5.2212μs 191.5266 KOps/s 187.1666 KOps/s $\color{#35bf28}+2.33\%$
test_contiguous[memmap_tensor0] 11.9220μs 0.6410μs 1.5600 MOps/s 1.5266 MOps/s $\color{#35bf28}+2.18\%$
test_stack[memmap_tensor0] 66.2130μs 3.5991μs 277.8509 KOps/s 279.1750 KOps/s $\color{#d91a1a}-0.47\%$
test_memmaptd_index 1.0512ms 0.2393ms 4.1791 KOps/s 4.1486 KOps/s $\color{#35bf28}+0.73\%$
test_memmaptd_index_astensor 0.7365ms 0.3018ms 3.3131 KOps/s 3.3450 KOps/s $\color{#d91a1a}-0.95\%$
test_memmaptd_index_op 0.7874ms 0.5491ms 1.8211 KOps/s 1.6588 KOps/s $\textbf{\color{#35bf28}+9.78\%}$
test_serialize_model 0.1702s 0.1071s 9.3394 Ops/s 8.8012 Ops/s $\textbf{\color{#35bf28}+6.12\%}$
test_serialize_model_pickle 0.4472s 0.3737s 2.6759 Ops/s 2.6276 Ops/s $\color{#35bf28}+1.84\%$
test_serialize_weights 0.1712s 0.1073s 9.3197 Ops/s 9.1405 Ops/s $\color{#35bf28}+1.96\%$
test_serialize_weights_returnearly 0.1280s 0.1225s 8.1624 Ops/s 7.9617 Ops/s $\color{#35bf28}+2.52\%$
test_serialize_weights_pickle 1.2493s 0.5844s 1.7111 Ops/s 2.3637 Ops/s $\textbf{\color{#d91a1a}-27.61\%}$
test_serialize_weights_filesystem 0.1034s 93.3175ms 10.7161 Ops/s 9.5784 Ops/s $\textbf{\color{#35bf28}+11.88\%}$
test_serialize_model_filesystem 0.1003s 94.2655ms 10.6083 Ops/s 10.5627 Ops/s $\color{#35bf28}+0.43\%$
test_reshape_pytree 57.4570μs 21.0891μs 47.4178 KOps/s 48.4781 KOps/s $\color{#d91a1a}-2.19\%$
test_reshape_td 63.4380μs 30.2950μs 33.0088 KOps/s 33.1157 KOps/s $\color{#d91a1a}-0.32\%$
test_view_pytree 63.3280μs 20.7465μs 48.2009 KOps/s 48.2829 KOps/s $\color{#d91a1a}-0.17\%$
test_view_td 72.1626ms 11.1001μs 90.0892 KOps/s 85.0758 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_unbind_pytree 52.9990μs 24.0042μs 41.6594 KOps/s 41.7489 KOps/s $\color{#d91a1a}-0.21\%$
test_unbind_td 0.1240ms 36.1929μs 27.6297 KOps/s 27.4620 KOps/s $\color{#35bf28}+0.61\%$
test_split_pytree 61.1840μs 24.0620μs 41.5593 KOps/s 42.4510 KOps/s $\color{#d91a1a}-2.10\%$
test_split_td 0.1233ms 40.2763μs 24.8285 KOps/s 24.9334 KOps/s $\color{#d91a1a}-0.42\%$
test_add_pytree 73.8880μs 30.1797μs 33.1348 KOps/s 33.3143 KOps/s $\color{#d91a1a}-0.54\%$
test_add_td 0.1064ms 46.6963μs 21.4150 KOps/s 18.8104 KOps/s $\textbf{\color{#35bf28}+13.85\%}$
test_distributed 0.1707ms 99.4295μs 10.0574 KOps/s 9.9045 KOps/s $\color{#35bf28}+1.54\%$
test_tdmodule 98.6030μs 20.8240μs 48.0216 KOps/s 44.4429 KOps/s $\textbf{\color{#35bf28}+8.05\%}$
test_tdmodule_dispatch 0.1996ms 39.7940μs 25.1294 KOps/s 22.1888 KOps/s $\textbf{\color{#35bf28}+13.25\%}$
test_tdseq 49.3820μs 24.1056μs 41.4841 KOps/s 39.0843 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_tdseq_dispatch 0.1488ms 44.8202μs 22.3114 KOps/s 20.8700 KOps/s $\textbf{\color{#35bf28}+6.91\%}$
test_instantiation_functorch 1.5095ms 1.3192ms 758.0441 Ops/s 763.3082 Ops/s $\color{#d91a1a}-0.69\%$
test_instantiation_td 99.4919ms 1.1224ms 890.9110 Ops/s 883.9662 Ops/s $\color{#35bf28}+0.79\%$
test_exec_functorch 0.3045ms 0.1566ms 6.3854 KOps/s 6.4251 KOps/s $\color{#d91a1a}-0.62\%$
test_exec_functional_call 0.2635ms 0.1454ms 6.8763 KOps/s 7.0017 KOps/s $\color{#d91a1a}-1.79\%$
test_exec_td 0.2889ms 0.1397ms 7.1564 KOps/s 7.1360 KOps/s $\color{#35bf28}+0.29\%$
test_exec_td_decorator 1.0267ms 0.1946ms 5.1376 KOps/s 5.7289 KOps/s $\textbf{\color{#d91a1a}-10.32\%}$
test_vmap_mlp_speed[True-True] 1.2039ms 0.8666ms 1.1539 KOps/s 1.0829 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_vmap_mlp_speed[True-False] 0.7149ms 0.4596ms 2.1759 KOps/s 2.0797 KOps/s $\color{#35bf28}+4.62\%$
test_vmap_mlp_speed[False-True] 1.0863ms 0.7702ms 1.2984 KOps/s 1.2414 KOps/s $\color{#35bf28}+4.59\%$
test_vmap_mlp_speed[False-False] 0.5752ms 0.3783ms 2.6437 KOps/s 2.5240 KOps/s $\color{#35bf28}+4.74\%$
test_vmap_mlp_speed_decorator[True-True] 3.6061ms 2.2352ms 447.3899 Ops/s 422.2304 Ops/s $\textbf{\color{#35bf28}+5.96\%}$
test_vmap_mlp_speed_decorator[True-False] 0.9233ms 0.5267ms 1.8985 KOps/s 1.8687 KOps/s $\color{#35bf28}+1.60\%$
test_vmap_mlp_speed_decorator[False-True] 2.4796ms 1.8456ms 541.8239 Ops/s 514.3595 Ops/s $\textbf{\color{#35bf28}+5.34\%}$
test_vmap_mlp_speed_decorator[False-False] 0.9117ms 0.4105ms 2.4358 KOps/s 2.4140 KOps/s $\color{#35bf28}+0.90\%$
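Reading the CPU table above, the Change column appears to be the relative throughput difference between this branch (Ops) and the repository HEAD (Ops on Repo HEAD), so positive values mean faster. The Improved/Worsened totals also appear to count only the bold rows, i.e. changes of about 5% or more in either direction (26 bold green and 6 bold red rows here). The Python sketch below reproduces that arithmetic under those assumptions; the 5% threshold and the helper names `parse_ops`, `percent_change` and `classify` are hypothetical and inferred from the tables, not taken from the benchmark bot.

```python
# Throughput units that appear in the benchmark tables, as multipliers to ops/s.
_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '943.9208 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * _UNITS[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative throughput of the PR branch versus repo HEAD, in percent."""
    return (parse_ops(ops) / parse_ops(ops_head) - 1.0) * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Bucket a change; the 5% default is an assumption inferred from the bold rows."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested, CPU table: 64.9364 KOps/s vs 57.6540 KOps/s on HEAD.
change = percent_change("64.9364 KOps/s", "57.6540 KOps/s")
print(f"{change:+.2f}% {classify(change)}")  # +12.63% improved
```

Because the Ops values are themselves rounded, a recomputed change may occasionally differ from the reported one in the last digit. Normalising units matters for rows that straddle a unit boundary, such as the GPU row for test_membership (943.9208 KOps/s against 1.0586 MOps/s, reported as -10.83%).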


github-actions bot commented Feb 1, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: 3. Worsened: 33.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 68.7510μs 14.6794μs 68.1227 KOps/s 74.0146 KOps/s $\textbf{\color{#d91a1a}-7.96\%}$
test_plain_set_stack_nested 0.1466ms 0.1201ms 8.3233 KOps/s 8.3301 KOps/s $\color{#d91a1a}-0.08\%$
test_plain_set_nested_inplace 43.1310μs 15.8443μs 63.1142 KOps/s 67.6989 KOps/s $\textbf{\color{#d91a1a}-6.77\%}$
test_plain_set_stack_nested_inplace 0.1783ms 0.1495ms 6.6877 KOps/s 6.6897 KOps/s $\color{#d91a1a}-0.03\%$
test_items 29.1000μs 4.7599μs 210.0868 KOps/s 208.5577 KOps/s $\color{#35bf28}+0.73\%$
test_items_nested 0.3792ms 0.3395ms 2.9458 KOps/s 2.9054 KOps/s $\color{#35bf28}+1.39\%$
test_items_nested_locked 0.3847ms 0.3425ms 2.9199 KOps/s 2.8750 KOps/s $\color{#35bf28}+1.56\%$
test_items_nested_leaf 0.2306ms 0.2013ms 4.9676 KOps/s 4.9319 KOps/s $\color{#35bf28}+0.72\%$
test_items_stack_nested 1.4167ms 1.3197ms 757.7682 Ops/s 763.5290 Ops/s $\color{#d91a1a}-0.75\%$
test_items_stack_nested_leaf 1.1961ms 1.1592ms 862.6596 Ops/s 865.4692 Ops/s $\color{#d91a1a}-0.32\%$
test_items_stack_nested_locked 2.1810ms 0.8957ms 1.1164 KOps/s 1.1085 KOps/s $\color{#35bf28}+0.71\%$
test_keys 26.3910μs 4.6319μs 215.8955 KOps/s 218.7723 KOps/s $\color{#d91a1a}-1.31\%$
test_keys_nested 0.5068ms 94.8340μs 10.5447 KOps/s 10.4892 KOps/s $\color{#35bf28}+0.53\%$
test_keys_nested_locked 0.1208ms 98.3483μs 10.1679 KOps/s 10.1197 KOps/s $\color{#35bf28}+0.48\%$
test_keys_nested_leaf 0.1833ms 78.2014μs 12.7875 KOps/s 12.7144 KOps/s $\color{#35bf28}+0.57\%$
test_keys_stack_nested 1.1874ms 1.1488ms 870.4524 Ops/s 872.6155 Ops/s $\color{#d91a1a}-0.25\%$
test_keys_stack_nested_leaf 1.2859ms 1.1315ms 883.7637 Ops/s 874.0858 Ops/s $\color{#35bf28}+1.11\%$
test_keys_stack_nested_locked 0.7545ms 0.7232ms 1.3828 KOps/s 1.3722 KOps/s $\color{#35bf28}+0.77\%$
test_values 32.8800μs 1.9656μs 508.7597 KOps/s 528.8110 KOps/s $\color{#d91a1a}-3.79\%$
test_values_nested 68.7510μs 46.0456μs 21.7176 KOps/s 21.9943 KOps/s $\color{#d91a1a}-1.26\%$
test_values_nested_locked 81.3310μs 48.6267μs 20.5649 KOps/s 20.8375 KOps/s $\color{#d91a1a}-1.31\%$
test_values_nested_leaf 61.2410μs 40.3665μs 24.7730 KOps/s 25.1825 KOps/s $\color{#d91a1a}-1.63\%$
test_values_stack_nested 1.0253ms 0.9692ms 1.0318 KOps/s 1.0413 KOps/s $\color{#d91a1a}-0.91\%$
test_values_stack_nested_leaf 1.0272ms 0.9734ms 1.0274 KOps/s 1.0372 KOps/s $\color{#d91a1a}-0.95\%$
test_values_stack_nested_locked 0.7938ms 0.5771ms 1.7328 KOps/s 1.7335 KOps/s $\color{#d91a1a}-0.04\%$
test_membership 17.3300μs 1.0594μs 943.9208 KOps/s 1.0586 MOps/s $\textbf{\color{#d91a1a}-10.83\%}$
test_membership_nested 35.1800μs 2.9107μs 343.5594 KOps/s 341.3617 KOps/s $\color{#35bf28}+0.64\%$
test_membership_nested_leaf 17.5155μs 2.8146μs 355.2867 KOps/s 339.8264 KOps/s $\color{#35bf28}+4.55\%$
test_membership_stacked_nested 33.9900μs 11.5008μs 86.9505 KOps/s 87.5927 KOps/s $\color{#d91a1a}-0.73\%$
test_membership_stacked_nested_leaf 45.3110μs 11.4387μs 87.4229 KOps/s 87.3023 KOps/s $\color{#35bf28}+0.14\%$
test_membership_nested_last 28.3700μs 5.3170μs 188.0764 KOps/s 187.4194 KOps/s $\color{#35bf28}+0.35\%$
test_membership_nested_leaf_last 35.2210μs 5.2947μs 188.8691 KOps/s 186.8096 KOps/s $\color{#35bf28}+1.10\%$
test_membership_stacked_nested_last 0.1958ms 0.1570ms 6.3706 KOps/s 6.3511 KOps/s $\color{#35bf28}+0.31\%$
test_membership_stacked_nested_leaf_last 30.3610μs 13.3673μs 74.8096 KOps/s 75.2495 KOps/s $\color{#d91a1a}-0.58\%$
test_nested_getleaf 37.3500μs 8.4320μs 118.5959 KOps/s 118.2649 KOps/s $\color{#35bf28}+0.28\%$
test_nested_get 35.4900μs 7.9929μs 125.1103 KOps/s 124.9007 KOps/s $\color{#35bf28}+0.17\%$
test_stacked_getleaf 0.3776ms 0.3273ms 3.0549 KOps/s 2.9979 KOps/s $\color{#35bf28}+1.90\%$
test_stacked_get 0.3552ms 0.2957ms 3.3820 KOps/s 3.3579 KOps/s $\color{#35bf28}+0.72\%$
test_nested_getitemleaf 0.1017ms 9.7875μs 102.1708 KOps/s 101.3690 KOps/s $\color{#35bf28}+0.79\%$
test_nested_getitem 35.2300μs 9.3580μs 106.8601 KOps/s 106.8000 KOps/s $\color{#35bf28}+0.06\%$
test_stacked_getitemleaf 0.3925ms 0.3316ms 3.0157 KOps/s 2.9934 KOps/s $\color{#35bf28}+0.74\%$
test_stacked_getitem 0.3311ms 0.2986ms 3.3492 KOps/s 3.3288 KOps/s $\color{#35bf28}+0.61\%$
test_lock_nested 0.7818ms 0.3516ms 2.8441 KOps/s 2.8146 KOps/s $\color{#35bf28}+1.05\%$
test_lock_stack_nested 89.8403ms 6.3244ms 158.1185 Ops/s 157.9081 Ops/s $\color{#35bf28}+0.13\%$
test_unlock_nested 83.0054ms 0.4353ms 2.2975 KOps/s 2.8493 KOps/s $\textbf{\color{#d91a1a}-19.37\%}$
test_unlock_stack_nested 90.8536ms 6.4079ms 156.0582 Ops/s 154.1228 Ops/s $\color{#35bf28}+1.26\%$
test_flatten_speed 0.6379ms 0.2612ms 3.8279 KOps/s 3.8385 KOps/s $\color{#d91a1a}-0.28\%$
test_unflatten_speed 0.4009ms 0.3593ms 2.7835 KOps/s 2.7810 KOps/s $\color{#35bf28}+0.09\%$
test_common_ops 1.1160ms 0.6385ms 1.5661 KOps/s 1.6604 KOps/s $\textbf{\color{#d91a1a}-5.68\%}$
test_creation 14.8700μs 1.5619μs 640.2526 KOps/s 642.3523 KOps/s $\color{#d91a1a}-0.33\%$
test_creation_empty 40.8710μs 10.1301μs 98.7160 KOps/s 128.2796 KOps/s $\textbf{\color{#d91a1a}-23.05\%}$
test_creation_nested_1 53.5410μs 11.9746μs 83.5101 KOps/s 104.0056 KOps/s $\textbf{\color{#d91a1a}-19.71\%}$
test_creation_nested_2 40.5210μs 14.1762μs 70.5410 KOps/s 82.7075 KOps/s $\textbf{\color{#d91a1a}-14.71\%}$
test_clone 69.8210μs 13.8175μs 72.3721 KOps/s 72.4412 KOps/s $\color{#d91a1a}-0.10\%$
test_getitem[int] 35.2810μs 10.9188μs 91.5855 KOps/s 93.2101 KOps/s $\color{#d91a1a}-1.74\%$
test_getitem[slice_int] 41.5100μs 21.0006μs 47.6177 KOps/s 47.5650 KOps/s $\color{#35bf28}+0.11\%$
test_getitem[range] 66.3710μs 39.9388μs 25.0383 KOps/s 27.5605 KOps/s $\textbf{\color{#d91a1a}-9.15\%}$
test_getitem[tuple] 50.1210μs 18.3217μs 54.5802 KOps/s 54.1238 KOps/s $\color{#35bf28}+0.84\%$
test_getitem[list] 0.1618ms 33.6747μs 29.6959 KOps/s 30.1515 KOps/s $\color{#d91a1a}-1.51\%$
test_setitem_dim[int] 44.6000μs 29.1579μs 34.2960 KOps/s 37.8194 KOps/s $\textbf{\color{#d91a1a}-9.32\%}$
test_setitem_dim[slice_int] 69.8910μs 52.5099μs 19.0440 KOps/s 21.3235 KOps/s $\textbf{\color{#d91a1a}-10.69\%}$
test_setitem_dim[range] 0.1089ms 68.7098μs 14.5540 KOps/s 16.1710 KOps/s $\textbf{\color{#d91a1a}-10.00\%}$
test_setitem_dim[tuple] 65.4910μs 45.5965μs 21.9315 KOps/s 24.1285 KOps/s $\textbf{\color{#d91a1a}-9.11\%}$
test_setitem 69.3710μs 19.7774μs 50.5628 KOps/s 53.6084 KOps/s $\textbf{\color{#d91a1a}-5.68\%}$
test_set 70.2710μs 19.4307μs 51.4650 KOps/s 55.4414 KOps/s $\textbf{\color{#d91a1a}-7.17\%}$
test_set_shared 2.7215ms 0.1032ms 9.6943 KOps/s 9.1008 KOps/s $\textbf{\color{#35bf28}+6.52\%}$
test_update 0.1021ms 22.7314μs 43.9920 KOps/s 46.6745 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_update_nested 85.9610μs 29.3869μs 34.0287 KOps/s 37.5903 KOps/s $\textbf{\color{#d91a1a}-9.47\%}$
test_set_nested 66.6720μs 20.7933μs 48.0925 KOps/s 51.2033 KOps/s $\textbf{\color{#d91a1a}-6.08\%}$
test_set_nested_new 66.3710μs 23.6192μs 42.3384 KOps/s 46.5258 KOps/s $\textbf{\color{#d91a1a}-9.00\%}$
test_select 80.0110μs 36.6352μs 27.2961 KOps/s 28.8371 KOps/s $\textbf{\color{#d91a1a}-5.34\%}$
test_select_nested 74.2410μs 53.4297μs 18.7162 KOps/s 18.6645 KOps/s $\color{#35bf28}+0.28\%$
test_exclude_nested 0.1428ms 0.1130ms 8.8475 KOps/s 8.7369 KOps/s $\color{#35bf28}+1.27\%$
test_empty[True] 0.4602ms 0.3855ms 2.5937 KOps/s 2.5931 KOps/s $\color{#35bf28}+0.02\%$
test_empty[False] 2.9381μs 0.8453μs 1.1830 MOps/s 1.1877 MOps/s $\color{#d91a1a}-0.39\%$
test_to 75.3120μs 53.4148μs 18.7214 KOps/s 18.4315 KOps/s $\color{#35bf28}+1.57\%$
test_to_nonblocking 63.5210μs 34.1142μs 29.3133 KOps/s 27.6305 KOps/s $\textbf{\color{#35bf28}+6.09\%}$
test_unbind_speed 0.3013ms 0.2717ms 3.6803 KOps/s 3.7530 KOps/s $\color{#d91a1a}-1.94\%$
test_unbind_speed_stack0 94.3737ms 3.5471ms 281.9168 Ops/s 299.4509 Ops/s $\textbf{\color{#d91a1a}-5.86\%}$
test_unbind_speed_stack1 20.8300μs 1.8000μs 555.5488 KOps/s 539.5189 KOps/s $\color{#35bf28}+2.97\%$
test_split 83.5582ms 1.7538ms 570.2063 Ops/s 573.5687 Ops/s $\color{#d91a1a}-0.59\%$
test_chunk 81.5004ms 1.6797ms 595.3418 Ops/s 647.1929 Ops/s $\textbf{\color{#d91a1a}-8.01\%}$
test_creation[device0] 0.1431ms 73.8585μs 13.5394 KOps/s 13.7760 KOps/s $\color{#d91a1a}-1.72\%$
test_creation_from_tensor 0.2196ms 55.0413μs 18.1682 KOps/s 17.5258 KOps/s $\color{#35bf28}+3.67\%$
test_add_one[memmap_tensor0] 0.2075ms 6.7699μs 147.7128 KOps/s 142.4896 KOps/s $\color{#35bf28}+3.67\%$
test_contiguous[memmap_tensor0] 25.7810μs 0.6499μs 1.5387 MOps/s 1.5399 MOps/s $\color{#d91a1a}-0.08\%$
test_stack[memmap_tensor0] 39.6510μs 4.3700μs 228.8323 KOps/s 224.0512 KOps/s $\color{#35bf28}+2.13\%$
test_memmaptd_index 1.0594ms 0.2719ms 3.6774 KOps/s 3.6767 KOps/s $\color{#35bf28}+0.02\%$
test_memmaptd_index_astensor 0.6678ms 0.3288ms 3.0417 KOps/s 2.7779 KOps/s $\textbf{\color{#35bf28}+9.50\%}$
test_memmaptd_index_op 0.9883ms 0.6577ms 1.5204 KOps/s 1.6090 KOps/s $\textbf{\color{#d91a1a}-5.51\%}$
test_serialize_model 0.1771s 98.8210ms 10.1193 Ops/s 10.7745 Ops/s $\textbf{\color{#d91a1a}-6.08\%}$
test_serialize_model_pickle 1.3656s 1.2395s 0.8068 Ops/s 0.8055 Ops/s $\color{#35bf28}+0.16\%$
test_serialize_weights 0.1742s 96.1056ms 10.4052 Ops/s 10.8510 Ops/s $\color{#d91a1a}-4.11\%$
test_serialize_weights_returnearly 0.2728s 73.0155ms 13.6957 Ops/s 17.7203 Ops/s $\textbf{\color{#d91a1a}-22.71\%}$
test_serialize_weights_pickle 1.3471s 1.2361s 0.8090 Ops/s 0.8077 Ops/s $\color{#35bf28}+0.16\%$
test_reshape_pytree 58.4010μs 24.9728μs 40.0436 KOps/s 39.6046 KOps/s $\color{#35bf28}+1.11\%$
test_reshape_td 50.9810μs 29.7212μs 33.6460 KOps/s 33.7929 KOps/s $\color{#d91a1a}-0.43\%$
test_view_pytree 45.4510μs 24.2750μs 41.1946 KOps/s 40.8761 KOps/s $\color{#35bf28}+0.78\%$
test_view_td 88.3480ms 10.0045μs 99.9551 KOps/s 96.3496 KOps/s $\color{#35bf28}+3.74\%$
test_unbind_pytree 0.3092ms 30.2707μs 33.0353 KOps/s 32.6806 KOps/s $\color{#35bf28}+1.09\%$
test_unbind_td 0.2008ms 41.9356μs 23.8461 KOps/s 23.8166 KOps/s $\color{#35bf28}+0.12\%$
test_split_pytree 0.1640ms 28.6899μs 34.8555 KOps/s 33.9114 KOps/s $\color{#35bf28}+2.78\%$
test_split_td 0.1061ms 38.9243μs 25.6909 KOps/s 25.6266 KOps/s $\color{#35bf28}+0.25\%$
test_add_pytree 74.6310μs 36.0752μs 27.7199 KOps/s 27.6283 KOps/s $\color{#35bf28}+0.33\%$
test_add_td 83.8810μs 52.3550μs 19.1004 KOps/s 21.2262 KOps/s $\textbf{\color{#d91a1a}-10.02\%}$
test_distributed 0.1834ms 74.2731μs 13.4638 KOps/s 14.4416 KOps/s $\textbf{\color{#d91a1a}-6.77\%}$
test_tdmodule 98.2610μs 19.2608μs 51.9190 KOps/s 57.2286 KOps/s $\textbf{\color{#d91a1a}-9.28\%}$
test_tdmodule_dispatch 0.1945ms 38.9061μs 25.7029 KOps/s 27.6072 KOps/s $\textbf{\color{#d91a1a}-6.90\%}$
test_tdseq 44.6300μs 21.7716μs 45.9313 KOps/s 48.5779 KOps/s $\textbf{\color{#d91a1a}-5.45\%}$
test_tdseq_dispatch 64.3300μs 41.1637μs 24.2933 KOps/s 25.7886 KOps/s $\textbf{\color{#d91a1a}-5.80\%}$
test_instantiation_functorch 2.0137ms 1.7048ms 586.5706 Ops/s 583.9651 Ops/s $\color{#35bf28}+0.45\%$
test_instantiation_td 1.7002ms 1.1763ms 850.1077 Ops/s 843.3426 Ops/s $\color{#35bf28}+0.80\%$
test_exec_functorch 0.2752ms 0.1633ms 6.1242 KOps/s 6.2004 KOps/s $\color{#d91a1a}-1.23\%$
test_exec_functional_call 0.2338ms 0.1620ms 6.1718 KOps/s 6.1319 KOps/s $\color{#35bf28}+0.65\%$
test_exec_td 0.1807ms 0.1519ms 6.5838 KOps/s 6.4694 KOps/s $\color{#35bf28}+1.77\%$
test_exec_td_decorator 0.1151s 0.2420ms 4.1320 KOps/s 5.2501 KOps/s $\textbf{\color{#d91a1a}-21.30\%}$
test_vmap_mlp_speed[True-True] 1.4157ms 1.0558ms 947.1175 Ops/s 966.1696 Ops/s $\color{#d91a1a}-1.97\%$
test_vmap_mlp_speed[True-False] 0.7588ms 0.6102ms 1.6388 KOps/s 1.6669 KOps/s $\color{#d91a1a}-1.69\%$
test_vmap_mlp_speed[False-True] 1.1112ms 0.9714ms 1.0294 KOps/s 988.7017 Ops/s $\color{#35bf28}+4.12\%$
test_vmap_mlp_speed[False-False] 0.6817ms 0.5395ms 1.8536 KOps/s 1.8032 KOps/s $\color{#35bf28}+2.79\%$
test_vmap_mlp_speed_decorator[True-True] 2.8979ms 2.3681ms 422.2788 Ops/s 428.8563 Ops/s $\color{#d91a1a}-1.53\%$
test_vmap_mlp_speed_decorator[True-False] 1.1063ms 0.6733ms 1.4852 KOps/s 1.5464 KOps/s $\color{#d91a1a}-3.96\%$
test_vmap_mlp_speed_decorator[False-True] 2.3900ms 1.9889ms 502.7803 Ops/s 515.0222 Ops/s $\color{#d91a1a}-2.38\%$
test_vmap_mlp_speed_decorator[False-False] 0.9490ms 0.5649ms 1.7703 KOps/s 1.7875 KOps/s $\color{#d91a1a}-0.96\%$
test_vmap_transformer_speed[True-True] 13.9329ms 12.7010ms 78.7341 Ops/s 81.8845 Ops/s $\color{#d91a1a}-3.85\%$
test_vmap_transformer_speed[True-False] 8.6482ms 8.2227ms 121.6151 Ops/s 119.2852 Ops/s $\color{#35bf28}+1.95\%$
test_vmap_transformer_speed[False-True] 12.8197ms 12.3397ms 81.0391 Ops/s 80.4566 Ops/s $\color{#35bf28}+0.72\%$
test_vmap_transformer_speed[False-False] 8.5614ms 8.1387ms 122.8701 Ops/s 121.8608 Ops/s $\color{#35bf28}+0.83\%$
test_vmap_transformer_speed_decorator[True-True] 76.6408ms 75.2118ms 13.2958 Ops/s 13.7961 Ops/s $\color{#d91a1a}-3.63\%$
test_vmap_transformer_speed_decorator[True-False] 21.7157ms 20.1268ms 49.6850 Ops/s 51.4997 Ops/s $\color{#d91a1a}-3.52\%$
test_vmap_transformer_speed_decorator[False-True] 68.8579ms 67.7725ms 14.7553 Ops/s 15.3198 Ops/s $\color{#d91a1a}-3.69\%$
test_vmap_transformer_speed_decorator[False-False] 0.1592s 22.3965ms 44.6497 Ops/s 52.5147 Ops/s $\textbf{\color{#d91a1a}-14.98\%}$

vmoens added the bug label Feb 1, 2024
vmoens merged commit cde67c3 into main on Feb 2, 2024
17 of 27 checks passed
vmoens deleted the fix-load-state-dict branch on February 2, 2024 at 15:58
vmoens added a commit that referenced this pull request Feb 6, 2024
Labels: bug, CLA Signed
Development

Successfully merging this pull request may close this issue:

[BUG] Loading losses with modules that have no parameters

2 participants