
CI has started taking six+ hours #196

Closed
inducer opened this issue Mar 30, 2023 · 12 comments

Comments

@inducer
Owner

inducer commented Mar 30, 2023

Recently observed on bock:

----------------------------------------------- generated xml file: /var/lib/gitlab-runner/builds/zCL2egrE/0/inducer/pytential/test/pytest.xml ------------------------------------------------
==================================================================================== slowest 10 durations =====================================================================================
17940.36s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case3]
3550.70s call     test/test_stokes.py::test_exterior_stokes[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-3]
1737.94s call     test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator5-solution5]
1114.27s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
1094.16s call     test/test_layer_pot_identity.py::test_identity_convergence_slow[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
1025.42s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case2]
821.09s call     test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case9]
605.78s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
574.41s call     test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-2-5-3-False]
469.54s call     test/test_matrix.py::test_build_matrix[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-vector-curve_fn1-42]
=================================================================================== short test summary info ===================================================================================
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
=========================================================== 2 failed, 233 passed, 1 skipped, 23213 warnings in 22429.75s (6:13:49) ============================================================
appdirs==1.4.4
arraycontext @ git+https://github.com/inducer/arraycontext.git@6f9616f2bfc09f0c6e4c915712427569f3a1f854
attrs==22.2.0
boxtree @ git+https://github.com/inducer/boxtree.git@1dff7111f400b23fdafc9963c0f24a009be23349
cgen==2020.1
codepy==2019.1
colorama==0.4.6
Cython==0.29.33
execnet==1.9.0
genpy==2022.1
gmsh-interop==2021.1.1
immutables==0.19
iniconfig==2.0.0
islpy @ git+https://github.com/inducer/islpy.git@a824f28113693978a06600911df3a7c36fd67f17
loopy @ git+https://github.com/inducer/loopy.git@c590001873dc8bba3374bf54435538245825d375
Mako==1.2.4
MarkupSafe==2.1.2
meshmode @ git+https://github.com/inducer/meshmode.git@ff0f5f9eaeed38b5bf1c20aa24f2e2d2e05f438c
modepy @ git+https://github.com/inducer/modepy.git@15a06582922d2aa026e2706859cea7d05cd0aa0a
mpmath==1.3.0
numpy==1.24.2
packaging==23.0
platformdirs==3.2.0
pluggy==1.0.0
psutil==5.9.4
pybind11==2.10.4
pyfmmlib @ git+https://github.com/inducer/pyfmmlib.git@e7bb3d18c58bc72ff00361b9093716c09368b726
pymbolic @ git+https://github.com/inducer/pymbolic.git@88f205bf98bdee7d89e193e208d147837cb08f1c
pyopencl @ git+https://github.com/inducer/pyopencl.git@95ad30e2d4ec8a1ed31f1f16b9efd94829c8f89b
pyrsistent==0.19.3
-e git+https://gitlab-ci-token:[email protected]/inducer/pytential.git@b63b97965a1a2ef56155dcdc46d0db9fb36e6a24#egg=pytential
pytest==7.2.2
pytest-github-actions-annotate-failures==0.1.8
pytest-xdist==3.2.1
pytools @ git+https://github.com/inducer/pytools.git@56efa1b3b6dbeea414904880efc8f1d7e4fcb8c0
pyvkfft==2023.1.1
recursivenodes==0.2.0
scipy==1.10.1
six==1.16.0
sumpy @ git+https://github.com/inducer/sumpy.git@fa24fef1af53268077cbfeda69c2545330535631
sympy==1.11.1
Above: pip freeze from that run on bock.

cc @alexfikl because his test is a winner, accounting for five of those hours
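To see how much of the session a single test eats, the slowest-durations lines above can be tallied against the session total. This is a minimal sketch, not part of pytential or its CI: `parse_durations` and `hms` are hypothetical helper names, and the regex assumes the `<seconds>s <phase> <node id>` layout shown in the log. `hms` truncates to whole seconds, which matches the `6:13:49` style pytest prints after the total.

```python
import re
from datetime import timedelta

# One line of pytest's "slowest N durations" report: "<secs>s <phase> <node id>".
DURATION_RE = re.compile(r"^\s*([\d.]+)s\s+(setup|call|teardown)\s+(.+)$")


def parse_durations(lines):
    """Return (seconds, phase, node_id) tuples for lines that match the report format."""
    entries = []
    for line in lines:
        m = DURATION_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2), m.group(3)))
    return entries


def hms(seconds):
    """Format seconds as h:mm:ss, truncating to whole seconds."""
    return str(timedelta(seconds=int(seconds)))


if __name__ == "__main__":
    # Two lines from the bock run above, with the array-context parameter IDs shortened.
    report = [
        "17940.36s call     test/test_linalg_skeletonization.py::"
        "test_skeletonize_by_proxy_convergence[actx-case3]",
        "3550.70s call     test/test_stokes.py::test_exterior_stokes[actx-3]",
    ]
    session_total = 22429.75  # from "in 22429.75s (6:13:49)"
    for secs, phase, node_id in parse_durations(report):
        test_file = node_id.split("::")[0]
        print(f"{hms(secs):>8}  {secs / session_total:6.1%}  {phase}  {test_file}")
```

On these numbers the `case3` parametrization alone comes to 4:59:00, about 80% of the 6:13:49 session.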

cc @isuruf because we discussed this on Monday

@inducer
Owner Author

inducer commented Mar 30, 2023

A similar issue may affect meshmode: https://gitlab.tiker.net/inducer/pytato/-/jobs/538649

@inducer
Owner Author

inducer commented Mar 30, 2023

Before this started happening:

============================= slowest 10 durations =============================
958.88s call     test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case9]
552.57s call     test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-1-7-5-False]
452.24s call     test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case0]
444.78s call     test/test_cost_model.py::test_cost_model_correctness[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-2-True-False]
444.33s call     test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case0]
424.14s call     test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case1]
363.69s call     test/test_layer_pot.py::test_off_surface_eval[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-True]
351.86s call     test/test_layer_pot_eigenvalues.py::test_sphere_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-sumpy-2-3-3]
246.16s call     test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-2-7-5-False]
229.97s call     test/test_cost_model.py::test_cost_model_correctness[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-3-False-False]
=========================== short test summary info ============================
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
========= 228 passed, 1 skipped, 18489 warnings in 2603.72s (0:43:23) ==========

@inducer
Owner Author

inducer commented Mar 30, 2023

Those pip freezes are literally identical.

@inducer
Owner Author

inducer commented Mar 30, 2023

Recent runs of #195 are affected on GitHub, too.

@inducer
Owner Author

inducer commented Mar 30, 2023

Found an old build environment (from March 22, on koelsch, in /var/lib/gitlab-runner/builds/0d8732fb/0/inducer/pytential/test). Judging by its date and https://gitlab.tiker.net/inducer/pytential/-/pipelines, it should predate the point where this all started going poorly. Unfortunately, the only difference in pip freeze is platformdirs going from 3.1.1 to 3.2.0, and I'm not sure that's relevant.
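With this many git-pinned packages, comparing two `pip freeze` outputs by eye is error-prone. A small sketch like the following reports only the packages whose pin changed; `parse_freeze` and `diff_freeze` are hypothetical helpers, not an existing tool, and name normalization follows the PEP 503 convention (case-insensitive, `-`/`_`/`.` treated alike).

```python
import re


def parse_freeze(text):
    """Map normalized distribution names to their pin (version, URL, or editable spec)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-e "):
            # Editable installs: use the #egg= fragment as the name when present.
            name = line.split("#egg=", 1)[1] if "#egg=" in line else line
            spec = line
        elif " @ " in line:
            # Direct references such as "loopy @ git+https://...@<commit>".
            name, spec = line.split(" @ ", 1)
        elif "==" in line:
            name, spec = line.split("==", 1)
        else:
            name, spec = line, ""
        # PEP 503-style normalization so "Foo_Bar" and "foo-bar" compare equal.
        pins[re.sub(r"[-_.]+", "-", name.strip()).lower()] = spec.strip()
    return pins


def diff_freeze(old_text, new_text):
    """Return {name: (old_pin, new_pin)} for every package whose pin differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

Per the comparison described in this comment, feeding it the koelsch and bock freezes would list only `platformdirs`; a package present on one side only shows up with `None` for the missing pin.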

@inducer
Owner Author

inducer commented Mar 30, 2023

Whatever it is, it's affecting both Conda and bare-venv runs: https://gitlab.tiker.net/inducer/pytential/-/pipelines/409699

@alexfikl
Collaborator

alexfikl commented Mar 30, 2023

Hm, just ran the test_linalg_skeletonization test locally and it seems to be doing just fine.

I'm a bit confused by the -case3 at the end of that, though, since the test is parametrized over 4 cases and 2 of them are marked as slow:

@pytest.mark.parametrize("case", [
    CONVERGENCE_TEST_CASES[0],
    CONVERGENCE_TEST_CASES[1],
    pytest.param(CONVERGENCE_TEST_CASES[2], marks=pytest.mark.slowtest),
    pytest.param(CONVERGENCE_TEST_CASES[3], marks=pytest.mark.slowtest),
    ])
def test_skeletonize_by_proxy_convergence(

Is the CI suddenly running slow tests too?

EDIT: I take some of that back; I had run it with -m 'not slowtest'. Running -case3 does take a very long time as well, but that doesn't explain why it's running on the CI to begin with.

I used py-spy top to see where it is, and it seems to be stuck in np.svd while computing the errors. I'm guessing the matrices are just too large.
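To sanity-check the "matrices are too large" guess: a dense SVD costs O(n³) operations, so doubling the matrix size should make it roughly eight times slower once n is large enough. The sketch below times `np.linalg.svd` (NumPy's SVD lives in `numpy.linalg`) on random square matrices. The sizes are arbitrary assumptions; this says nothing about the actual matrix sizes in `test_skeletonize_by_proxy_convergence`.

```python
import time

import numpy as np


def time_svd(n, seed=0):
    """Wall-clock seconds for one full SVD (U, S, Vh) of a random dense n x n matrix."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    start = time.perf_counter()
    np.linalg.svd(a)
    return time.perf_counter() - start


if __name__ == "__main__":
    previous = None
    for n in (250, 500, 1000):
        elapsed = time_svd(n)
        ratio = "" if previous is None else f"  ({elapsed / previous:.1f}x previous)"
        print(f"n={n:5d}: {elapsed:.3f}s{ratio}")
        previous = elapsed
```

Multithreaded BLAS and small sizes will blur the 8x ratio, so treat the output as a rough indication only.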

@inducer
Owner Author

inducer commented Mar 30, 2023

Still, there might be something to your theory: The slow runs show 236 tests ("2 failed, 233 passed, 1 skipped"), whereas the manageable-time ones show 229 ("228 passed, 1 skipped").
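The test counts can be read straight off pytest's final summary line. A minimal sketch, assuming the `N outcome, ...` wording that pytest 7 prints; `count_outcomes` is a hypothetical helper, and warnings are deliberately not counted as tests.

```python
import re

# Outcome words pytest uses in its final "N passed, M failed, ..." summary line.
OUTCOME_RE = re.compile(r"(\d+) (failed|passed|skipped|xfailed|xpassed|errors?)\b")


def count_outcomes(summary):
    """Return {outcome: count} from a pytest summary line, ignoring warnings."""
    counts = {}
    for number, outcome in OUTCOME_RE.findall(summary):
        key = "error" if outcome.startswith("error") else outcome
        counts[key] = counts.get(key, 0) + int(number)
    return counts


if __name__ == "__main__":
    slow = "2 failed, 233 passed, 1 skipped, 23213 warnings in 22429.75s (6:13:49)"
    fast = "228 passed, 1 skipped, 18489 warnings in 2603.72s (0:43:23)"
    for label, line in (("slow", slow), ("fast", fast)):
        counts = count_outcomes(line)
        print(label, counts, "total:", sum(counts.values()))
```

Applied to the two summaries in this thread it gives totals of 236 and 229, a difference of 7 tests.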

@isuruf
Collaborator

isuruf commented Mar 30, 2023

The slow runs are running only the slow tests because of:

PYTEST_ADDOPTS: -kslowtest

This used to be:

PYTEST_ADDOPTS: -m 'not slowtest'
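The two settings select very different things. As far as I know, pytest splits `PYTEST_ADDOPTS` shell-style and prepends the pieces to the command-line arguments. `-m` takes a marker expression, while `-k` takes a keyword expression matched case-insensitively against test names and, I believe, against marker names as well, so `-kslowtest` picks out the `slowtest`-marked cases instead of excluding them. The hypothetical helper below only shows how the two values tokenize and which selection option each ends up setting; it does not reimplement pytest's matching.

```python
import shlex


def selection_options(addopts):
    """Return the -m / -k expressions contained in a PYTEST_ADDOPTS-style string."""
    tokens = shlex.split(addopts)  # shell-style splitting, so quotes are honored
    options = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-m", "-k") and i + 1 < len(tokens):
            # Separate form: "-m 'not slowtest'"
            options[tok] = tokens[i + 1]
            i += 2
            continue
        if tok[:2] in ("-m", "-k") and len(tok) > 2:
            # Attached form: "-kslowtest"
            options[tok[:2]] = tok[2:]
        i += 1
    return options


if __name__ == "__main__":
    print(selection_options("-m 'not slowtest'"))  # {'-m': 'not slowtest'}
    print(selection_options("-kslowtest"))          # {'-k': 'slowtest'}
```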

Do you know who might be setting that env variable?
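One way to make this easier to track down next time is to have the test run print the variable in its header. pytest calls a `pytest_report_header` hook defined in `conftest.py` and shows the lines it returns at the top of the session output; the snippet below is a sketch of such a hook as a hypothetical addition, not existing pytential code.

```python
# conftest.py (hypothetical addition)
import os


def pytest_report_header(config):
    """Show PYTEST_ADDOPTS in the session header so CI logs record the test selection."""
    addopts = os.environ.get("PYTEST_ADDOPTS")
    if addopts is None:
        return ["PYTEST_ADDOPTS is not set"]
    return [f"PYTEST_ADDOPTS={addopts!r}"]
```

With `PYTEST_ADDOPTS=-kslowtest` in the environment, the header would then contain `PYTEST_ADDOPTS='-kslowtest'`.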

@alexfikl
Collaborator

alexfikl commented Mar 30, 2023

> Do you know who might be setting that env variable?

You're right! Seems to be back to normal on https://gitlab.tiker.net/inducer/pytential/-/jobs/539024.
Not sure what's going on there, though.

@isuruf
Collaborator

isuruf commented Mar 31, 2023

> Recent runs of #195 are affected on GitHub, too.

#195 is an unrelated issue.

@inducer
Owner Author

inducer commented Apr 5, 2023

I'm really not sure what happened here, but 🤷 we may not find out. I'll take the mysterious recovery and say we're done here.

Thanks everyone!

inducer closed this as completed Apr 5, 2023
3 participants