👷🐧🍎🏁 Extensive CI testing #60

Open · wants to merge 8 commits into main
Conversation

burgholzer (Member)

This PR experiments with broadening the type of runners that we run CI on. Due to the amount of runs that this creates, I am rather hesitant to directly merge this. Hence, I will keep this open for now and keep the branch itself. This way, we can test the real-world consequences of this and decide then.

In detail, this PR extends CI to run on:

  • 🐧 ubuntu-22.04,
  • 🐧 ubuntu-22.04-arm,
  • 🐧 ubuntu-24.04,
  • 🐧 ubuntu-24.04-arm,
  • 🍎 macos-13,
  • 🍎 macos-14,
  • 🍎 macos-15,
  • 🏁 windows-2022,
  • 🏁 windows-2025,

@burgholzer burgholzer self-assigned this Jan 16, 2025
@burgholzer burgholzer added labels: c++ (Anything related to C++ code), continuous integration (Anything related to the CI setup), feature (New feature or request), python (Anything related to Python code) — Jan 16, 2025
@marcelwa

@burgholzer I think this can prove a valuable addition to the workflow package. Particularly with respect to our recent discussions on a CI unification across @cda-tum.

@burgholzer (Member, Author)

> @burgholzer I think this can prove a valuable addition to the workflow package. Particularly with respect to our recent discussions on a CI unification across @cda-tum.

Good point. The build matrix just scales extremely quickly. Given how most of our packages support all currently non-EOL Python versions, the current setup with this PR as it is results in:

  • 5 Python versions
  • 9 operating system versions
  • 2 sessions (minimum versions and regular resolution)

on the Python side and

  • 9 operating system versions
  • 2 build types (Debug and Release)

on the C++ side.

In total, that means at least 90 + 18 = 108 individual runs, of which (5 * 3 * 2) + (3 * 2) = 36 run on some macOS system. Given that the concurrency limit on macOS runners is 5, each CI run would exclusively block the macOS runners for the duration of roughly 7 full builds and test suites.

Each of these runs has to build the C++ part of a project. The Python runs have to additionally install all Python dependencies. Then, all of these runs have to run the respective tests.

Setting all the above aside for a moment, we could go one step further in the sense that we could also run on different compilers for each operating system (similar to how fiction handles this). I could imagine it would make sense to test at least

  • default versions of clang and gcc available for Ubuntu and macOS
  • MSVC and Clang on Windows

Technically, this applies to both the Python and the C++ builds. So even with the above, it would amount to another factor of 2x in the number of runs.
At some point it seems unreasonable to run ~250 individual builds per CI run.
@marcelwa any kind of ideas for how we could manage this in a better fashion?
I could imagine that a clever setup of the CI in conjunction with careful setup of the individual projects and their test suites could really help here.
And, naturally, it would help if the MQT were a monorepo like fiction.
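For concreteness, the kind of OS-times-compiler matrix this discussion points toward could be sketched along these lines. This is a hypothetical sketch only, not the actual CI.yml; the job name, `config` key, and runner images are illustrative:

```yaml
# Hypothetical sketch; job and key names are made up for illustration.
jobs:
  cpp-tests:
    strategy:
      fail-fast: false
      matrix:
        build_type: [Debug, Release]
        config:
          # one (runner-default) compiler version per platform/compiler pair
          - { os: ubuntu-24.04, compiler: gcc }
          - { os: ubuntu-24.04, compiler: clang }
          - { os: macos-15, compiler: clang }    # AppleClang, the macOS default
          - { os: windows-2025, compiler: msvc }
          - { os: windows-2025, compiler: clang }
    runs-on: ${{ matrix.config.os }}
```

Using `include`-style config objects rather than a full cross product keeps invalid combinations (e.g., MSVC on Ubuntu) out of the matrix by construction.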

@marcelwa

> In total, that means at least 90 + 18 = 108 individual runs, of which (5 * 3 * 2) + (3 * 2) = 36 run on some macOS system. Given that the concurrency limit on macOS runners is 5, each CI run would exclusively block the macOS runners for the duration of roughly 7 full builds and test suites.
> [...]
> Technically, this applies to both the Python and the C++ builds. So even with the above, it would amount to another factor of 2x in the number of runs. At some point it seems unreasonable to run ~250 individual builds per CI run. @marcelwa any kind of ideas for how we could manage this in a better fashion? I could imagine that a clever setup of the CI in conjunction with careful setup of the individual projects and their test suites could really help here. And, naturally, it would help if the MQT were a monorepo like fiction.

I see that this setup quickly grows beyond reason. Naturally, there might not be an immediate end-all-be-all solution to a complex topic like this one. Let me try to list some thoughts on the matter anyway (in no particular order):

  • In the MNT, we haven't faced any compiler version-specific bugs in a long time. Very occasionally, we still see diverging behavior between g++ and clang, for instance (and of course between operating systems), but it has been a long time since a bug occurred on g++ version X but not on version X±1.
    ➡️ A careful setup might only consider a single (runner default?) version per compiler.
  • On macOS, the default compiler is AppleClang. You will have to go out of your way to install g++. On Apple Silicon Macs, you must pass additional compiler flags for compatibility (-arch arm64). As far as I know, Intel Macs still haven't reached end-of-life but might within a year or so.
    ➡️ It could be reasonable to restrict the CI to AppleClang on macOS.
  • Lots of recompilation overhead is already caught via compiler caches. However, we could maximize reuse by uploading entire (cleaned) build directories as artifacts and fetching them in subsequent runs. This will probably have limited to no effect when building wheels due to technical constraints.
    ➡️ We might want to investigate build directory caching between runs.
  • Some projects might not need the entire list of 5 Python versions, 9 operating system versions, and 2 sessions.
    ➡️ Flags to the workflow to toggle certain versions/sessions could reduce redundant workload.
  • In the MNT, each workflow tests first whether certain files (file types) have changed in the current PR before deciding whether to run at all. I think you're incorporating similar checks here.
    ➡️ Let's ponder whether there is any way of making such checks stricter.
  • Self-hosted runners would go a long way in reducing the wait time for other runs.
    ➡️ We might want to install a first self-hosted runner as a test balloon rather soon.
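The toggle idea above could take the shape of `workflow_call` inputs on a reusable workflow, so that each project opts into only the slice of the matrix it needs. A hypothetical sketch, with made-up input names and defaults:

```yaml
# Hypothetical reusable-workflow interface; input names are illustrative.
on:
  workflow_call:
    inputs:
      python-versions:
        description: "JSON list of Python versions to test"
        type: string
        default: '["3.9", "3.10", "3.11", "3.12", "3.13"]'
      enable-minimum-versions-session:
        description: "Also run the minimum-versions resolution session"
        type: boolean
        default: true
      os-list:
        description: "JSON list of runner images to use"
        type: string
        default: '["ubuntu-24.04", "macos-15", "windows-2025"]'
```

JSON-string inputs are a common workaround for the fact that `workflow_call` inputs cannot be lists directly.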

I'm always open to discussing this situation further.

@burgholzer (Member, Author)

Thanks for your thoughts! 🙏🏼 I'll just add mine in a similar fashion.

I agree that one particular compiler version should be sufficient per platform and compiler.
Testing clang and gcc on linux might make sense, though.
Only using the default compiler on macOS seems fine to me.
I do see value in testing both MSVC and Clang on Windows, as we have seen compilation problems with MSVC in the past that required us to use Clang instead. Making sure this does not happen again would be nice.

I am still a bit on the fence when it comes to caching. First of all, from experience, it is rather opaque and at times hard to judge whether the compiler caches really work as intended. Especially on the Python side, it can be quite hard to get caching to work properly on all operating systems. Additionally, these caches have to be restored and saved in each run, which also takes quite some time because the data has to go through the network. Given that some of the caches are several hundred megabytes in size, this is definitely non-negligible.
Caching whole build directories seems rather fragile to me. I can remember a couple of cases where subtle differences in the runners tripped up the CI when reusing build folders.
It might even be faster to not cache the compilation at all.
Any thoughts on this?
(afterthought: we are also way overboard with our cache usage on GitHub in basically all of our projects.. and the allowance for that is 10 GiB 🙃)

In general, there is a conflict of interest that we will probably never be able to resolve:
On the one hand, one would want jobs that are as fine-grained as possible so that they can be parallelized as much as possible. That creates many, but small, jobs. On the other hand, we could severely reduce the number of individual jobs by packing everything into a single workflow with lots of options. Then the total number of runs will be smaller, but each individual run will take considerably longer.
Combining everything also has the disadvantage that you potentially have to wait a rather long time until the particular thing you are currently working on is tested.

What could really make a lot of sense is to allow more customization of the individual workflows so that it can be configured on demand which jobs to run: either automatically, based on the files that changed (as we already do at the moment), or via explicit configuration (someone modifies the CI.yml in their PR to enable only the checks they want to run).
This kind of configuration would create quite some duplication of code in the workflows though, since one would not be able to use build matrices for the most part (they need to be statically defined if I remember correctly).
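As a hedge on the "statically defined" point: GitHub Actions can in fact build a matrix at runtime by having a preparatory job emit a JSON string and piping it through `fromJson`, which could cut down on the duplication. A minimal sketch (in practice the `prepare` step would inspect changed files or workflow inputs instead of hard-coding the list):

```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      # Hard-coded here for illustration; a real setup would compute this.
      - id: set
        run: echo 'matrix={"os":["ubuntu-24.04","macos-15"]}' >> "$GITHUB_OUTPUT"

  build:
    needs: prepare
    strategy:
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    runs-on: ${{ matrix.os }}
    steps:
      - run: echo "Running on ${{ matrix.os }}"
```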

Self-hosted runners would really go a long way. Especially macOS ones.

Commits

  • Added inputs for specifying runner images, compilers, and configurations across Ubuntu, macOS, and Windows workflows. Simplified matrix generation with dynamic configuration, allowing more flexibility in build environments and testing setups. (Signed-off-by: burgholzer <[email protected]>)
  • The workflow is now expected to be called separately, like the `reusable-cpp-linter` workflow. (Signed-off-by: burgholzer <[email protected]>)