👷🐧🍎🏁 Extensive CI testing #60

Open · wants to merge 8 commits into main
Conversation

burgholzer (Member)

This PR experiments with broadening the type of runners that we run CI on. Due to the amount of runs that this creates, I am rather hesitant to directly merge this. Hence, I will keep this open for now and keep the branch itself. This way, we can test the real-world consequences of this and decide then.

In detail, this PR extends CI to run on:

  • 🐧 ubuntu-22.04,
  • 🐧 ubuntu-22.04-arm,
  • 🐧 ubuntu-24.04,
  • 🐧 ubuntu-24.04-arm,
  • 🍎 macos-13,
  • 🍎 macos-14,
  • 🍎 macos-15,
  • 🏁 windows-2022,
  • 🏁 windows-2025,

@burgholzer burgholzer self-assigned this Jan 16, 2025
@burgholzer burgholzer added labels: c++ (Anything related to C++ code), continuous integration (Anything related to the CI setup), feature (New feature or request), python (Anything related to Python code) — Jan 16, 2025
@marcelwa

@burgholzer I think this can prove a valuable addition to the workflow package. Particularly with respect to our recent discussions on a CI unification across @cda-tum.

@burgholzer (Member, Author)

> @burgholzer I think this can prove a valuable addition to the workflow package. Particularly with respect to our recent discussions on a CI unification across @cda-tum.

Good point. The build matrix just scales extremely quickly. Given how most of our packages support all currently non-EOL Python versions, the current setup with this PR as it is results in:

  • 5 Python versions
  • 9 operating system versions
  • 2 sessions (minimum versions and regular resolution)

on the Python side and

  • 9 operating system versions
  • 2 build types (Debug and Release)

on the C++ side.

In total, that means at least 90 + 18 = 108 individual runs, of which (5 * 3 * 2) + (3 * 2) = 36 run on some macOS system. Given that the concurrency limit on macOS runners is 5, each CI run would exclusively block the macOS runners for the duration of roughly 7 full builds and test suites.

Each of these runs has to build the C++ part of a project. The Python runs have to additionally install all Python dependencies. Then, all of these runs have to run the respective tests.

Setting all the above aside for a moment, we could go one step further in the sense that we could also run on different compilers for each operating system (similar to how fiction handles this). I could imagine it would make sense to test at least

  • default versions of clang and gcc available for Ubuntu and macOS
  • MSVC and Clang on Windows

Technically, this applies to both the Python and the C++ builds. So even with the above, it would amount to another factor of 2x in the number of runs.
At some point it seems unreasonable to run ~250 individual builds per CI run.
@marcelwa any kind of ideas for how we could manage this in a better fashion?
I could imagine that a clever setup of the CI in conjunction with careful setup of the individual projects and their test suites could really help here.
And, naturally, it would help if the MQT were a monorepo like fiction.
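For concreteness, the kind of OS-times-compiler matrix this discussion points toward could be sketched along these lines. This is a hypothetical sketch only, not the actual CI.yml; the job name, `config` key, and runner images are illustrative:

```yaml
# Hypothetical sketch; job and key names are made up for illustration.
jobs:
  cpp-tests:
    strategy:
      fail-fast: false
      matrix:
        build_type: [Debug, Release]
        config:
          # one (runner-default) compiler version per platform/compiler pair
          - { os: ubuntu-24.04, compiler: gcc }
          - { os: ubuntu-24.04, compiler: clang }
          - { os: macos-15, compiler: clang }    # AppleClang, the macOS default
          - { os: windows-2025, compiler: msvc }
          - { os: windows-2025, compiler: clang }
    runs-on: ${{ matrix.config.os }}
```

Using `include`-style config objects rather than a full cross product keeps invalid combinations (e.g., MSVC on Ubuntu) out of the matrix by construction.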

@marcelwa

> In total, that means at least 90 + 18 = 108 individual runs, of which (5 * 3 * 2) + (3 * 2) = 36 run on some macOS system. Given that the concurrency limit on macOS runners is 5, each CI run would exclusively block the macOS runners for the duration of roughly 7 full builds and test suites.
> [...]
> Technically, this applies to both the Python and the C++ builds. So even with the above, it would amount to another factor of 2x in the number of runs. At some point it seems unreasonable to run ~250 individual builds per CI run. @marcelwa any kind of ideas for how we could manage this in a better fashion? I could imagine that a clever setup of the CI in conjunction with careful setup of the individual projects and their test suites could really help here. And, naturally, it would help if the MQT were a monorepo like fiction.

I see that this setup quickly grows beyond reason. Naturally, there might not be an immediate end-all-be-all solution to a complex topic like this one. Let me try to list some thoughts on the matter anyway (in no particular order):

  • In the MNT, we haven't faced any compiler version-specific bugs in a long time. Very occasionally, we still see diverging behavior between g++ and clang, for instance (and of course between operating systems), but it has been a long time since a bug occurred on g++ version X but not on version X±1.
    ➡️ A careful setup might only consider a single (runner default?) version per compiler.
  • On macOS, the default compiler is AppleClang. You will have to go out of your way to install g++. On Apple Silicon Macs, you must pass additional compiler flags for compatibility (-arch arm64). As far as I know, Intel Macs still haven't reached end-of-life but might within a year or so.
    ➡️ It could be reasonable to restrict the CI to AppleClang on macOS.
  • Lots of recompilation overhead is already caught via compiler caches. However, we could maximize reuse by uploading entire (cleaned) build directories as artifacts and fetching them in subsequent runs. This will probably have limited to no effect when building wheels due to technical constraints.
    ➡️ We might want to investigate build directory caching between runs.
  • Some projects might not need the entire list of 5 Python versions, 9 operating system versions, and 2 sessions.
    ➡️ Flags to the workflow to toggle certain versions/sessions could reduce redundant workload.
  • In the MNT, each workflow tests first whether certain files (file types) have changed in the current PR before deciding whether to run at all. I think you're incorporating similar checks here.
    ➡️ Let's ponder whether there is any way of making such checks stricter.
  • Self-hosted runners would go a long way in reducing the wait time for other runs.
    ➡️ We might want to install a first self-hosted runner as a test balloon rather soon.
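The toggle idea above could take the shape of `workflow_call` inputs on a reusable workflow, so that each project opts into only the slice of the matrix it needs. A hypothetical sketch, with made-up input names and defaults:

```yaml
# Hypothetical reusable-workflow interface; input names are illustrative.
on:
  workflow_call:
    inputs:
      python-versions:
        description: "JSON list of Python versions to test"
        type: string
        default: '["3.9", "3.10", "3.11", "3.12", "3.13"]'
      enable-minimum-versions-session:
        description: "Also run the minimum-versions resolution session"
        type: boolean
        default: true
      os-list:
        description: "JSON list of runner images to use"
        type: string
        default: '["ubuntu-24.04", "macos-15", "windows-2025"]'
```

JSON-string inputs are a common workaround for the fact that `workflow_call` inputs cannot be lists directly.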

I'm always open to discussing this situation further.

@burgholzer (Member, Author)

Thanks for your thoughts! 🙏🏼 I'll just add mine in a similar fashion.

I agree that one particular compiler version should be sufficient per platform and compiler.
Testing clang and gcc on linux might make sense, though.
Only using the default compiler on macOS seems fine to me.
I do see value in testing both MSVC and Clang on Windows, as we have seen compilation problems with MSVC in the past that required us to use Clang instead. Making sure this does not happen again would be nice.

I am still a bit on the fence when it comes to caching. First of all, from experience, it is rather opaque and at times hard to judge whether the compiler caches really work as intended. Especially on the Python side, it can be quite hard to get caching to work properly on all operating systems. Additionally, these caches have to be restored and saved in each run, which also takes quite some time because the data has to go through the network. Given that some of the caches are several hundred megabytes in size, this is definitely non-negligible.
Caching whole build directories seems rather fragile to me. I can remember a couple of cases where subtle differences in the runners tripped up the CI when reusing build folders.
It might even be faster to not cache the compilation at all.
Any thoughts on this?
(afterthought: we are also way overboard with our cache usage on GitHub in basically all of our projects.. and the allowance for that is 10 GiB 🙃)

In general, there is a conflict of interest that we will probably never be able to resolve:
On the one hand, one would want jobs that are as fine-grained as possible so that they can be parallelized as much as possible. That creates many, but small, jobs. On the other hand, we could severely reduce the number of individual jobs by packing everything into a single workflow with lots of options. Then the total number of runs will be smaller, but each individual run will take considerably longer.
Combining everything also has the disadvantage that you potentially have to wait a rather long time until the particular thing you are currently working on is tested.

What could really make a lot of sense is to allow more customization of the individual workflows so that it can be configured on demand which jobs to run: either automatically, based on the files that changed (as we already do at the moment), or via explicit configuration (someone modifies the CI.yml in their PR to enable only the checks they want to run).
This kind of configuration would create quite some duplication of code in the workflows though, since one would not be able to use build matrices for the most part (they need to be statically defined if I remember correctly).
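As a hedge on the "statically defined" point: GitHub Actions can in fact build a matrix at runtime by having a preparatory job emit a JSON string and piping it through `fromJson`, which could cut down on the duplication. A minimal sketch (in practice the `prepare` step would inspect changed files or workflow inputs instead of hard-coding the list):

```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      # Hard-coded here for illustration; a real setup would compute this.
      - id: set
        run: echo 'matrix={"os":["ubuntu-24.04","macos-15"]}' >> "$GITHUB_OUTPUT"

  build:
    needs: prepare
    strategy:
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    runs-on: ${{ matrix.os }}
    steps:
      - run: echo "Running on ${{ matrix.os }}"
```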

Self-hosted runners would really go a long way. Especially macOS ones.

Commits

  • Added inputs for specifying runner images, compilers, and configurations across Ubuntu, macOS, and Windows workflows. Simplified matrix generation with dynamic configuration, allowing more flexibility in build environments and testing setups. (Signed-off-by: burgholzer <[email protected]>)
  • The workflow is now expected to be called separately, like the `reusable-cpp-linter` workflow. (Signed-off-by: burgholzer <[email protected]>)