-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cargo-audit: move to a style check #10432
Conversation
Hmph. I now mildly remember that I myself might have complained about this needing to be non-blocking check… This PR kinda goes against that complaint? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #10432 +/- ##
==========================================
- Coverage 72.01% 72.00% -0.02%
==========================================
Files 720 720
Lines 144871 144877 +6
Branches 144871 144877 +6
==========================================
- Hits 104326 104314 -12
- Misses 35758 35783 +25
+ Partials 4787 4780 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Summoning security specialist :) |
So a first argument against this is, we definitely do not want a failing A more fundamental argument is, I’m actually thinking of PRing a removal of all the style tests we currently have stashed in the The reason is simple: after chatting with @frol, he mentioned to me that waiting 20 minutes to have CI reject based on a formatting error is enough to make a contributor give up about their PR. And I actually agree with him, and it actually happened to me too more than once, because I don’t like commit hooks that automatically change my code, and sometimes forget doing it manually. Failing early would both save me some time, and save us quite a bit of money by having the nextest jobs be cancelled at the beginning rather than at the end. In addition to this, moving the style tests out makes them more obvious to understand from the PR submitter’s point of view by giving them a proper name, and I think we should optimize for the PR submitter rather than for the person who edits CI. This being said, it looks like cargo audit is currently not being executed by the |
Oh I missed your point about "running in parallel with all the other things". I’d argue that the 6 extra seconds of very lightweight runner are negligible compared to (the rest of our CI and) the benefits of it being obvious for the PR submitter which test is failing :) (The job takes 2 minutes, but they are time taken to download |
I don’t mind moving everything out of the style tests, but |
One option for A would be to add a (TBH, I basically never used test-ci locally myself, due to the sheer time it takes, I prefer to round-trip to github PR CI and context-switch, so I can’t really speak to how usable it is 🤷) |
Can you do that, even? I did some spelunking around and I don’t think I was able to find a way for this.
A part of this is due to the fact that the two large test suites it runs (nextest stable and nextest nightly) aren’t run in parallel, another part is because the two nextests share the same target directory so one ends up rebuilding half of repository twice on every single invocation. I never saw much point in fixing the latter when the former is impossible (according to my research,) which is why I don’t pursue it much. But yeah it is barely bearable even on my epyc.
We have some very significant long tails in our test suite that are I/O bound (or sequential within themselves) – the estimator test, the lychee test, perhaps some others I’m not thinking of right now. Ideally we'd be able to setup a posix jobserver here, but if just does not have parallelism, jobserver is even more unlikely to even be considered as an option here. |
You can recursively run
I mean they each take all of my cores for more than a few minutes anyway, so running them in parallel would probably just make things slower 😅 especially AFAIR most of the time spent in the estimator is rebuilding nearcore with different compilation flags, so there’s still lots to improve on here |
What’s the point of using
Wouldn’t at that point a plain gnu makefile start making a lot of sense, despite all its faults…? We could also look at alternative implementations of a make-like build systems, but gnu make does have a significant benefit of being available out of the box on many linux systems at least. (What I’m going to do is re-open this PR only to act as a home for this discussion, others might have a better time finding and chiming in if its not burried in the list of closed PRs :)) |
Heh 😅 In a way |
Well, won’t say no to doing some tests :) I made sure things like sccache and such are disabled and added the following to the Makefile in our repo: # export DOCKER_BUILDKIT = 1
# export CARGO_BUILD_RUSTFLAGS = -D warnings
# export NEAR_RELEASE_BUILD = no
# export CARGO_TARGET_DIR = target
test-ci: a b c d e f g h j
a:
just check-cargo-fmt
b:
just python-style-checks
c:
just check-cargo-deny
d:
just check-themis
e:
just check-cargo-clippy
f:
just check-non-default
g:
just check-cargo-udeps
h:
just nextest "nightly"
just nextest "stable" Note that I had to very begrudgingly keep $ rm -rf target/
$ command time make test-ci -j
<...>
22484.99user 1857.13system 10:56.79elapsed 3706%CPU (0avgtext+0avgdata 1823708maxresident)k
429730837inputs+59015587outputs (1319602major+300273344minor)pagefaults 0swaps
$ rm -rf target/
$ command time just test-ci
20382.95user 1784.89system 13:07.09elapsed 2816%CPU (0avgtext+0avgdata 1837064maxresident)k
429704007inputs+59015614outputs (1312486major+299503718minor)pagefaults 0swaps Here are also some grafana graphs of CPU utilization (NB: y axis log scale). The yellow line at the top is "idle", green is "user", purple is "system". For Same for regular Anyhow, clearly I‘m the one standing to benefit from this the most, I just need to convince everybody that justfile is a wall between me and my one-watercooler-roundtrip iteration times :) Footnotes
|
Here's perhaps a slightly more straightforward half-comparison: test-ci: a b c d e f g h j
a:
just check-cargo-fmt
b:
just python-style-checks
c:
just check-cargo-deny
d:
just check-themis
e:
just check-cargo-clippy
f:
just check-non-default
g:
just check-cargo-udeps
h:
just nextest-unit "stable" $ rm -rf target/
$ command time make test-ci -j
17338.19user 1434.73system 5:56.74elapsed 5262%CPU (0avgtext+0avgdata 1828136maxresident)k
295278983inputs+52653831outputs (975026major+242263770minor)pagefaults 0swaps
$ rm -rf target/
$ command time make test-ci
15317.79user 1339.15system 8:02.97elapsed 3448%CPU (0avgtext+0avgdata 1839776maxresident)k
295278987inputs+52653846outputs (973431major+241349752minor)pagefaults 0swaps That said, perhaps the real solution here is to somehow integrate sccache into our justfile so at least builds just go fast regardless given that parallelizing tests would be pretty darn hard™ |
Do I read correctly your That said I think I totally agree with your idea of integrating sccache at least! As for parallelization, there seem to be quite a lot of ideas on this issue upstream, but for our specific use case of interactively running tests I’m thinking the tmux-based idea might be one of the best? :) |
I’m approximately confident that in my specific case this will have something to do with frequency scaling down closer to base clocks due to power reasons primarily. These chips are, like, 225W at base clocks full tilt after all (and when not limited boost from 2.3GHz to 3.3GHz and more.)
One problem that we should solve here a:
env CARGO_MAKEFLAGS="${MAKEFLAGS}" just check-cargo-fmt invoking |
One option suggested on the just issue is pueue, which looks like it’d be able to do the make jobserver but better, by allowing the user to get a clean feed of the output of any specific running command :) That said, considering there are currently 8 targets, AFAICT |
I think there's may be a misunderstanding of what jobserver does. It implements global parallelism control, rather than a local one by passing around file descriptors to a shared semaphore. Cargo and rustc both support make’s jobserver implementation which means that once you run
For this to work unsurprisingly the invoked processes need to know how to work with the semaphore and to the best of my knowledge make’s jobserver is the only implementation currently. So there can’t really be anything better than make’s jobserver per se, as anything else just won’t do anything at all. (Except maybe if we count cgroups or virtualization) Output interleaving is another problem altogether requiring its own solutions, whether it be |
Oooh I had never once expected that cargo and rustc would have supported make’s jobserver, I expected them to be basically like gcc. Thank you for the clarification! |
So I have set up sccache but unfortunately it looks like the improvement by incorporating it is not as significant as I had hoped ( This also raises a question if the combination of these two tools wouldn't allow us to have a fast GHA CI that just runs e.g. |
Do I understand correctly that this is about local development? If yes, I’m entirely with you on that front.
I’m a lot less hyped by this idea. For one main reason: it’s currently easy to know which tests pass and which tests fail, by virtue of the checklist showing as multiple green/yellow/red dots with a descriptive name. We’d lose that and force people to always look at the detailed logs and scroll through all the stuff, to figure out which check failed. I think the only case where it makes sense, would be for the linux and linux-nightly tests, but that’s also the reason why I wanted to remove the non-nightly tests altogether by removing the compile-time switch. I’d also expect that, while the big GHA CI runner could indeed make the whole test suite complete faster, it’d also probably end up more expensive, because each non-full-utilization minute would be significantly more expensive. Meaning that our "longest tail" test would become a big problem, especially if we’re unlucky and it gets started last. Also, a lot of our current tests are actually completely free, because we’re still within the 50000 free minutes of basic runners — actually Bowen asked me whether we could push more stuff to the basic runners for that exact reason. This being said, I’m not entirely sure about that point (we’d need to test and benchmark), and it’s a secondary item compared to the main reason. Does that make sense? |
I tried it out in the CI of a personal project, and it was a significantly better improvement, with performance basically on par with swatinem’s cache. Sure my personal project is way smaller than nearcore and I’m only using the basic runners there, but I’m thinking it might still make sense to try it out in nearcore’s CI, to see if at least the cache manages to get used after that? |
#10445 demonstrates that at the very least we'll need to set up a gcloud bucket for this. GH cache based solution doesn't work for projects with the scale of nearcore. |
NOT INTENDED TO MERGE, BUT CHECK OUT THE DISCUSSION ABOUT JUSTFILE STARTING WITH THE NEXT COMMENT :)