Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo-audit: move to a style check #10432

Closed
wants to merge 1 commit into from

Conversation

nagisa
Copy link
Collaborator

@nagisa nagisa commented Jan 15, 2024

NOT INTENDED TO MERGE, BUT CHECK OUT THE DISCUSSION ABOUT JUSTFILE STARTING WITH THE NEXT COMMENT :)

@nagisa nagisa requested a review from a team as a code owner January 15, 2024 22:46
@nagisa nagisa requested a review from Longarithm January 15, 2024 22:46
@nagisa
Copy link
Collaborator Author

nagisa commented Jan 15, 2024

Hmph. I now mildly remember that I myself might have complained about this needing to be non-blocking check… This PR kinda goes against that complaint?

Copy link

codecov bot commented Jan 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a0386b1) 72.01% compared to head (f27356c) 72.00%.
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10432      +/-   ##
==========================================
- Coverage   72.01%   72.00%   -0.02%     
==========================================
  Files         720      720              
  Lines      144871   144877       +6     
  Branches   144871   144877       +6     
==========================================
- Hits       104326   104314      -12     
- Misses      35758    35783      +25     
+ Partials     4787     4780       -7     
Flag Coverage Δ
backward-compatibility 0.08% <ø> (ø)
db-migration 0.08% <ø> (ø)
genesis-check 1.26% <ø> (ø)
integration-tests 36.76% <0.00%> (+0.02%) ⬆️
linux 71.52% <100.00%> (-0.03%) ⬇️
linux-nightly 71.58% <100.00%> (+<0.01%) ⬆️
macos 53.77% <100.00%> (+<0.01%) ⬆️
pytests 1.48% <ø> (ø)
sanity-checks 1.27% <ø> (ø)
unittests 68.11% <100.00%> (-0.01%) ⬇️
upgradability 0.13% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Longarithm
Copy link
Member

cargo audit

Summoning security specialist :)

@Longarithm Longarithm removed their request for review January 16, 2024 00:42
@Ekleog-NEAR
Copy link
Collaborator

So a first argument against this is, we definitely do not want a failing cargo audit to block landing PRs (if a RUSTSEC advisory was published), because things can start failing in a way that’s unrelated to the PR code, and that’d be very bad dev experience.

A more fundamental argument is, I’m actually thinking of PRing a removal of all the style tests we currently have stashed in the cargo nextest job, to migrate them all out into separate steps with an associated justfile command.

The reason is simple: after chatting with @frol, he mentioned to me that waiting 20 minutes to have CI reject based on a formatting error is enough to make a contributor give up about their PR. And I actually agree with him, and it actually happened to me too more than once, because I don’t like commit hooks that automatically change my code, and sometimes forget doing it manually. Failing early would both save me some time, and save us quite a bit of money by having the nextest jobs be cancelled at the beginning rather than at the end.

In addition to this, moving the style tests out makes them more obvious to understand from the PR submitter’s point of view by giving them a proper name, and I think we should optimize for the PR submitter rather than for the person who edits CI.

This being said, it looks like cargo audit is currently not being executed by the test-ci Justfile target, which is definitely an oversight. And actually it looks like it’s not even the only one, eg. all the python tests seem to not be run by this target either.

@Ekleog-NEAR
Copy link
Collaborator

Ekleog-NEAR commented Jan 16, 2024

Oh I missed your point about "running in parallel with all the other things". I’d argue that the 6 extra seconds of very lightweight runner are negligible compared to (the rest of our CI and) the benefits of it being obvious for the PR submitter which test is failing :)

(The job takes 2 minutes, but they are time taken to download cargo-audit from the cache, and only 6 seconds would be parallelizable, based on the latest master run at the time of my comment. Actually, this plus running the binary three times and the fact that the added download time are on a much larger runner make me think that we’d actually end up paying a lot more after the switch — 47 seconds at 4x the cost, and 3x the number of runs, mean we’d switch from 2 minutes to ~9.5 minutes in money equivalent)

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 16, 2024

This being said, it looks like cargo audit is currently not being executed by the test-ci Justfile target, which is definitely an oversight. And actually it looks like it’s not even the only one, eg. all the python tests seem to not be run by this target either.

I don’t mind moving everything out of the style tests, but test-ci is quite painful on my end tbh due to the fact that it a) runs everything on sequentially and b) tests overwrite target/ of each other so every run takes long. The latter is easyish to fix using the same strategy with multiple targets as in style tests, but I don’t see how to solve a) with a Justfile.

@nagisa nagisa closed this Jan 16, 2024
@Ekleog-NEAR
Copy link
Collaborator

One option for A would be to add a test-ci-parallel that’d just run all the just dependencies in parallel. This would be less concurrency-controlled than nextest, which consistently runs n_cores jobs, but anyway these jobs each mostly use all the cores anyway due to running inside cargo or rustc. WDYT?

(TBH, I basically never used test-ci locally myself, due to the sheer time it takes, I prefer to round-trip to github PR CI and context-switch, so I can’t really speak to how usable it is 🤷)

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 16, 2024

just run all the just dependencies in parallel

Can you do that, even? I did some spelunking around and I don’t think I was able to find a way for this.

never used test-ci locally myself, due to the sheer time it takes

A part of this is due to the fact that the two large test suites it runs (nextest stable and nextest nightly) aren’t run in parallel, another part is because the two nextests share the same target directory so one ends up rebuilding half of repository twice on every single invocation.

I never saw much point in fixing the latter when the former is impossible (according to my research,) which is why I don’t pursue it much. But yeah it is barely bearable even on my epyc.

these jobs each mostly use all the cores anyway due to running inside cargo or rustc

We have some very significant long tails in our test suite that are I/O bound (or sequential within themselves) – the estimator test, the lychee test, perhaps some others I’m not thinking of right now. Ideally we'd be able to setup a posix jobserver here, but if just does not have parallelism, jobserver is even more unlikely to even be considered as an option here.

@Ekleog-NEAR
Copy link
Collaborator

Can you do that, even? I did some spelunking around and I don’t think I was able to find a way for this.

You can recursively run just, it’s not recommended but we don’t have any complex enough usage of it for it to be problematic. Then it becomes a bit of bash to run all the sub-just commands in parallel :)

A part of this is due to the fact that the two large test suites it runs (nextest stable and nextest nightly) aren’t run in parallel,

I mean they each take all of my cores for more than a few minutes anyway, so running them in parallel would probably just make things slower 😅 especially AFAIR most of the time spent in the estimator is rebuilding nearcore with different compilation flags, so there’s still lots to improve on here

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 16, 2024

Then it becomes a bit of bash to run all the sub-just commands in parallel :)

What’s the point of using just anymore at that point if its gonna be ugly bash all over the place?

  • We would no longer get static checking of well-formedness (however much of it just does,) which one of the larger benefits as perceived by Just’s own documentation;
  • We would have to fiddle with bash’s job system to get parallel execution (yuck); and
  • We still end up having to figure the jobserver problem for ourselves (this is the part that would make the multiple invocations of cargo not step on each other’s toes.)

Wouldn’t at that point a plain gnu makefile start making a lot of sense, despite all its faults…? We could also look at alternative implementations of a make-like build systems, but gnu make does have a significant benefit of being available out of the box on many linux systems at least.

(What I’m going to do is re-open this PR only to act as a home for this discussion, others might have a better time finding and chiming in if its not burried in the list of closed PRs :))

@nagisa nagisa reopened this Jan 16, 2024
@Ekleog-NEAR
Copy link
Collaborator

What’s the point of using just anymore at that point if its gonna be ugly bash all over the place?

Heh 😅

In a way make does have the -j option, but OTOH I do think that the make syntax is the worst thing humanity invented (at least). I’m curious though, did you check you’d actually get a significant speed boost by parallelizing this stuff? I only have 8 cores on my dev VM so I’m pretty sure I wouldn’t (maybe 10-20%, if cache thrashing doesn’t wind up losing more than parallelization gains?), but maybe your epyc buildserver would actually be able to run multiple stuffs in parallel?

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 17, 2024

Well, won’t say no to doing some tests :)

I made sure things like sccache and such are disabled and added the following to the Makefile in our repo:

# export DOCKER_BUILDKIT = 1
# export CARGO_BUILD_RUSTFLAGS = -D warnings
# export NEAR_RELEASE_BUILD = no
# export CARGO_TARGET_DIR = target
test-ci: a b c d e f g h j
a:
	just check-cargo-fmt

b:
	just python-style-checks
c:
	just check-cargo-deny

d:
	just check-themis

e:
	just check-cargo-clippy

f:
	just check-non-default

g:
	just check-cargo-udeps

h:
	just nextest "nightly"
	just nextest "stable"

Note that I had to very begrudgingly keep just nextest invocations here serial1 and my observation that specifically during this part the utilization was pretty low. Our network tests don’t take kindly to being run in parallel in a single network namespace T_T

$ rm -rf target/

$ command time make test-ci -j
<...>
22484.99user 1857.13system 10:56.79elapsed 3706%CPU (0avgtext+0avgdata 1823708maxresident)k
429730837inputs+59015587outputs (1319602major+300273344minor)pagefaults 0swaps
$ rm -rf target/
$ command time just test-ci
20382.95user 1784.89system 13:07.09elapsed 2816%CPU (0avgtext+0avgdata 1837064maxresident)k
429704007inputs+59015614outputs (1312486major+299503718minor)pagefaults 0swaps

Here are also some grafana graphs of CPU utilization (NB: y axis log scale). The yellow line at the top is "idle", green is "user", purple is "system". For make -j you can see the parallel part finishing about 3 minutes into the ordeal:

graph of the cpu time per second showing consistently 100% utilization during the first 3 minutes-or-so and then variable utilization afterwards

Same for regular just test-ci:

graph of the CPU time per second showing somewhat variable utilization


Anyhow, clearly I‘m the one standing to benefit from this the most, I just need to convince everybody that justfile is a wall between me and my one-watercooler-roundtrip iteration times :)

Footnotes

  1. For what it is worth, though, there is a solution to this, the tests can still very much be built in parallel with cargo nextest archive and the test suite itself takes 1m 45s (quite bearable even if two runs of this are serialized).

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 17, 2024

Here's perhaps a slightly more straightforward half-comparison:

test-ci: a b c d e f g h j
a:
	just check-cargo-fmt

b:
	just python-style-checks
c:
	just check-cargo-deny

d:
	just check-themis

e:
	just check-cargo-clippy

f:
	just check-non-default

g:
	just check-cargo-udeps

h:
	just nextest-unit "stable"

$ rm -rf target/
$ command time make test-ci -j
17338.19user 1434.73system 5:56.74elapsed 5262%CPU (0avgtext+0avgdata 1828136maxresident)k
295278983inputs+52653831outputs (975026major+242263770minor)pagefaults 0swaps
$ rm -rf target/
$ command time make test-ci
15317.79user 1339.15system 8:02.97elapsed 3448%CPU (0avgtext+0avgdata 1839776maxresident)k
295278987inputs+52653846outputs (973431major+241349752minor)pagefaults 0swaps

That said, perhaps the real solution here is to somehow integrate sccache into our justfile so at least builds just go fast regardless given that parallelizing tests would be pretty darn hard™

@Ekleog-NEAR
Copy link
Collaborator

Do I read correctly your time command that you get 2 minutes lower wall clock time, in exchange for 2000 seconds more CPU time consumed? If so, for laymen like us who don’t have The Big Build Machine, these 2000 seconds on an 8core machine would be 4 minutes lost :p

That said I think I totally agree with your idea of integrating sccache at least! As for parallelization, there seem to be quite a lot of ideas on this issue upstream, but for our specific use case of interactively running tests I’m thinking the tmux-based idea might be one of the best? :)

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 17, 2024

I’m approximately confident that in my specific case this will have something to do with frequency scaling down closer to base clocks due to power reasons primarily. These chips are, like, 225W at base clocks full tilt after all (and when not limited boost from 2.3GHz to 3.3GHz and more.)


As for parallelization, there seem to be quite a lot of ideas on casey/just#626, but for our specific use case of interactively running tests I’m thinking casey/just#626 (comment) might be one of the best? :)

One problem that we should solve here sccache notwithstanding is a jobserver support. That’s precisely the part that avoids having all of the cargo/rustc processes threads and processes compete against each other for all the cores available on the system and also enables you or me as the user to control parallelism. For example if, given the example above, I change the rules as such:

a:
	env CARGO_MAKEFLAGS="${MAKEFLAGS}" just check-cargo-fmt

invoking make -j8 will roughly limit execution to just 8 cores (I see about 10 running processes when eyeballing htop)

@Ekleog-NEAR
Copy link
Collaborator

One option suggested on the just issue is pueue, which looks like it’d be able to do the make jobserver but better, by allowing the user to get a clean feed of the output of any specific running command :)

That said, considering there are currently 8 targets, AFAICT make -j 8 will just run them all in parallel without caring the least, so I’m thinking it’d probably make sense to just use the tmux solution to have an even easier access to the logs of each individual command?

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 17, 2024

I think there's may be a misunderstanding of what jobserver does. It implements global parallelism control, rather than a local one by passing around file descriptors to a shared semaphore. Cargo and rustc both support make’s jobserver implementation which means that once you run make -j8, no matter how many cores you have, and no matter how many concurrent cargo processes there are all of the rustc invocations will coordinate between themselves (via that shared semaphore) to ensure there will be roughly a limit of 8 running threads at any given time.

make -j$NUM_VCORES is an interesting case here in particular, because in that exact scenario the many different cargo processes avoid spawning $WAY_TOO_MANY rustcs each of which then proceed to spawn further $NCORES codegen threads, leading system to an unfortunate state.

For this to work unsurprisingly the invoked processes need to know how to work with the semaphore and to the best of my knowledge make’s jobserver is the only implementation currently. So there can’t really be anything better than make’s jobserver per se, as anything else just won’t do anything at all. (Except maybe if we count cgroups or virtualization)


Output interleaving is another problem altogether requiring its own solutions, whether it be pueue or tmux or just writing outputs to a file and printing them out later. For those yes.

@Ekleog-NEAR
Copy link
Collaborator

Oooh I had never once expected that cargo and rustc would have supported make’s jobserver, I expected them to be basically like gcc. Thank you for the clarification!

@Ekleog-NEAR Ekleog-NEAR marked this pull request as draft January 23, 2024 14:09
@nagisa
Copy link
Collaborator Author

nagisa commented Jan 29, 2024

So I have set up sccache but unfortunately it looks like the improvement by incorporating it is not as significant as I had hoped (cargo build from scratch now takes about a minute instead of 1m45s or thereabouts.) I would imagine that a lot of time ends up being spent linking the many test binaries and such, so I maintain my position that ideally we should have the test suite run in parallel by the entry point tooling in the first place.

This also raises a question if the combination of these two tools wouldn't allow us to have a fast GHA CI that just runs e.g. make test-all on a single very large runner. I would imagine that chances are there this could even be faster overall (in at least runner time) due to not spending the extra setup and overhead time for each individual job we have right now.

@Ekleog-NEAR
Copy link
Collaborator

I would imagine that a lot of time ends up being spent linking the many test binaries and such, so I maintain my position that ideally we should have the test suite run in parallel by the entry point tooling in the first place.

Do I understand correctly that this is about local development? If yes, I’m entirely with you on that front.

This also raises a question if the combination of these two tools wouldn't allow us to have a fast GHA CI that just runs e.g. make test-all on a single very large runner. I would imagine that chances are there this could even be faster overall (in at least runner time) due to not spending the extra setup and overhead time for each individual job we have right now.

I’m a lot less hyped by this idea. For one main reason: it’s currently easy to know which tests pass and which tests fail, by virtue of the checklist showing as multiple green/yellow/red dots with a descriptive name. We’d lose that and force people to always look at the detailed logs and scroll through all the stuff, to figure out which check failed. I think the only case where it makes sense, would be for the linux and linux-nightly tests, but that’s also the reason why I wanted to remove the non-nightly tests altogether by removing the compile-time switch.

I’d also expect that, while the big GHA CI runner could indeed make the whole test suite complete faster, it’d also probably end up more expensive, because each non-full-utilization minute would be significantly more expensive. Meaning that our "longest tail" test would become a big problem, especially if we’re unlucky and it gets started last. Also, a lot of our current tests are actually completely free, because we’re still within the 50000 free minutes of basic runners — actually Bowen asked me whether we could push more stuff to the basic runners for that exact reason. This being said, I’m not entirely sure about that point (we’d need to test and benchmark), and it’s a secondary item compared to the main reason.

Does that make sense?

@Ekleog-NEAR
Copy link
Collaborator

(cargo build from scratch now takes about a minute instead of 1m45s or thereabouts.)

I tried it out in the CI of a personal project, and it was a significantly better improvement, with performance basically on par with swatinem’s cache. Sure my personal project is way smaller than nearcore and I’m only using the basic runners there, but I’m thinking it might still make sense to try it out in nearcore’s CI, to see if at least the cache manages to get used after that?

@nagisa
Copy link
Collaborator Author

nagisa commented Jan 29, 2024

I tried it out in the CI of a personal project, and it was a significantly better improvement, with performance basically on par with swatinem’s cache. Sure my personal project is way smaller than nearcore and I’m only using the basic runners there, but I’m thinking it might still make sense to try it out in nearcore’s CI, to see if at least the cache manages to get used after that?

#10445 demonstrates that at the very least we'll need to set up a gcloud bucket for this. GH cache based solution doesn't work for projects with the scale of nearcore.

@nagisa nagisa closed this Feb 15, 2024
@Ekleog-NEAR Ekleog-NEAR deleted the nagisa/moves-cargo-audit-to-a-style-check branch March 29, 2024 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants