Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

Open
kornelski opened this issue Jan 9, 2025 · 4 comments
Labels
A-build-scripts Area: build.rs scripts C-bug Category: bug S-triage Status: This issue is waiting on initial triage.

Comments

@kornelski
Copy link
Contributor

kornelski commented Jan 9, 2025

Problem

The current implementation of cargo::error has a minimal, concise error presentation only when the build script exits with a successful status code (0) when it reports a build failure.

If a build script fails with a non-zero status code, use of the cargo::error directive doesn't prevent Cargo from adding a full dump of stdout/stderr after it, which makes the output noisy and difficult to understand #10159.

The current implementation incentivises build scripts to exit with the status 0 (success) to cleanly report fatal build errors. This is a non-standard behavior for reporting failures, and it's not backwards compatible with older Cargo versions that don't understand cargo::error.

Dumping the full stdout/stderr output is a major problem that diminishes usefulness of cargo::error. The dump is so noisy that the errors reported via cargo::error may be less visible and less readable than a raw text printed to stderr. The stdout dump can contain long lists of cargo::rerun-if-(env-)changed directives, and other potentially irrelevant logs from subcommands. A long noisy stdout/stderr dump usually causes users to stop reading the output carefully, and gloss over the entire output instead. It may even push the "error:" lines out of the view, leaving only the raw dump on screen, which ends with less helpful content. The pkg-config and cmake crates have this problem, and they also happen to be the most frequent source of build script failures. pkg-config prints 21 cargo::rerun-if-env-changed lines, which together with Cargo's boilerplate is 25 lines, the same as the default height of cmd.exe console on Windows. Windows users may literally not see the errors generated by cargo::error if a build script fails with a failure status code. The long output dump can be made even noisier when the build script panics, and Cargo actively encourages users to enable even longer backtraces, which pushes the "error:" lines even further, and obscures even the last lines of stderr that may contain another copy of the error.

I think the conditions for when cargo::error= has a clean presentation should be swapped: it should give the cleanest presentation when the build script reports cargo::error and exits with a failure status code to match it. This is a consistent response (edit: or cargo::error could have a clean output regardless of the status code).

(edit: Optional improvement suggestion, not the main problem): when a build script prints cargo::error, but exits with a success code, this is a self-contradictory result. This is likely a bug in the build script. Such failures could be caused by helper functions or build-time dependencies eagerly printing cargo::error, even when the build script manages to recover from the error and succeeds using some other way. To help catch such rogue cargo::errors that break successful build scripts, Cargo should be printing the full stdout dump, so that build script authors can check where in the log the unintended directive has appeared.

In other words, this is the current behavior:

no structured errors cargo::error=
exit code: success success (clean output) error (terse output)
exit code: failure dump all output error + dump all output

but to make cargo::error useful to end users, and (edit: optionally) bad error reporting debuggable, it should work like this:

no structured errors cargo::error=
exit code: success success (clean output) error + dump all output, or error (clean output)
exit code: failure dump all output error (clean output)

Steps

Buggy build.rs:

use std::process::ExitCode;

fn main() -> ExitCode {
    println!("cargo::error=oops");
    ExitCode::SUCCESS
}

happens to get a clean error output with a high signal-to-noise ratio:

error: [email protected]: oops
error: build script logged errors

A more typical script that exits with a correct status code:

fn main() {
    println!("cargo::error=oops");
    panic!("oops")
}

gets an ugly noisy dump that makes it hard for end users to understand what's going on:

error: [email protected]: oops
error: failed to run custom build command for `buildexample v0.1.0 (/path/to/buildexample)`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/path/to/buildexample/target/debug/build/buildexample-0e711ca19eba66e9/build-script-build` (exit status: 101)
  --- stdout
  cargo::error=oops

  --- stderr
  thread 'main' panicked at build.rs:3:5:
  oops
  stack backtrace:
     0: rust_begin_unwind
               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
     1: core::panicking::panic_fmt
               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
     2: build_script_build::main
     3: core::ops::function::FnOnce::call_once
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Possible Solution(s)

build scripts that report both failure and success at the same time (one via cargo:: the other via status) should get the stdout dump to help find the source of the contradictory response.

build scripts that print cargo::error= and correctly return failure code should get a clean error output with just the errors and warnings the script reported, without irrelevant information dumped by Cargo.

Version

cargo 1.86.0-nightly (fd784878c 2025-01-03)
@kornelski kornelski added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Jan 9, 2025
@epage epage added the A-build-scripts Area: build.rs scripts label Jan 9, 2025
@epage
Copy link
Contributor

epage commented Jan 9, 2025

I see there being 3 exit conditions with cargo::error

  • exit(0)
  • exit(1) (or any non-zero value)
  • panic!()

I would assume the exhaustive output would be most helpful in the case of a logic bug which would usually come in the form of a panic!(). Unfortunately, we can't distinguish those from exit(1). You mentioned trying to identify those in #10159 but I don't see a path forward that I would be comfortable with.

I don't see how the extra output would be relevant to help with an "error" with the exit(0) case and it seems like it would only serve to annoy the users into "correct" behavior. While I'm generally good with reducing and adding friction to drive specific behavior, this feels too artificial and I'm not sure if it deserves such disincentivization.

For the concern for the exit(1) case, we never really did explore the hiding of directives behind --verbose from #10159 which could be an alternative.

@kornelski
Copy link
Contributor Author

I don't see how the extra output would be relevant to help with an "error" with the exit(0) case

I do not think that developers would print cargo::error= and then exit the script with a success status code on purpose. I just don't see any motivation for doing that (other than this UI issue of course). It isn't much easier or better, or even lazier solution. It's possible to exit main with just a return, but panic!() isn't much harder to type. Where someone would call std::process::exit(0), it doesn't cost them anything to call std::process::exit(1) instead.

Exiting with an error status on failure is the correct thing to do, but this issue is not about beating non-compliant people with the best practices. It's the opposite: an observation that Rust users already value correctness, so they would find it objectionable to be forced to use incorrect and backwards-incompatible exit(0) to get better error reporting. Existing build scripts already use the right status codes, so nobody needs to be forced on that path. The compliance is already so good that non-compliance can be easily interpreted as a bug that developers would appreciate to have pointed out.

As much as I'd like to make pull requests to all the -sys crates to make them use the clean error presentation, I expect I'd have hard time convincing crate owners to accept a PR that uses exit(0) for printing errors.


Debug dump on cargo::error+exit(0) helps to detect issues like this:

fn probe_library(name: &str) -> bool {
   match search_for_it(name) {
       Ok(found) => {
          println!("cargo::rustc-link-lib={found}");
          return true;
       },
       Err(error) => {
          println!("cargo::error={error}");
          return false;
       }
    }
}

fn main() {
    if !probe_library("dep") {
        build_static().unwrap();
    }
}

This script prints cargo::error, even when it means to report a success (when build_static succeeds). Printing of cargo:: directives as a side effect on success is normal, and generally fine, because nobody probes a library to not use it. But printing of cargo::error is a bug in this situation. I imagine somebody could make such mistake, because printing something on both success and failure has a nice symmetry to it. Alternatively, such printing code could have been left in even after a helper function has been refactored from panicking/exiting to fallible. There are big and hairy build scripts that probe a bunch of libraries, so I can easily imagine logic bugs or copypaste bugs in helper functions, especially that failures of build scripts tend to be platform- and environment-specific, and "works on my machine" for the authors, so may be under-tested.


panic! can be detected by status code 101. Build scripts have no reason to set this code otherwise, so false positive rate on this will be effectively zero (and for users who really must use that exit status, exit(0) isn't suitable either).

Build scripts very often intentionally .unwrap() and panic! on expected run-time failures caused by the environment, so treating them as bugs would clash with the established practice (which is different from proper library Rust code).

However, the exit(n) vs panic! distinction is a red herring. The core issue is that a good presentation of cargo::error= doesn't work in well-behaved backwards-compatible build scripts. Whether they use panic! or Err to bubble the error, the issue remains the same.

If you think that I'm proposing to add the stdout/stderr dump on cargo::error + exit(0) to annoy users and add friction, then I hope you understand why I'm so annoyed Cargo already does this in the cargo::error + exit(1) case.

@epage
Copy link
Contributor

epage commented Jan 13, 2025

I do not think that developers would print cargo::error= and then exit the script with a success status code on purpose. I

This is the defined behavior of the API as agreed by the Cargo team. Deviating from that would need buy-in from the Cargo team, documentation updates, and messages informing users they aren't using the feature correctly.

I'm not seeing enough reason to re-litigate the discussion from #10159.

Debug dump on cargo::error+exit(0) helps to detect issues like this:

How does printing the debug dump help find the problem?

panic! can be detected by status code 101. Build scripts have no reason to set this code otherwise, so false positive rate on this will be effectively zero (and for users who really must use that exit status, exit(0) isn't suitable either).

I hadn't paid attention to what code is used. Cargo happens to return this code, so not fully outside the realm of possibility but we could document it as having special behavior.

Build scripts very often intentionally .unwrap() and panic! on expected run-time failures caused by the environment, so treating them as bugs would clash with the established practice (which is different from proper library Rust code).

The problem is we can't distinguish intentional panics from unintentional panics.

However, the exit(n) vs panic! distinction is a red herring.

Please do not outright dismiss others like this. You were discussing when debug information is needed. I was doing an analysis of the different cases where debug information is helpful.

@kornelski
Copy link
Contributor Author

kornelski commented Jan 14, 2025

How does printing the debug dump help find the problem?

Through printf-debugging. A build script is likely to log things it's been doing (and/or its subcommands like cmake or configure will), so the location where the unwanted cargo:error= appears in the log may point to the source of it. If you see stdout dump with "probing library xxx; cargo::error=No such file or directory (os error 2); trying fallback for library xxx" then you know that the library probe created the unwanted side effect.

But I'm realizing that I'm trying to report a bug for one case, and suggest a nice-to-have for another case, and discussing these two things at once just makes the discussion more complicated. So to simplify, I'm not insisting on printing the stdout/stderr raw dump for cargo::error + exit(0) case, that's a nice-to-have. The root problem is that cargo::error + exit(1) doesn't suppress the raw stdout/stderr dump, and that case was meant to be the normal correct way to use this feature. Suppression of the raw dump was the motivation for cargo::error to exist. My point is not to relitigate whether exit(0) should be allowed, but report that cargo::error is unusably buggy when exit(1) is used.

This is the defined behavior of the API as agreed by the Cargo team.

IIRC the decision was that cargo::error should fail the build regardless of the status code returned by the script.

But there must have been some miscommunication or a mistake in the implementation, because in the shipped implementation the status is not just irrelevant, but the script must report success status to get the nicer cargo::error presentation. The clean display of cargo::error is not supported for scripts that report a failure code.

So the status didn't just become irrelevant. It became very relevant, and exit(0) became the only way to report failure that supports the intended cargo::error presentation.

I need to stress that the whole point of cargo::error was to give scripts ability suppress the raw stdout/stderr dump on error. It was not about changing text color to red. It was about removing the unfiltered noise that made errors hard to find in the terminal. The cargo::error directive was meant to let scripts signal they know the exact reason for the build failure, so that Cargo doesn't need to speculatively print the full output just in case it had clues about the build failure reason.

Printing a message in addition to the raw dump is already handled by cargo::warning. When cargo::error is presented in the same way as cargo::warning, only in red, it's IMHO broken, and failed its primary objective of improving singal to noise ratio by removing the noise.

I suspect the current behavior of exit(0) suppressing the raw dump is just an accidental side effect. anyhow::bail!("build script logged errors") to me doesn't sound like the summary "could not build package due to n previous errors" that rustc prints, but rather a terse phrase trying to say that logging cargo::error without failing the command was an error in itself. But because it uses bail!, it throws away the output that has the stdout/stderr dump.

If the stdout/stderr dump was meant to be consistently printed regardless of the status, then the bail! should instead create an instance of ProcessError from the output, and add it to the error's context.

OTOH for cargo::error to have the intended effect, and not print the raw dump on build failure, the if let Err(error) = output branch should discard the error, since it's an instance of a ProcessError that prints the raw dump. That's the branch that could have used bail!("could not build {pkg_descr} (custom build) due to {n} previous error{s}").

So the logic here looks swapped around by accident.

However, the exit(n) vs panic! distinction is a red herring.

Please do not outright dismiss others like this

I'm sorry, I did not realize it's a strong phrasing. Now I see I didn't quite get your point at first. I was focused on the fact that figuring out the distinction between a panic (that I assumed to be intentional) and exit(1) won't solve the primary problem with the presentation of the exit(1) case.

When a panic is unintentional and caused by a bug in the build script itself, I agree that this could be handled in a different way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-scripts Area: build.rs scripts C-bug Category: bug S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

2 participants