Add Tolerations to Build and BuildRun objects #1711

Open
wants to merge 3 commits into
base: main
from

Conversation

dorzel
Contributor

@dorzel dorzel commented Oct 30, 2024

Changes

Fixes #1636

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.

Release Notes

Add Tolerations to Build and BuildRun objects

openshift-ci bot added the release-note label (Label for when a PR has specified a release note) on Oct 30, 2024
pull-request-size bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) on Oct 30, 2024
openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress) on Oct 30, 2024
dorzel force-pushed the MULTIARCH-5036 branch 8 times, most recently from 872db31 to 462d9bb, on November 6, 2024 20:22
dorzel force-pushed the MULTIARCH-5036 branch 3 times, most recently from 4ecfc21 to dfe25d5, on November 13, 2024 18:59
dorzel force-pushed the MULTIARCH-5036 branch 3 times, most recently from e449fbd to 03b3b21, on November 19, 2024 18:13
pull-request-size bot added the size/XL label (Denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) on Nov 19, 2024
dorzel force-pushed the MULTIARCH-5036 branch 8 times, most recently from 3e66b55 to 43382a6, on November 20, 2024 22:06
dorzel marked this pull request as ready for review on November 21, 2024 17:28
pkg/reconciler/buildrun/resources/taskrun.go (outdated review thread, resolved)
.github/workflows/ci.yml (outdated review thread, resolved)
dorzel force-pushed the MULTIARCH-5036 branch 7 times, most recently from 5dbdbba to 30677bc, on December 6, 2024 17:19
@dorzel
Contributor Author

dorzel commented Dec 9, 2024

Bit stumped on the failing integration tests - after several revisions I can't find why the taint effect isn't getting set. It also looks like the e2e tests are running out of disk space.

pkg/validate/tolerations.go (outdated review thread, resolved)
test/e2e/v1beta1/e2e_test.go (outdated review thread, resolved)
Contributor

openshift-ci bot commented Dec 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign heavywombat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 754 to 758
validateBuildRunToFail(testBuild, buildRun)
buildRun, err = testBuild.LookupBuildRun(types.NamespacedName{Name: buildRun.Name, Namespace: testBuild.Namespace})

Expect(buildRun.Status.FailureDetails.Message).To(Equal(shpgit.AuthPrompted.ToMessage()))
Expect(buildRun.Status.FailureDetails.Reason).To(Equal(shpgit.AuthPrompted.String()))
Member

Can you explain this? There is a BuildRun that cannot start because its tolerations do not match any node. Isn't the result that the Pod is stuck in Pending and, eventually, the TaskRun and BuildRun time out? Why would there be a failure coming from the Git source step?

Contributor Author

Ah, this was an oversight on my part. I would expect the Pending/timeout you mentioned instead. I'll change this.

Member

@SaschaSchwarze0 SaschaSchwarze0 Dec 16, 2024

Okay, you may want to create the buildrun with a shorter timeout. I guess it would otherwise wait the default ten minutes.
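
For readers following this thread, here is a minimal Go sketch of why such a Pod would sit in Pending rather than fail in the Git source step. It assumes every node in the test cluster carries a taint that the BuildRun's tolerations do not match. The `Taint` and `Toleration` types are simplified stand-ins for the Kubernetes `corev1` types, the `example.com/arch` key and its values are made up, and `tolerates` and `schedulable` are hypothetical helpers that mirror the documented matching rules, not code from this PR.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the corev1 types
// (an assumption for illustration, not the real API structs).
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates mirrors the Kubernetes matching rules: an empty toleration
// effect or key matches any effect or key, "Exists" ignores the value,
// and "Equal" (or an empty operator) compares values.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == taint.Value
	default:
		return false
	}
}

// schedulable reports whether every hard taint on a node is tolerated.
// PreferNoSchedule is only a scheduling preference, so it is skipped.
func schedulable(taints []Taint, tolerations []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, t := range tolerations {
			if tolerates(t, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical cluster: every node carries the same NoSchedule taint.
	nodes := map[string][]Taint{
		"node-a": {{Key: "example.com/arch", Value: "arm64", Effect: "NoSchedule"}},
		"node-b": {{Key: "example.com/arch", Value: "arm64", Effect: "NoSchedule"}},
	}
	// The Pod tolerates a different value, so no taint is matched.
	podTolerations := []Toleration{
		{Key: "example.com/arch", Operator: "Equal", Value: "s390x", Effect: "NoSchedule"},
	}

	fits := 0
	for name, taints := range nodes {
		ok := schedulable(taints, podTolerations)
		fmt.Printf("%s schedulable: %v\n", name, ok)
		if ok {
			fits++
		}
	}
	if fits == 0 {
		fmt.Println("no node fits; the Pod stays Pending until a timeout fires")
	}
}
```

Because nothing ever runs in this scenario, the only failure signal is the timeout, which is why a shorter BuildRun timeout keeps the test fast.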

@dorzel
Contributor Author

dorzel commented Jan 8, 2025

An update - still looking into why TaintEffect isn't getting set in the integration tests. Integration tests are at least all failing in the same way.
e2e tests are all over the place, with unrelated tests failing and 2/4 of them nearly passing. I'm wondering if some of these failures are due to the new three-node configuration and disk space issues.

dorzel force-pushed the MULTIARCH-5036 branch 5 times, most recently from f17465b to 5181b3f, on January 20, 2025 18:52
@dorzel
Contributor Author

dorzel commented Jan 20, 2025

@SaschaSchwarze0 @adambkaplan

Would I be able to get another look at this? I'm having an issue with the validation logic for the taint effect:
https://github.com/shipwright-io/build/pull/1711/files#diff-991ca77e26cb9e18ce83c2dc437ab9dd148e6c9263b933267733b1b3c0b45c7dR49-R58

In the case where the effect isn't specified in the yaml, as in this test:
https://github.com/shipwright-io/build/pull/1711/files#diff-1cc6608c14aa73c7ad391f5f3d05b858cef1e22b084e778ce651e5b5e663b46aR594

It should get explicitly set on the TaskRun object via the validation logic, but that doesn't happen. Confused, as everything else for the validation seems to work.
Wondering if I'm missing something obvious here or if I need to revert to setting it in taskrun.go

@SaschaSchwarze0
Member

> Would I be able to get another look at this? I'm having an issue with the validation logic for the taint effect:
> https://github.com/shipwright-io/build/pull/1711/files#diff-991ca77e26cb9e18ce83c2dc437ab9dd148e6c9263b933267733b1b3c0b45c7dR49-R58

Sorry for the late reply, @dorzel.

The validation logic should, imo, NOT mutate anything on the object. It should only validate. If empty is a valid value for the effect, then that is all the validation should do: not complain about it.

The logic that creates the toleration on the TaskRun should in turn be aware that empty is a valid value for the effect and translate it into TaintEffectNoSchedule.
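
A minimal Go sketch of the split described above, under stated assumptions: the `Toleration` struct is a simplified stand-in for the Kubernetes type, `validateToleration` and `toTaskRunTolerations` are hypothetical names rather than the functions in this PR, and the set of accepted effects (empty or `NoSchedule`) is assumed for illustration. Validation only reports problems, and the defaulting happens on a copy when the TaskRun tolerations are built.

```go
package main

import "fmt"

// Toleration is a simplified stand-in for the Kubernetes toleration type
// (an assumption for illustration).
type Toleration struct {
	Key, Operator, Value, Effect string
}

// effectNoSchedule mirrors the string value of TaintEffectNoSchedule.
const effectNoSchedule = "NoSchedule"

// validateToleration only checks the value and never mutates it. An empty
// effect is accepted. The accepted set here is an assumption.
func validateToleration(t Toleration) error {
	switch t.Effect {
	case "", effectNoSchedule:
		return nil
	default:
		return fmt.Errorf("unsupported taint effect %q", t.Effect)
	}
}

// toTaskRunTolerations builds the tolerations for the TaskRun. It works on
// copies, so the Build or BuildRun spec stays as the user wrote it, and it
// translates an empty effect into NoSchedule.
func toTaskRunTolerations(in []Toleration) []Toleration {
	out := make([]Toleration, len(in))
	for i, t := range in {
		if t.Effect == "" {
			t.Effect = effectNoSchedule
		}
		out[i] = t
	}
	return out
}

func main() {
	// Hypothetical spec where the effect was omitted in the YAML.
	spec := []Toleration{{Key: "example.com/arch", Operator: "Equal", Value: "arm64"}}

	for _, t := range spec {
		if err := validateToleration(t); err != nil {
			fmt.Println("validation failed:", err)
			return
		}
	}

	taskRun := toTaskRunTolerations(spec)
	fmt.Printf("spec effect:    %q\n", spec[0].Effect)
	fmt.Printf("taskrun effect: %q\n", taskRun[0].Effect)
}
```

Keeping the default in the translation step means the validator can be called any number of times, on any copy of the object, without changing what gets created.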

@dorzel
Contributor Author

dorzel commented Jan 27, 2025

Ok, reverted to setting the TaintEffect outside of validation and that has fixed the integration tests (one unrelated flaky test). Looks like we're just waiting on #1786 to get an actual signal from e2e tests.

Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • release-note: Label for when a PR has specified a release note
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

SHIP-0039: Allow tolerations on Build and BuildRun to be set
2 participants