Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update disableSamples to create samples config resource if it does not exist #4045

Merged
merged 2 commits into from
Jan 15, 2025

Conversation

tsatam
Copy link
Collaborator

@tsatam tsatam commented Jan 13, 2025

Which issue this PR addresses:

Fixes ARO-14389

What this PR does / why we need it:

The current implementation of disableSamples fails and retries if the config resource does not exist. This resource is created by the samples operator at the end of its bootstrapping process. However, in some circumstances, this process can take a long time to finish, and as a result this resource will not exist before our disableSamples implementation gives up and exits.

This PR updates this process to instead, create the config resource if it does not exist, following the recommendations outlined in https://docs.openshift.com/container-platform/4.17/openshift_images/configuring-samples-operator.html#samples-operator-restricted-network-install-with-access.

Further improvement ideas (not implemented in this PR):

  • Should we implement an explicit condition check for ensuring the samples CO is healthy after performing this change, rather than failing implicitly on the "clusterVersionReady" condition (which depends on all COs being healthy)?

Test plan for issue:

  • Unit tests were updated to handle this case
  • [ ] Attempt a cluster install with this fix and a reproducer for the failure mode here (e.g. custom routing that redirects traffic for registry.redhat.io to a location that times out requests) we were not able to create a reproducer.

Is there any documentation that needs to be updated for this PR?

No

How do you know this will function as expected in production?

Above testing/validation

@tsatam tsatam force-pushed the tsatam/ARO-14389-fix-disablesamples branch from 9a3a10c to 23600d9 Compare January 13, 2025 18:07
Copy link

@geowa4 geowa4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

@kimorris27 kimorris27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as long as local testing gives good results.

@hlipsig
Copy link
Contributor

hlipsig commented Jan 13, 2025

LGTM in addition to automated e2e I'd like a couple of by hand tests executed in different regions to ensure we didn't miss anything.

// bootstrap and create the resource
//
// https://docs.openshift.com/container-platform/4.17/openshift_images/configuring-samples-operator.html#samples-operator-restricted-network-install-with-access
if errors.IsNotFound(err) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make sure that if this fails with a conflict (e.g. it gets bootstrapped, maybe?) then we leave it alone? Or retry, maybe by using clienthelper's Ensure or something?

Copy link
Collaborator Author

@tsatam tsatam Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have been retrying on conflicts:

return retry.OnError(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess the whole thing being wrapped in it would do it as well. I'm a derp :)

Copy link
Collaborator

@fahlmant fahlmant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@yjst2012 yjst2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one minor thing, is it worth pulling the disableSamples step a bit forward, eg, before waiting for the workers to be ready?

+			steps.Action(m.disableSamples),
			steps.Condition(m.minimumWorkerNodesReady, 30*time.Minute, true),

@tsatam tsatam merged commit c6c8c5b into master Jan 15, 2025
22 checks passed
@tsatam tsatam deleted the tsatam/ARO-14389-fix-disablesamples branch January 15, 2025 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants