-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update disableSamples to create samples config resource if it does not exist #4045
Conversation
9a3a10c
to
23600d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as long as local testing gives good results.
LGTM in addition to automated e2e I'd like a couple of by hand tests executed in different regions to ensure we didn't miss anything. |
// bootstrap and create the resource | ||
// | ||
// https://docs.openshift.com/container-platform/4.17/openshift_images/configuring-samples-operator.html#samples-operator-restricted-network-install-with-access | ||
if errors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make sure that if this fails with a conflict (e.g. it gets bootstrapped, maybe?) then we leave it alone? Or retry, maybe by using clienthelper's Ensure or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have been retrying on conflicts:
Line 24 in 23600d9
return retry.OnError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess the whole thing being wrapped in it would do it as well. I'm a derp :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one minor thing, is it worth pulling the disableSamples step a bit forward, eg, before waiting for the workers to be ready?
+ steps.Action(m.disableSamples),
steps.Condition(m.minimumWorkerNodesReady, 30*time.Minute, true),
Which issue this PR addresses:
Fixes ARO-14389
What this PR does / why we need it:
The current implementation of disableSamples fails and retries if the config resource does not exist. This resource is created by the samples operator at the end of its bootstrapping process. However, in some circumstances, this process can take a long time to finish, and as a result this resource will not exist before our disableSamples implementation gives up and exits.
This PR updates this process to instead, create the config resource if it does not exist, following the recommendations outlined in https://docs.openshift.com/container-platform/4.17/openshift_images/configuring-samples-operator.html#samples-operator-restricted-network-install-with-access.
Further improvement ideas (not implemented in this PR):
Test plan for issue:
[ ] Attempt a cluster install with this fix and a reproducer for the failure mode here (e.g. custom routing that redirects traffic for registry.redhat.io to a location that times out requests)we were not able to create a reproducer.Is there any documentation that needs to be updated for this PR?
No
How do you know this will function as expected in production?
Above testing/validation