-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the steps to reboot the computes after update. #2587
base: main
Are you sure you want to change the base?
Conversation
Thanks for the PR! ❤️ |
09bc6f0
to
0641dd5
Compare
Current tested with ping test running in the background and found not loss of connectivity. |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4f12a60d8dc64dad92d4f7b5f5bec990 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 14s |
0641dd5
to
a22e794
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bf06493eaeea430c898bf25520dfdd04 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 32s |
a22e794
to
4a5a0f2
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/caf50b2e762a4aaeb3326c9399dffd15 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 21m 02s |
c056d3c
to
85e4367
Compare
PATH: "{{ cifmw_path | default(ansible_env.PATH) }}" | ||
ansible.builtin.command: >- | ||
{{ cifmw_update_oc_cmd_prefix }} | ||
get openstackdataplanedeployment {{ cifmw_reboot_dep_name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get openstackdataplanedeployment {{ cifmw_reboot_dep_name }} | |
wait openstackdataplanedeployment {{ cifmw_reboot_dep_name }} | |
--for=condition=ready | |
--timeout={{ cifmw_update_timeout_reboot }}m |
With oc wait ansible log is more readable, as there no retires logged to output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Done.
edpm_reboot_strategy: force | ||
ansibleLimit: {{ cifmw_update_hypervisor_short_name }} | ||
|
||
- name: Apply the OpenStackDataPlaneDeployment CR to trigger a reboot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: Apply the OpenStackDataPlaneDeployment CR to trigger a reboot | |
- name: Create the OpenStackDataPlaneDeployment CR to trigger a reboot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some minor suggestions.
- name: Define command for OpenStack client interactions | ||
ansible.builtin.set_fact: | ||
cifmw_update_openstack_cmd: >- | ||
oc rsh -n {{ cifmw_update_namespace }} openstackclient openstack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a variable, not a default in the vars/main.yaml file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not, sure about that one. I've moved that definition to default/main.yaml
as it's used a lot of time in the reboot sequence. Let me know if that's what you had in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine :)
- name: Create the OpenStackDataPlaneDeployment CR used for reboot | ||
ansible.builtin.copy: | ||
dest: "{{ cifmw_update_artifacts_basedir }}/{{ cifmw_reboot_dep_name }}.yaml" | ||
content: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building a yaml (or json) with plain string manipulation is asking for trouble. Why not using to_nice yaml like this?
vars:
_content:
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
name: "{{ cifmw_reboot_dep_name }}"
namespace: "{{ cifmw_update_namespace }}"
spec:
nodeSets: "{{ cifmw_update_node_sets.stdout | split('\n') }}"
servicesOverride:
- reboot-os
ansibleExtraVars:
edpm_reboot_strategy: force
ansibleLimit: {{ cifmw_update_hypervisor_short_name }}
ansible.builtin.copy:
dest: "{{ cifmw_update_artifacts_basedir }}/{{ cifmw_reboot_dep_name }}.yaml"
content: "{{ _content | to_nice_yaml }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oki, this one is nice as well. Processing was a little more involved, but the version I've got should be working.
af21247
to
b22e26f
Compare
New changes are detected. LGTM label has been removed. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pablintino The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b30a7822761f41f3890d3b541e5e5bd6 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 42m 25s |
5b95216
to
0217325
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/87194aa88e3d4b5e90652416f40b57cb ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 32s |
0217325
to
95a8fee
Compare
95a8fee
to
442ab42
Compare
442ab42
to
39a32d4
Compare
39a32d4
to
ef7112a
Compare
This sequence implements reboot of the compute nodes after the update. By default it's not run and `cifmw_update_reboot_test` must be set to true to activate it. We have one instance created. If the hypervisor being rebooted has the instance that instance will be live-migrated to another hypervisor before the reboot and migrated back to that original hypervisor after the reboot. Some basic sanity checks are performed after the reboot and before the migration back to ensure that the necessary services are up and running. During the reboot we start two scripts. One monitors and log the reboot of the hypervisors. The other log where the instance is currently running. The log files can be found in `~/ci-framework-data/tests/update/` in `monitor_servers.log` and `monitor_vm_placement.log` respectively. A note about node evacuation. We are still using node evaction from the nova cli. This command has not been ported to the openstack cli. There's a discussion about it [on launchpad](https://bugs.launchpad.net/python-openstackclient/+bug/2055552). Also, we do the evacuation only if there are more than one hypervisor available. When only one compute is available we stop and and after reboot, we just restart the instance. The official documentation mention only the live-migration path, but as we also use the live-migration in the test sequence that part is covered. We still expect customer to use the nova cli as it's way more user friendly and is still currently working. Closes: https://issues.redhat.com/browse/OSPRH-8937
ef7112a
to
ff86b85
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0030f67185a246beaa6641ac614816e3 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 43m 19s |
This sequence implements reboot of the compute nodes after the
update. By default it's not run and
cifmw_update_reboot_test
must beset to true to activate it.
We have one instance created. If the hypervisor being rebooted has
the instance that instance will be live-migrated to another hypervisor
before the reboot and migrated back to that original hypervisor after
the reboot.
Some basic sanity checks are performed after the reboot and before the
migration back to ensure that the necessary services are up and
running.
During the reboot we start two scripts. One monitors and log the
reboot of the hypervisors. The other log where the instance is
currently running. The log files can be found in
~/ci-framework-data/tests/update/
inmonitor_servers.log
andmonitor_vm_placement.log
respectively.A note about node evacuation. We are still using node evaction from
the nova cli. This command has not been ported to the openstack
cli. There's a discussion about it on launchpad.
Also, we do the evacuation only if there are more than one hypervisor
available. When only one compute is available we stop and and after
reboot, we just restart the instance.
The official documentation mention only the live-migration path, but
as we also use the live-migration in the test sequence that part is
covered. We still expect customer to use the nova cli as it's way
more user friendly and is still currently working.
Closes: https://issues.redhat.com/browse/OSPRH-8937