Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete next cell irrespective of last deletion #901

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

auniyal61
Copy link
Contributor

@auniyal61 auniyal61 commented Nov 27, 2024

Right now when multiple cells gets deleted if any one the cell deletion fails, the control exits with error msg.
This change stores the errors in a list and let next cells deleted.

Closes #OSPRH-10550

Right now when multiple cells gets deleted if any one the cell deletion
fails, the control exits with error msg.
This change stores the errors in a list and let next cells deleted.

Closes #OSPRH-10550
@auniyal61
Copy link
Contributor Author

The jira report says, we need to do this for cellCreation as well, but I think its already taken care of at here https://github.com/openstack-k8s-operators/nova-operator/blob/main/controllers/nova_controller.go#L328-L356

Copy link
Contributor

@mrkisaolamb mrkisaolamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some func tests for cell deletion to assert if deletion is working and returning errors for all cells. Also it will be nice to assert what are conditions of nova instance

Copy link
Contributor

openshift-ci bot commented Dec 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: auniyal61
Once this PR has been reviewed and has the lgtm label, please ask for approval from mrkisaolamb. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/55d93ec55a534c64a74ed0e2e7f7761a

✔️ openstack-meta-content-provider SUCCESS in 47m 42s
nova-operator-kuttl RETRY_LIMIT in 17m 30s
nova-operator-tempest-multinode FAILURE in 22m 11s
nova-operator-tempest-multinode-ceph FAILURE in 23m 05s

@auniyal61 auniyal61 force-pushed the cell-deletion branch 4 times, most recently from ea9aefb to fe73ce1 Compare December 19, 2024 09:56
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deletion logic looks good with a small request about ordering. I have couple of comments on the testing side.

}

if len(deleteErrs) > 0 {
return ctrl.Result{}, fmt.Errorf("errors: %v", deleteErrs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can user errors.Join() to wrap the list of errors into a single error without the string conversion. https://pkg.go.dev/errors#Join

Comment on lines +234 to +229
cell2Account, cell2Secret := mariadb.CreateMariaDBAccountAndSecret(
cell2.MariaDBAccountName, mariadbv1.MariaDBAccountSpec{})
DeferCleanup(k8sClient.Delete, ctx, cell2Account)
DeferCleanup(k8sClient.Delete, ctx, cell2Secret)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like an unrelated change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This When block is for 3 cells
The newly added When Block is for 4 cells

with this now, BeforeEach block under Describe has 2 cells - 0 and 1


It("deletes cell3 and verifies error for cell2 because its DB deleted already", func() {

// manually delete DB for cell2, to reproduce the error in cell2 deletion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is one way to trigger an error during cell deletion.

Comment on lines +206 to +207
delete(nova.Spec.CellTemplates, "cell2")
delete(nova.Spec.CellTemplates, "cell3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know that the deletion of cell2 is always before cell3 in novaCellList.Items at L605 in the controller? If cell3 can be before cell2 then the test below does not prove that the error in cell2 was just collected but not stopped the loop there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the way to make it testable is to sort the cells be cell name when iterated in the controller. We do that for cell creation anyhow in #903 to avoid the unstable ordering to cause unstable condition messages.

g.Expect(nova.Status.RegisteredCells).NotTo(HaveKey(cell3.CellCRName.Name))
}, timeout, interval).Should(Succeed())

// NovaCellNotExists(cell3.CellCRName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yepp you can check if k8s returns not found for cell3 to prove that it is deleted. You can also check that the RegisteredCells list does not contain cell3 any more but still contains cell2.

Comment on lines 219 to 225
// cell2 deletion should have failed
Eventually(func(g Gomega) {
mappingJob := th.GetJob(cell2.CellDeleteJobName)
newJobInputHash := GetEnvVarValue(
mappingJob.Spec.Template.Spec.Containers[0].Env, "INPUT_HASH", "")
g.Expect(newJobInputHash).NotTo(BeEmpty())
}, timeout, interval).Should(Succeed())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this proves that the cell2 delete failed. It checks for the cell2 delete job to exists
.

@@ -600,18 +600,26 @@ func (r *NovaReconciler) Reconcile(ctx context.Context, req ctrl.Request) (resul
return ctrl.Result{}, err
}

var deleteErrs []error

for _, cr := range novaCellList.Items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would sort this list by cell name for two reasons:

  • Sort cell-names in nova_controller #903 does it for cell creation already to make the error messages stable even if the order of the list of cells in the api is not stable.
  • this makes testing possible as we can rely on the ordering to prove that even if cell2 failed to delete cell3 deletion is tried.

@gibizer
Copy link
Contributor

gibizer commented Dec 20, 2024

Running the test locally shows that cell2 is also successfully deleted so the current test does not recreate the necessary scenario to prove that failure in one cell does not prevent the deletion of the other.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5839fe00e7f1456ba553d6adfcd1e5ee

✔️ openstack-meta-content-provider SUCCESS in 2h 43m 17s
nova-operator-kuttl RETRY_LIMIT in 29m 32s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 06m 12s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 26m 19s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants