fix(securitycenter): issues related to flaky tests in SHA & ETD #4898

Conversation

vijaykanthm
Contributor

Description

Fixes #4866 and #4871: flakiness in tests caused by a concurrent modification error.

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

  • I have followed Contributing Guidelines from CONTRIBUTING.MD
  • Tests pass: go test -v ./... (see Testing)
  • Code formatted: gofmt (see Formatting)
  • Vetting pass: go vet (see Formatting)
  • These samples need a new API enabled in testing projects to pass (let us know which ones)
  • These samples need a new/updated env vars in testing projects set to pass (let us know which ones)
  • This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
  • This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
  • Please merge this PR for me once it is approved

@vijaykanthm vijaykanthm requested review from a team as code owners December 20, 2024 23:23
@product-auto-label product-auto-label bot added api: securitycenter Issues related to the Security Command Center API. samples Issues that are directly related to samples. labels Dec 20, 2024

@code-review-assist code-review-assist bot left a comment

Hi team,

I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary to help you and other reviewers get up to speed:

This pull request addresses flaky tests in the golang-samples repository, specifically those related to Security Health Analytics (SHA) and Event Threat Detection (ETD). The core issue was a concurrent modification error causing test instability.

Here's a breakdown of the changes:

  • Added retry mechanism: A retryOperation function was implemented to handle transient errors during cleanup of existing custom modules. This improves the robustness of the test suite.
  • Added Mutex: A mutex (mu sync.Mutex) was added to protect shared resources in the test functions, preventing race conditions and concurrent modification errors. This addresses the root cause of the flakiness.
  • Refactored cleanup functions: The cleanupExistingCustomModules and cleanupCustomModule functions were extracted and improved for better readability and maintainability. These functions now efficiently clean up test resources before and after test execution.
  • Renamed test functions: Several test functions were renamed for clarity and consistency (e.g., TestCreateCustomModule to TestCreateSHACustomModule).

The changes are primarily focused on securitycenter/management_api/event_threat_detection/event_threat_detection_custom_module_test.go and securitycenter/management_api/security_health_analytics_custom_module_test.go. The line numbers in the diff show the additions of the mutex, the retry logic, and the refactored cleanup functions. Specific changes include the addition of the sync package import (line 26 in both files), the declaration of the mutex (line 38 in both files), and the implementation of the retry logic (lines 48-52 in both files).

I'll provide a more detailed review shortly, but this summary should give you a good overview of the changes and their intent.

And now, a little haiku to lighten the mood:

Tests once flaky,
Now run smooth, a mutex's grace,
Code review's delight.

Gemini encountered an error creating the review. You can try again by commenting @code-review-assist review.

@shayneholmes shayneholmes left a comment

Thanks for contributing this! This is a speculative PR, since we don't yet know the root cause for the concurrent modification message. We should roll it back if it doesn't fix the underlying issues.

@shayneholmes shayneholmes removed their assignment Jan 4, 2025
@telpirion telpirion self-assigned this Jan 6, 2025
Collaborator

@telpirion telpirion left a comment

Please use testutil.Retry instead of creating your own retry function.

https://github.com/GoogleCloudPlatform/golang-samples/blob/main/internal/testutil/retry.go

Collaborator

@telpirion telpirion left a comment

I need a bit more context about what is being attempted with this PR.

parts := strings.Split(trimmedFullName, "/")
if len(parts) > 0 {
return parts[len(parts)-1]
func cleanupExistingCustomModules(orgID string) error {
Collaborator

issue!: Remove this. We sometimes have tests running in parallel--this could potentially delete a resource used by another test.
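
One way to keep cleanup from touching another test's resources is to delete only the modules whose display name carries a prefix unique to the current test run. The sketch below works on an in-memory list and makes no Security Command Center API call; the `module` type, the `ownedBy` function, and the `runPrefix` value are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// module is a hypothetical stand-in for a custom module returned by a list call.
type module struct {
	Name        string
	DisplayName string
}

// ownedBy returns only the modules whose display name starts with runPrefix,
// so cleanup never touches modules created by other test runs.
func ownedBy(modules []module, runPrefix string) []module {
	var owned []module
	for _, m := range modules {
		if strings.HasPrefix(m.DisplayName, runPrefix) {
			owned = append(owned, m)
		}
	}
	return owned
}

func main() {
	// Assumed: the prefix is generated once per test run.
	runPrefix := "go_sample_run42_"
	all := []module{
		{Name: "customModules/1", DisplayName: "go_sample_run42_a"},
		{Name: "customModules/2", DisplayName: "go_sample_run07_b"}, // another run
		{Name: "customModules/3", DisplayName: "go_sample_run42_c"},
	}
	for _, m := range ownedBy(all, runPrefix) {
		fmt.Println("would delete:", m.Name)
	}
}
```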

@@ -225,10 +220,13 @@ func TestCreateEtdCustomModule(t *testing.T) {
func TestGetEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

question: why the mutex here? What shared resource is being protected inside the critical section / this function?

@@ -254,10 +252,13 @@ func TestGetEtdCustomModule(t *testing.T) {
func TestUpdateEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous

@@ -281,10 +282,13 @@ func TestUpdateEtdCustomModule(t *testing.T) {
func TestDeleteEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous

@@ -308,10 +312,13 @@ func TestDeleteEtdCustomModule(t *testing.T) {
func TestListEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous

@@ -336,10 +343,13 @@ func TestListEtdCustomModule(t *testing.T) {
func TestListEffectiveEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous.

@@ -364,10 +374,13 @@ func TestListEffectiveEtdCustomModule(t *testing.T) {
func TestGetEffectiveEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous.

@@ -393,10 +406,13 @@ func TestGetEffectiveEtdCustomModule(t *testing.T) {
func TestListDescendantEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous.

@@ -421,10 +437,13 @@ func TestListDescendantEtdCustomModule(t *testing.T) {
func TestValidateEtdCustomModule(t *testing.T) {
var buf bytes.Buffer

mu.Lock()
Collaborator

Same as previous.

parts := strings.Split(trimmedFullName, "/")
if len(parts) > 0 {
return parts[len(parts)-1]
func cleanupExistingCustomModules(orgID string) error {
Collaborator

See the previous comment about deleting all resources in an organization.

@muncus
Collaborator

muncus commented Jan 14, 2025

I've looked through the logs for several of these failed builds, and only Create and Delete operations lead to "concurrent modification" errors. To me, this points to the organization object being the contended write, not the module. To tackle that, I'd suggest creating a small number of modules in TestMain() and reusing those modules throughout the tests. The current tests appear to create at least 15 modules.

Another way to approach this is to use testutil.Retry when creating and deleting modules.

@vijaykanthm
Contributor Author

As per the email discussion, I am closing this PR. I will refactor the resource cleanup to avoid the resource contention issue and raise a new PR.

Development

Successfully merging this pull request may close these issues.

securitycenter/management_api/event_threat_detection: TestGetEtdCustomModule failed
4 participants