fix(securitycenter): issues related to flaky tests in SHA & ETD #4898
Hi team,
I'm currently reviewing this pull request and will post my detailed review in a few minutes. In the meantime, here's a quick summary to help you and other reviewers get up to speed:
This pull request addresses flaky tests in the golang-samples repository, specifically those related to Security Health Analytics (SHA) and Event Threat Detection (ETD). The core issue was a concurrent modification error causing test instability.
Here's a breakdown of the changes:
- Added retry mechanism: A retryOperation function was implemented to handle transient errors during cleanup of existing custom modules. This improves the robustness of the test suite.
- Added Mutex: A mutex (mu sync.Mutex) was added to protect shared resources in the test functions, preventing race conditions and concurrent modification errors. This addresses the root cause of the flakiness.
- Refactored cleanup functions: The cleanupExistingCustomModules and cleanupCustomModule functions were extracted and improved for better readability and maintainability. These functions now efficiently clean up test resources before and after test execution.
- Renamed test functions: Several test functions were renamed for clarity and consistency (e.g., TestCreateCustomModule to TestCreateSHACustomModule).
The changes are primarily focused on securitycenter/management_api/event_threat_detection/event_threat_detection_custom_module_test.go and securitycenter/management_api/security_health_analytics_custom_module_test.go. The line numbers in the diff show the additions of the mutex, retry logic, and refactoring of the cleanup functions. Specific line changes include the addition of the sync package import (line 26 in both files), the declaration of the mutex (line 38 in both files), and the implementation of the retry logic (lines 48-52 in both files).
I'll provide a more detailed review shortly, but this summary should give you a good overview of the changes and their intent.
And now, a little haiku to lighten the mood:
Tests once flaky,
Now run smooth, a mutex's grace,
Code review's delight.
Gemini encountered an error creating the review. You can try again by commenting |
Thanks for contributing this! This is a speculative PR, since we don't yet know the root cause for the concurrent modification message. We should roll it back if it doesn't fix the underlying issues.
Please use testutil.Retry instead of creating your own retry function.
https://github.com/GoogleCloudPlatform/golang-samples/blob/main/internal/testutil/retry.go
… github.com:vijaykanthm/golang-samples into fix-scc-mgt-api-sha-etd-custom-module-test-failures Merge remote into local
I need a bit more context about what is being attempted with this PR.
parts := strings.Split(trimmedFullName, "/")
if len(parts) > 0 {
    return parts[len(parts)-1]
func cleanupExistingCustomModules(orgID string) error {
issue!: Remove this. We sometimes have tests running in parallel--this could potentially delete a resource used by another test.
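One way to address this concern, sketched under assumptions, is to have each test run delete only the modules it created, identified by a run-specific name prefix. The helper ownedModules and the prefix scheme below are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// ownedModules returns only the display names that start with runPrefix,
// so cleanup never touches modules created by another test run.
// The prefix convention is an assumption made for this sketch.
func ownedModules(displayNames []string, runPrefix string) []string {
	var owned []string
	for _, name := range displayNames {
		if strings.HasPrefix(name, runPrefix) {
			owned = append(owned, name)
		}
	}
	return owned
}

func main() {
	all := []string{"run_a1_create", "run_b2_update", "run_a1_delete", "manual_module"}
	// Only the two modules created by run "a1" are selected for deletion.
	fmt.Println(ownedModules(all, "run_a1_")) // prints [run_a1_create run_a1_delete]
}
```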
@@ -225,10 +220,13 @@ func TestCreateEtdCustomModule(t *testing.T) {
func TestGetEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
question: why the mutex here? What shared resource is being protected inside the critical section / this function?
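As background for this question, here is a small sketch of what a sync.Mutex does protect: memory shared between goroutines in the same process. The counter type is hypothetical. A mutex like this cannot serialize writes made by other processes or builds against the same organization, and tests in a Go package already run sequentially unless they call t.Parallel().

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical in-process shared resource.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc increments the counter inside the critical section.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countConcurrently starts `workers` goroutines that each increment the
// same counter once, then returns the final value.
func countConcurrently(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait() // all increments have completed before n is read
	return c.n
}

func main() {
	fmt.Println(countConcurrently(1000))
}
```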
@@ -254,10 +252,13 @@ func TestGetEtdCustomModule(t *testing.T) {
func TestUpdateEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous
@@ -281,10 +282,13 @@ func TestUpdateEtdCustomModule(t *testing.T) {
func TestDeleteEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous
@@ -308,10 +312,13 @@ func TestDeleteEtdCustomModule(t *testing.T) {
func TestListEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous
@@ -336,10 +343,13 @@ func TestListEtdCustomModule(t *testing.T) {
func TestListEffectiveEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous.
@@ -364,10 +374,13 @@ func TestListEffectiveEtdCustomModule(t *testing.T) {
func TestGetEffectiveEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous.
@@ -393,10 +406,13 @@ func TestGetEffectiveEtdCustomModule(t *testing.T) {
func TestListDescendantEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous.
@@ -421,10 +437,13 @@ func TestListDescendantEtdCustomModule(t *testing.T) {
func TestValidateEtdCustomModule(t *testing.T) {
    var buf bytes.Buffer
    mu.Lock()
Same as previous.
parts := strings.Split(trimmedFullName, "/")
if len(parts) > 0 {
    return parts[len(parts)-1]
func cleanupExistingCustomModules(orgID string) error {
See the previous comment about deleting all resources in an organization.
I've looked through the logs for several of these failed builds, and only Create and Delete operations lead to "concurrent modification" errors. To me, this points to the organization object being the contended write, not the module. To tackle that, I'd suggest creating a small number of modules in TestMain() and reusing those modules throughout the tests. The current tests appear to create at least 15 modules. Another way to approach this is to use |
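The TestMain() suggestion could look roughly like the sketch below: create one module before any test runs, let every test read it, and delete it once at the end. createModule, deleteModule, sharedModuleID, and runWithSharedModule are hypothetical stand-ins, not the Security Command Center client calls or code from this PR.

```go
package main

import (
	"fmt"
	"os"
	"testing"
)

// sharedModuleID is set once during setup and only read by the tests.
var sharedModuleID string

// created and deleted count calls to the fake API below.
var created, deleted int

// createModule is a hypothetical stand-in for the real create call.
func createModule() (string, error) {
	created++
	return fmt.Sprintf("module-%d", created), nil
}

// deleteModule is a hypothetical stand-in for the real delete call.
func deleteModule(id string) error {
	deleted++
	return nil
}

// runWithSharedModule creates one module, runs the tests through run, and
// deletes the module afterwards. It returns the exit code instead of
// calling os.Exit so that the deferred cleanup still executes.
func runWithSharedModule(run func() int) (code int) {
	id, err := createModule()
	if err != nil {
		fmt.Fprintln(os.Stderr, "setup failed:", err)
		return 1
	}
	sharedModuleID = id
	defer func() {
		if err := deleteModule(id); err != nil {
			fmt.Fprintln(os.Stderr, "cleanup failed:", err)
		}
	}()
	return run()
}

// TestMain is what `go test` calls in place of running the tests directly
// when it is defined in a _test.go file.
func TestMain(m *testing.M) {
	os.Exit(runWithSharedModule(m.Run))
}

func main() {
	// Simulate a test run with a fake stand-in for m.Run.
	code := runWithSharedModule(func() int {
		fmt.Println("tests see", sharedModuleID)
		return 0
	})
	fmt.Println("exit code", code, "created", created, "deleted", deleted)
}
```

With this shape the organization sees one create and one delete per test binary, rather than one pair per test function.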
As per the email discussion, I am closing this PR. I will refactor the cleanup of resources to avoid the resource contention issue and raise a new PR.
Description
Fixes #4866 #4871 Flakiness in tests due to Concurrent Modification error.
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
- go test -v ./.. (see Testing)
- gofmt (see Formatting)
- go vet (see Formatting)