Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ACE_Barrier running_threads_ to atomic to ensure accurate values across cores/CPUs #2170

Merged
merged 3 commits into from
Jan 12, 2024

Conversation

shuston
Copy link
Contributor

@shuston shuston commented Dec 1, 2023

Customer experiences ACE_Barrier that (intermittently) never counts down to 1 under heavy system load. Theorizing that the ACE_Sub_Barrier running_threads_ member needs to be atomic to properly be viewed/manipulated across cores/CPUs.

@jwillemsen
Copy link
Member

I haven’t look at the existing code to see if there is a race condition possible, but I would like to propose to use a std::atomic here

@shuston
Copy link
Contributor Author

shuston commented Dec 1, 2023

I haven’t look at the existing code to see if there is a race condition possible, but I would like to propose to use a std::atomic here

I like that... I guess I went too "old school" with this change ;-)
I will do that - thanks, Johnny!

@shuston shuston requested a review from jwillemsen December 1, 2023 22:06
@shuston
Copy link
Contributor Author

shuston commented Dec 2, 2023

@jwillemsen all the Windows tasks fail... do you know if that's "normal"?

@jwillemsen
Copy link
Member

I think that is a problem with the vcpkg version used in your branch, I would recommend to merge master into your branch, I think that should resolve that compile issue

@jwillemsen
Copy link
Member

ACE_Barrier does have a guard that is locked always before using running_threads_

@jwillemsen
Copy link
Member

Maybe worth some test extension? Did you hear from your customer whether this change fixes their issue?

@shuston
Copy link
Contributor Author

shuston commented Dec 2, 2023

ACE_Barrier does have a guard that is locked always before using running_threads_

Yes, but we added an ACE_DEBUG inside that lock and 5 threads called wait and the running_threads_ count was 5 in every thread. The problem only occurs under tremendous load. It may take them some time to get the change into QA and run enough load to see if the problem still occurs.

@jwillemsen
Copy link
Member

From a quick code inspection I don't see a real problem, but there is already a test extension in the old bugzilla which the author says doesn't exit, see http://bugzilla.dre.vanderbilt.edu/show_bug.cgi?id=4078, maybe that helps

@shuston shuston merged commit 8c49e7b into master Jan 12, 2024
82 checks passed
@shuston shuston deleted the barrier_concurrency_fail branch January 12, 2024 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants