Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in ACE_Future double checked locking pattern code #2163

Closed
nfrmtkr opened this issue Nov 16, 2023 · 6 comments · Fixed by #2179
Closed

Race condition in ACE_Future double checked locking pattern code #2163

nfrmtkr opened this issue Nov 16, 2023 · 6 comments · Fixed by #2179
Labels

Comments

@nfrmtkr
Copy link
Contributor

nfrmtkr commented Nov 16, 2023

Version

ACE 7.0.2

Host machine and operating system

Build machine: CentOS Linux release 7.9.2009

Target machine and operating system (if different from host)

Runtime: Redhat 8

Compiler name and version (including patch level)

gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

The $ACE_ROOT/ace/config.h file

#include "ace/config-linux.h"
#define ACE_HAS_IPV6 1
#ifndef ACE_LACKS_STROPTS_H
#define ACE_LACKS_STROPTS_H 1
#endif
#ifndef ACE_LACKS_STRRECVFD
#define ACE_LACKS_STRRECVFD 1
#endif
#undef ACE_HAS_STRBUF_T
#undef ACE_HAS_STREAMS

The $ACE_ROOT/include/makeinclude/platform_macros.GNU file

not linked. The contents is:

INSTALL_PREFIX=/home/xgbuild/xgen-external/main/install/centos_7/ace-7.0.2
optimize=1
debug=1
ssl=1
buildbits=64
include $(ACE_ROOT)/include/makeinclude/platform_linux.GNU

Contents of $ACE_ROOT/bin/MakeProjectCreator/config/default.features

File doesn't exist

AREA/CLASS/EXAMPLE AFFECTED:

ACE compiles and links.
Runtime issue in customer environment.
We can't reproduce the issue localy.
Affected class ACE_Future

The problem effects:

ACE execution

Synopsis

We use ACE_Future for handing over an int in a multi threader producer/consumer secenario.
In one of our custumer environments we see from time to time corrupted (out of scope) values in the result of a ACE_Future::get().

Description

We've debugegd this by forcing a coredump immediatly after we checked for invalid result value of the ACE_Future::get(). The int value was -2013249520, but the future_rep_.value_ value was 0 as expected.

We asume that in this special case the double checked locking pattern used in the set functions lacks the atomicity rule in line

ACE_NEW_RETURN (this->value_, T (r), -1);

The stack overflow article seems to match this use case: https://stackoverflow.com/questions/50641740/double-check-locking-issues-c

Should ACE use an std::atomic for the template value?
Is it a compiler issue, should a compiler switch be added to the make files?

The double checked locking is used at other places as well and may need to be taken into account as well.

Repeat by

Not available. A trivial ACE_Future set/get could show it, but it hardly depends on the runtime.

Sample fix/ workaround

Not yet available, see ideas mentioned in "Description"

@nfrmtkr
Copy link
Contributor Author

nfrmtkr commented Dec 22, 2023

We've fixed the race condition with following patch:
0001-Fixed-possible-data-race-in-ACE_Future-set-get.patch

The following code we used for testing:
acefuturetester.tar.gz

@jwillemsen
Copy link
Member

Please open a pull request with your patch add your test under ACE_wrappers/tests so that it is compiled/tested as part of our test suite

@jwillemsen
Copy link
Member

Maybe determine the max threads based on the number of cpu cores on the system and run it for some minutes, a run of 4 hours is to long for our test suite. Maybe split the PR in two, one with just the test extension, second with the fix

@nfrmtkr
Copy link
Contributor Author

nfrmtkr commented Jan 9, 2024

The test Future_Test.cpp will test this from functional point of view.
The issue can't be easily reproduced. It requires fast machines and it depends on the CPU and compiler being used.

@jwillemsen
Copy link
Member

I think it would be helpful to have the additional test be added to future_test but just do some interations and make it possible to control the amount of iterations on the commandline. What kind of CPU/compiler are required, some background information on why it happens only with some specific cpu/compiler would be helpful, it is very likely that we have more constructs like this

@jwillemsen
Copy link
Member

Looks similar to #2170

jwillemsen added a commit to nfrmtkr/ACE_TAO that referenced this issue Jan 14, 2024
jwillemsen added a commit that referenced this issue Jan 15, 2024
…ure-double-checked-locking-pattern-code

Fixed a race condition caused by the double checked locking pattern
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

2 participants