Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-128657: fix _hashopenssl ref/data race #128886

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

tom-pytel
Copy link
Contributor

@tom-pytel tom-pytel commented Jan 15, 2025

Fix a possible data and PY_EVP_MD refcount race in _hashopenssl.c in py_digest_by_name() under free-threading.

@bedevere-app
Copy link

bedevere-app bot commented Jan 15, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@@ -0,0 +1 @@
Fix a possible data and PY_EVP_MD refcount race in _hashopenssl.c py_digest_by_name() under free-threading.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we try to avoid technical details in the changelog. How about something like this?

Suggested change
Fix a possible data and PY_EVP_MD refcount race in _hashopenssl.c py_digest_by_name() under free-threading.
Fix a crash when using objects returned by :func:`hashlib.sha256` under :term:`free threading`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (though not sure how to state this less technically), but just to be clear there is no crash. All this patch does is prevent a possible extra erroneous refcount on PY_EVP_MD under free threading (and shuts up TSAN).

Modules/_hashopenssl.c Show resolved Hide resolved
@@ -26,6 +26,7 @@
#include "Python.h"
#include "pycore_hashtable.h"
#include "pycore_strhex.h" // _Py_strhex()
Copy link
Member

@ZeroIntensity ZeroIntensity Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nitpick: let's align this comment with the new one below it.

@ZeroIntensity
Copy link
Member

The implementation LGTM. You can reuse the reproducer in the issue as a test. See the devguide for writing tests.

@tom-pytel
Copy link
Contributor Author

The implementation LGTM. You can reuse the reproducer in the issue as a test. See the devguide for writing tests.

Are TSAN tests possible? Because that is the only way the issue shows up, other than as an extra refcount at a specific byte offset of the opaque evp_md_st sturct IF two threads do this simultaneously (very non-deterministic).

@ZeroIntensity
Copy link
Member

Yeah, tests are run with TSan as part of CI. For example, on this PR: https://github.com/python/cpython/actions/runs/12809851053/job/35715616858?pr=128886

@tom-pytel
Copy link
Contributor Author

Yeah, tests are run with TSan as part of CI. For example, on this PR: https://github.com/python/cpython/actions/runs/12809851053/job/35715616858?pr=128886

Still not sure how you want me to implement this test. python -m test --tsan appears to exist with the intent to test the thread sanitizer itself and I can't find any actual test that interacts with tsan to get positive or negative results? Can you point me to one, because I have a test written but no way to know if it succeeds or fails according to tsan.

@ZeroIntensity
Copy link
Member

You don't need to do anything special--just write the test. Python will be built with TSan integration, and races on that test will show up in CI.

@tom-pytel
Copy link
Contributor Author

tom-pytel commented Jan 16, 2025

You don't need to do anything special--just write the test. Python will be built with TSan integration, and races on that test will show up in CI.

Done. Had to add test_hashlib to list of tsan tests if you want it to be executed in that context. Keep in mind is not 100% deterministic to find problem if fix is not present, would have to be even slower for that.

@picnixz
Copy link
Member

picnixz commented Jan 17, 2025

I don't have much time today and tomorrow for in-depth rewview and there is some other stuff to do before reviewing this one, but I'll do it on Sunday or Monday. If I forget, just ping me back (or you can DM me on Discord @ZeroIntensity)

@@ -1196,6 +1196,29 @@ def test_file_digest(self):
with open(os_helper.TESTFN, "wb") as f:
hashlib.file_digest(f, "sha256")

@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
def test_gh_128657(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test that doesn't assert anything always looks suspicious. I think you're hoping for this to trigger TSAN issues and thus this isn't really a pass/fail on its own. Just summarize that kind of thing in a comment within the test for later people who look at this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure the test should be there. If the criteria is that any fixed tsan issue should get a test then pretty soon the tsan section will be full of tests that don't really do anything except run for longer and longer times?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need to (technically) assert something--the thread shouldn't raise exceptions, and that's what the test is for if TSan is disabled.

@@ -1196,6 +1196,29 @@ def test_file_digest(self):
with open(os_helper.TESTFN, "wb") as f:
hashlib.file_digest(f, "sha256")

@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • IIRC, TSan is run for the default build as well, because things like subinterpreters can have thread safety issues.
  • I think it's worth always running this for FT, because we'll need to assert that the thread doesn't raise exceptions (more on that in another comment).
  • Should only run if threading actually works.
Suggested change
@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
@unittest.skipUnless(support.Py_GIL_DISABLED, "only meaningful on free-threading")
@threading_helper.requires_working_threading()

Comment on lines 1213 to 1220
thrds = []

for i in range(num_workers):
thrds.append(thrd := threading.Thread(target=closure, args=(barrier,)))
thrd.start()

for thrd in thrds:
thrd.join()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a utility for starting threads that will simplify this, and we also need to make sure that the thread doesn't raise an exception (threads that fail will not propagate an error to the main thread).

Suggested change
thrds = []
for i in range(num_workers):
thrds.append(thrd := threading.Thread(target=closure, args=(barrier,)))
thrd.start()
for thrd in thrds:
thrd.join()
with threading_helper.catch_threading_exception() as cm:
thrds = [threading.Thread(target=closure, args=(barrier,))]
with threading_helper.start_threads(thrds):
pass
if cm.exc_value:
raise cm.exc_value

@@ -1196,6 +1196,29 @@ def test_file_digest(self):
with open(os_helper.TESTFN, "wb") as f:
hashlib.file_digest(f, "sha256")

@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
def test_gh_128657(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need to (technically) assert something--the thread shouldn't raise exceptions, and that's what the test is for if TSan is disabled.

test_hashlib_sha256()

num_workers = 40
num_runs = 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need num_runs. I haven't used or seen it in any other free-threading PRs.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review I forgot to send. I'll have more time starting on Monday to do the in-depth review.

@@ -1196,6 +1196,29 @@ def test_file_digest(self):
with open(os_helper.TESTFN, "wb") as f:
hashlib.file_digest(f, "sha256")

@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
@unittest.skipUnless(support.check_sanitizer(thread=True),
"only meaningful on free-threading")

@@ -1196,6 +1196,29 @@ def test_file_digest(self):
with open(os_helper.TESTFN, "wb") as f:
hashlib.file_digest(f, "sha256")

@unittest.skipUnless(support.check_sanitizer(thread=True), "only meaningful on free-threading")
def test_gh_128657(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a better test case name please as well as a small comment (not docstring) to explain what this test is meant to assert (and then the reference to the GH issue)? this does not help to know why this test is here unless we open GH.

thrds = []

for i in range(num_workers):
thrds.append(thrd := threading.Thread(target=closure, args=(barrier,)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
thrds.append(thrd := threading.Thread(target=closure, args=(barrier,)))
thread = threading.Thread(target=closure, args=(barrier,))
threads.append(thread)


for i in range(num_runs):
barrier = threading.Barrier(num_workers)
thrds = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
thrds = []
threads = []

Comment on lines 1217 to 1220
thrd.start()

for thrd in thrds:
thrd.join()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
thrd.start()
for thrd in thrds:
thrd.join()
thread.start()
for thread in threads:
thread.join()

I would prefer full names. Or if you want short names, you can use t for the thread variable (but I would prefer to keep threads).

@tom-pytel
Copy link
Contributor Author

Sent up requested changes, but I want to be clear, with or without the tsan fix, this test doesn't actually do anything useful (I have no tsan output or test failure when fix is removed).

$ ./python -m test --tsan test_hashlib -v
== CPython 3.14.0a4+ experimental free-threading build (heads/array-10-g40a4d88a14-dirty:40a4d88a14, Jan 16 2025, 11:58:58) [GCC 11.4.0]
== Linux-6.8.0-51-generic-x86_64-with-glibc2.35 little-endian
== Python build: free_threading debug TSAN
...
test_py_digest_by_name_data_race (test.test_hashlib.KDFTests.test_py_digest_by_name_data_race) ... ok
...
Result: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants