[BUG]: Data race when using static variables with free-threaded Python #5489

rostan-t · 2025-01-07T14:38:55Z

Required prerequisites

Make sure you've read the documentation. Your issue may be addressed there.
Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
Consider asking first in the Gitter chat room or in a Discussion.

What version (or hash if on master) of pybind11 are you using?

v2.13.6

Problem description

With free-threaded Python, I get data races when using static class variables defined like this:

#include <pybind11/pybind11.h>

namespace py = pybind11;

class Foo {
};

Foo Bar;

PYBIND11_MODULE(foo, m, py::mod_gil_not_used()) {
    py::class_<Foo>(m, "Foo")
        .def_readonly_static("Bar", &Bar);
}

I get multiple kinds of errors when accessing Foo.Bar from multiple threads, from segmentation faults to "pybind11_object_dealloc(): Tried to deallocate unregistered instance!" exceptions.

Further investigation suggested that it might come from Foo.Bar being freed too many times. I'm unsure if it's a pybind11 or CPython issue so I'm posting it here.

Reproducible example code

import threading

from foo import Foo


def create_foo():
    for _ in range(1000):
        Foo.Bar


# keeping a reference to Foo.Bar prevents the issue from happening
# bar = Foo.Bar

nb_threads = 10
threads = [threading.Thread(target=create_foo) for _ in range(nb_threads)]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

Is this a regression? Put the last known working version here if it is.

Not a regression

The text was updated successfully, but these errors were encountered:

khotalota · 2025-01-09T23:20:49Z

This appears to stem from unsafe memory management and reference counting.
When multiple thread access Foo.Bar, pybind creats temproray python wrappers objects and if reference counts are decremented concurrently or improperly then it lead to Premature deallocation, Segmentation fault, expections that you have

So, storing a persistence reference to Foo.bar that you did prevents this reference count from dropping to zero.
However this is just a workaround

To make it thread safe try

std::shared_ptr : it can manage the lifecycle of the static variable safely. Something like below, if this works 🥲

#include <pybind11/pybind11.h>
#include <memory>
#include <mutex>

namespace py = pybind11;

class Foo {};

// Use a shared pointer for Bar
std::shared_ptr<Foo> Bar = std::make_shared<Foo>();
std::mutex bar_mutex;

PYBIND11_MODULE(foo, m) {
  py::class_<Foo>(m, "Foo")
      .def_property_static(
          "Bar",
          [](py::object) {
              std::lock_guard<std::mutex> lock(bar_mutex); // Thread-safe access
              return Bar;
          },
          [](py::object, std::shared_ptr<Foo> new_bar) {
              std::lock_guard<std::mutex> lock(bar_mutex); // Thread-safe modification
              Bar = new_bar;
          },
          py::return_value_policy::reference
      );
}

And also add Mutex for synchronization

This approach should eliminate the race conditions, segmentation faults, and exceptions while ensuring compatibility with free-threaded Python.

rwgk · 2025-01-13T16:21:28Z

Did you already try the reproducer with the latest state of the master branch? In particular, #5419 was merged after the 2.13.6 release (I'm totally not sure if that's related to your issue, but it's a quick experiment).

@colesbury for awareness, JIC this rings any bells.

rostan-t · 2025-01-13T16:36:26Z

Thanks for the reply! I tried again on master and it doesn't solve the issue.

colesbury · 2025-01-13T17:04:12Z

I'm able to reproduce the failure. I'll look into it.

colesbury · 2025-01-13T18:11:04Z

Ok, I think I understand the issue. There's a race between object deallocation (pybind11_object_dealloc) and retrieving the instance from the instance map (find_registered_python_instance).

There's a brief period of time when the instance's reference count is zero, but it's not yet removed from the instance map. During this time, another thread may retrieve the instance from the map and increment its reference count) Deallocation still proceeds as normal, but now there's an instance that thinks it's registered but is not in the instance map. This crashes on subsequent deallocation calls.

In CPython, we use _Py_TryIncref for similar cases. That's not a public API, but we can work on making it available (probably as an "unstable" API). Basically, find_registered_python_instance should only return an instance if that instance's reference count is not zero (but do so in a thread safe way).

szalpal · 2025-01-13T18:20:03Z

Thank you @colesbury for looking into this. Would this request be a similar thing we are trying to achieve? python/cpython#113920

colesbury · 2025-01-13T18:23:25Z

Yes

In the free threading build, there's a race between wrapper re-use and wrapper deallocation. This can happen with a static variable accessed by multiple threads. Fixing this requires using some private CPython APIs: _Py_TryIncref and _PyObject_SetMaybeWeakref. The implementations of these functions are included until they're made available as public ("unstable") APIs. Fixes pybind#5489

rostan-t added the triage New bug, unverified label Jan 7, 2025

colesbury mentioned this issue Jan 14, 2025

Make _Py_TryIncref public as an unstable API as PyUnstable_TryIncref() python/cpython#128844

Open

colesbury linked a pull request Jan 15, 2025 that will close this issue

Fix data race when using shared variables (free threading) #5494

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG]: Data race when using static variables with free-threaded Python #5489

[BUG]: Data race when using static variables with free-threaded Python #5489

rostan-t commented Jan 7, 2025 •

edited

Loading

khotalota commented Jan 9, 2025

rwgk commented Jan 13, 2025

rostan-t commented Jan 13, 2025

colesbury commented Jan 13, 2025

colesbury commented Jan 13, 2025

szalpal commented Jan 13, 2025

colesbury commented Jan 13, 2025

[BUG]: Data race when using static variables with free-threaded Python #5489

[BUG]: Data race when using static variables with free-threaded Python #5489

Comments

rostan-t commented Jan 7, 2025 • edited Loading

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

Problem description

Reproducible example code

Is this a regression? Put the last known working version here if it is.

khotalota commented Jan 9, 2025

rwgk commented Jan 13, 2025

rostan-t commented Jan 13, 2025

colesbury commented Jan 13, 2025

colesbury commented Jan 13, 2025

szalpal commented Jan 13, 2025

colesbury commented Jan 13, 2025

rostan-t commented Jan 7, 2025 •

edited

Loading