BNBeginUndoActions can cause a deadlock #6289

Open
WeiN76LQh opened this issue Dec 30, 2024 · 2 comments
Labels
Component: DSC · File Format: SharedCache · Impact: Critical · State: Awaiting Triage · Type: Crash

Comments

@WeiN76LQh

Version and Platform (required):

  • Binary Ninja Version: 4.3.6617-dev (28632797)
  • OS: macOS
  • OS Version: 15.1.1
  • CPU Architecture: M1

Bug Description:
BNBeginUndoActions can cause a deadlock; from the stack trace below, it appears to be stuck waiting on a condition variable:

__psynch_cvwait (@__psynch_cvwait:5)
_pthread_cond_wait (@_pthread_cond_wait:304)
std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) (@std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&):10)
___lldb_unnamed_symbol22733 (@___lldb_unnamed_symbol22733:19)
___lldb_unnamed_symbol22736 (@___lldb_unnamed_symbol22736:102)
___lldb_unnamed_symbol14121 (@___lldb_unnamed_symbol14121:37)
BNBeginUndoActions (@BNBeginUndoActions:11)
BinaryNinja::FileMetadata::BeginUndoActions(bool)
BinaryNinja::BinaryView::BeginUndoActions(bool)
SharedCacheCore::SharedCache::FindSymbolAtAddrAndApplyToAddr(unsigned long long, unsigned long long, bool)
::BNDSCFindSymbolAtAddressAndApplyToAddress(BNSharedCache *, uint64_t, uint64_t, bool)
SharedCacheAPI::SharedCache::FindSymbolAtAddrAndApplyToAddr(unsigned long long, unsigned long long, bool) const
SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0::operator()() const
decltype(std::declval<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&>()()) std::__1::__invoke[abi:ne180100]<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&>(SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&)
void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne180100]<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&>(SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&)
std::__1::__function::__alloc_func<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0, std::__1::allocator<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0>, void ()>::operator()[abi:ne180100]()
std::__1::__function::__func<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0, std::__1::allocator<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0>, void ()>::operator()()
std::__1::__function::__value_func<void ()>::operator()[abi:ne180100]() const
std::__1::function<void ()>::operator()() const
WorkerActionCallback(void*)

There are other threads stuck in BNForgetUndoActions:

__psynch_cvwait (@__psynch_cvwait:5)
_pthread_cond_wait (@_pthread_cond_wait:304)
std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) (@std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&):10)
___lldb_unnamed_symbol22733 (@___lldb_unnamed_symbol22733:19)
___lldb_unnamed_symbol22736 (@___lldb_unnamed_symbol22736:102)
___lldb_unnamed_symbol14125 (@___lldb_unnamed_symbol14125:73)
BNForgetUndoActions (@BNForgetUndoActions:38)
BinaryNinja::FileMetadata::ForgetUndoActions(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
BinaryNinja::BinaryView::ForgetUndoActions(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)
SharedCacheCore::SharedCache::FindSymbolAtAddrAndApplyToAddr(unsigned long long, unsigned long long, bool)
::BNDSCFindSymbolAtAddressAndApplyToAddress(BNSharedCache *, uint64_t, uint64_t, bool)
SharedCacheAPI::SharedCache::FindSymbolAtAddrAndApplyToAddr(unsigned long long, unsigned long long, bool) const
SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0::operator()() const
decltype(std::declval<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&>()()) std::__1::__invoke[abi:ne180100]<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&>(SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&)
void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne180100]<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&>(SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0&)
std::__1::__function::__alloc_func<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0, std::__1::allocator<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0>, void ()>::operator()[abi:ne180100]()
std::__1::__function::__func<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0, std::__1::allocator<SharedCacheWorkflow::ProcessOffImageCall(BinaryNinja::Ref<BinaryNinja::AnalysisContext>, BinaryNinja::Ref<BinaryNinja::Function>, BinaryNinja::Ref<BinaryNinja::MediumLevelILFunction>, BinaryNinja::MediumLevelILInstruction, unsigned long, bool)::$_0>, void ()>::operator()()
std::__1::__function::__value_func<void ()>::operator()[abi:ne180100]() const
std::__1::function<void ()>::operator()() const
WorkerActionCallback(void*)

Another thread (with the name "Worker T module:core.module.loadDebugInfo module:core.module...") is stuck waiting here:

__psynch_cvwait (@__psynch_cvwait:5)
_pthread_cond_wait (@_pthread_cond_wait:304)
std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) (@std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&):10)
___lldb_unnamed_symbol22733 (@___lldb_unnamed_symbol22733:19)
___lldb_unnamed_symbol22736 (@___lldb_unnamed_symbol22736:102)
___lldb_unnamed_symbol14143 (@___lldb_unnamed_symbol14143:26)
BNGetFileViewOfType (@BNGetFileViewOfType:39)
___lldb_unnamed_symbol913 (@___lldb_unnamed_symbol913:19)
___lldb_unnamed_symbol920 (@___lldb_unnamed_symbol920:11)
___lldb_unnamed_symbol924 (@___lldb_unnamed_symbol924:13)
___lldb_unnamed_symbol420 (@___lldb_unnamed_symbol420:10)
___lldb_unnamed_symbol723 (@___lldb_unnamed_symbol723:10)
___lldb_unnamed_symbol13411 (@___lldb_unnamed_symbol13411:29)
___lldb_unnamed_symbol13401 (@___lldb_unnamed_symbol13401:34)
___lldb_unnamed_symbol7047 (@___lldb_unnamed_symbol7047:27)
___lldb_unnamed_symbol7014 (@___lldb_unnamed_symbol7014:81)
___lldb_unnamed_symbol7628 (@___lldb_unnamed_symbol7628:15)
___lldb_unnamed_symbol6639 (@___lldb_unnamed_symbol6639:20)
___lldb_unnamed_symbol30229 (@___lldb_unnamed_symbol30229:723)
___lldb_unnamed_symbol30252 (@___lldb_unnamed_symbol30252:148)
___lldb_unnamed_symbol30251 (@___lldb_unnamed_symbol30251:10)
___lldb_unnamed_symbol30472 (@___lldb_unnamed_symbol30472:18)
___lldb_unnamed_symbol27455 (@___lldb_unnamed_symbol27455:19)
___lldb_unnamed_symbol29856 (@___lldb_unnamed_symbol29856:734)
___lldb_unnamed_symbol29851 (@___lldb_unnamed_symbol29851:6)
_pthread_start (@_pthread_start:37)

Two threads named "Thread (pooled)" are stuck here:

__psynch_cvwait (@__psynch_cvwait:5)
_pthread_cond_wait (@_pthread_cond_wait:304)
std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) (@std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&):10)
___lldb_unnamed_symbol22733 (@___lldb_unnamed_symbol22733:19)
___lldb_unnamed_symbol22736 (@___lldb_unnamed_symbol22736:102)
BNExecuteOnMainThreadAndWait (@BNExecuteOnMainThreadAndWait:21)
BinaryNinja::ExecuteOnMainThreadAndWait(std::__1::function<void ()> const&)
BackgroundThread::runThread()
BackgroundThread::start(QVariant)::'lambda'()::operator()() const
decltype(std::declval<BackgroundThread::start(QVariant)::'lambda'()&>()()) std::__1::__invoke[abi:ne180100]<BackgroundThread::start(QVariant)::'lambda'()&>(BackgroundThread::start(QVariant)::'lambda'()&)
std::__1::invoke_result<BackgroundThread::start(QVariant)::'lambda'()&>::type std::__1::invoke[abi:ne180100]<BackgroundThread::start(QVariant)::'lambda'()&>(BackgroundThread::start(QVariant)::'lambda'()&)
QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'())::operator()(BackgroundThread::start(QVariant)::'lambda'()) const
decltype(std::declval<QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&>()(std::declval<BackgroundThread::start(QVariant)::'lambda'()>())) std::__1::__invoke[abi:ne180100]<QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&, BackgroundThread::start(QVariant)::'lambda'()>(QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&, BackgroundThread::start(QVariant)::'lambda'()&&)
decltype(auto) std::__1::__apply_tuple_impl[abi:ne180100]<QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&, std::__1::tuple<BackgroundThread::start(QVariant)::'lambda'()>, 0ul>(QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&, std::__1::tuple<BackgroundThread::start(QVariant)::'lambda'()>&&, std::__1::__tuple_indices<0ul>)
decltype(auto) std::__1::apply[abi:ne180100]<QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&, std::__1::tuple<BackgroundThread::start(QVariant)::'lambda'()>>(QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()::'lambda'(BackgroundThread::start(QVariant)::'lambda'()) const&, std::__1::tuple<BackgroundThread::start(QVariant)::'lambda'()>&&)
QtConcurrent::StoredFunctionCall<BackgroundThread::start(QVariant)::'lambda'()>::runFunctor()
QtConcurrent::RunFunctionTaskBase<void>::run()
QThreadPoolThread::run()
QThreadPrivate::start(void*)
_pthread_start

The deadlock is permanent: Binary Ninja stops responding and its CPU usage drops to essentially 0.

There is a chance this is user error, but the deadlock occurs inside the Binary Ninja core, so I'm not sure how I could be causing it.

Steps To Reproduce:
I'm not sure how to reproduce this with a vanilla copy of the specified version of Binary Ninja. I'm using a customised version of the DSC plugin that allows more parallelization, so it could be user error, but ultimately the hang comes from something happening in the Binary Ninja core.

I trigger the deadlock with the following steps:

  1. Open a copy of DYLD Shared Cache
  2. Whilst analysis is ongoing for that instance of DYLD Shared Cache, click File -> New Window.
  3. Open a different copy of DYLD Shared Cache.
  4. Observe deadlock.

The important factors seem to be that analysis is running on both instances at the same time and that both files being opened are DYLD Shared Caches.

I know that's not particularly helpful; I'm hoping the stack traces make it clear where multiple threads are either contending for the same lock or where a condition variable is never notified.

Expected Behavior:
To be honest, I don't really know. BeginUndoActions is a bit of an odd one because, as I understand it, it doesn't really work in multi-threaded cases due to the way undo actions have been architected. Obviously it shouldn't deadlock, but does using BeginUndoActions in a multi-threaded situation make any sense? Given that Binary Ninja is designed to be as multi-threaded as possible, does BeginUndoActions make sense in most situations, since it won't necessarily batch the expected undo actions? The DSC plugin uses it in a number of places to either forget or batch undo actions, but given the plugin's multi-threaded nature, would this even work as expected?
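For context, the stack traces above show the pattern in play: `SharedCacheCore::SharedCache::FindSymbolAtAddrAndApplyToAddr` brackets its work with `BeginUndoActions` and later `ForgetUndoActions` on the view. Here is a minimal sketch of that bracketing, assuming a body along these lines (the symbol work shown is illustrative, not the actual plugin source):

```cpp
// Hedged sketch of the undo-action bracketing implied by the stack traces;
// the surrounding body is illustrative, not the actual DSC plugin code.
#include "binaryninjaapi.h"

using namespace BinaryNinja;

void ApplySymbolWithUndoBracket(Ref<BinaryView> view, Ref<Symbol> sym)
{
    // Start a batch of undo actions; the bool parameter matches the
    // BeginUndoActions(bool) signature visible in the trace.
    std::string undoId = view->BeginUndoActions(true);

    // ... apply the resolved symbol to the view ...
    view->DefineUserSymbol(sym);

    // Discard the batch so analysis-time symbol application does not end up
    // in the user's undo history. When many worker threads do this at once,
    // they all funnel into the same undo machinery in the core.
    view->ForgetUndoActions(undoId);
}
```

With the parallelized plugin, many worker threads execute this bracket at the same time, potentially against two separate shared-cache views, which is where the batching question above comes from.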

@WeiN76LQh (Author)

Removing the calls to BeginUndoActions and ForgetUndoActions/CommitUndoActions in the DSC plugin seems to fix the problem.

This issue is a continuation of the undo action API being problematic for me: it's not very performant, it causes crashes in headless mode, and now there's this deadlock.

@WeiN76LQh (Author)

Potentially related to #6097

WeiN76LQh pushed a commit to WeiN76LQh/binaryninja-api that referenced this issue Jan 2, 2025
This is a massive and messy commit that started out as an attempt to add bulk loading of images (the ability to load them in parallel via a single API call) but then required a number of changes that unearthed some fundamental bugs in the current design of `SharedCache`. Untangling it all into nice clean commits is more work than it's worth.

The main bug fix is that `State()` was not actually thread safe, because `m_state`, which it returned a copy of, could be modified concurrently during the copy. Now a lock is held when accessing `m_state` via functions like `State()`, and a `shared_ptr` is returned. This is a thread-safe operation, and the returned `shared_ptr` guarantees the `State` struct remains allocated while that `shared_ptr` exists. This also required changes to some of the code to create local variables so that things remain allocated after they are read from `State()`.
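A minimal sketch of the accessor change described here, assuming member names like `m_stateMutex` and a snapshot struct (called `ViewState` below; the real one is the `State` struct mentioned above):

```cpp
// Minimal sketch: return a shared_ptr snapshot taken under a lock instead of
// copying m_state while another thread may be mutating it. Member and type
// names here (m_stateMutex, ViewState) are assumptions for illustration.
#include <memory>
#include <mutex>

struct ViewState { /* images, regions, ... */ };

class SharedCache
{
    mutable std::mutex m_stateMutex;
    std::shared_ptr<ViewState> m_state;

public:
    std::shared_ptr<ViewState> State() const
    {
        std::lock_guard<std::mutex> lock(m_stateMutex);
        return m_state;  // copying the shared_ptr is cheap and safe under the lock
    }

    void ReplaceState(std::shared_ptr<ViewState> next)
    {
        std::lock_guard<std::mutex> lock(m_stateMutex);
        m_state = std::move(next);  // readers keep their old snapshot alive
    }
};
```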

The other big change is introducing an API call to load a number of images concurrently. This is done using worker threads and required changes to `SharedCache::LoadImageWithInstallName` and `SharedCache::LoadSectionAtAddress` so that they can work in parallel. Previously they simply held locks for extended periods; this is now reduced as much as possible. Other work was needed to make parts of the loading process thread safe.
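A rough sketch of the bulk-loading shape described here: dispatch each image load to its own task and wait for all of them. The real change uses Binary Ninja worker threads; `LoadImagesInParallel` and the loader callback below are illustrative assumptions, not the actual API added by the commit:

```cpp
// Rough sketch of loading several images concurrently; names here are
// illustrative, not the actual SharedCache API added by the commit.
#include <functional>
#include <future>
#include <string>
#include <vector>

std::vector<bool> LoadImagesInParallel(
    const std::vector<std::string>& installNames,
    const std::function<bool(const std::string&)>& loadImage)
{
    std::vector<std::future<bool>> tasks;
    tasks.reserve(installNames.size());
    for (const auto& name : installNames)
    {
        // Each image load runs concurrently; per-image work should only take
        // the cache's locks for short, targeted sections rather than holding
        // them for the whole load.
        tasks.push_back(std::async(std::launch::async,
            [&loadImage, name]() { return loadImage(name); }));
    }

    std::vector<bool> results;
    results.reserve(tasks.size());
    for (auto& task : tasks)
        results.push_back(task.get());
    return results;
}
```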

Something else this commit adds is commenting out calls to the undo actions API: `BeginUndoActions`, `ForgetUndoActions` and `CommitUndoActions`. To be honest, I don't know if this API works at all when used in parallel because, according to Vector35, the design of the undo actions system can't distinguish undo actions across threads. So when lots of threads are making calls to this API and then calling `ForgetUndoActions` or `CommitUndoActions`, is what you expect actually getting forgotten or committed? The main reason for commenting these out, though, is issues like Vector35#6289.

There might be some other minor changes in this commit that I've forgotten about, but I think this covers the majority of it.
@emesare added the Type: Crash, State: Awaiting Triage, Impact: Critical, File Format: SharedCache and Component: DSC labels on Jan 2, 2025