Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust bindings refactor #6309

Open
wants to merge 60 commits into
base: dev
Choose a base branch
from
Open

Rust bindings refactor #6309

wants to merge 60 commits into from

Conversation

emesare
Copy link
Member

@emesare emesare commented Jan 11, 2025

Opening this PR for public discussion. This needs more testing and also lacks a few major changes that make sense to land alongside the already existing set of breaking changes.

The Rust bindings are being heavily refactored for the 4.3 release. This is to address many of the shortcomings and historical issues, namely around maintaining and safety.

Some of the additions of this branch include:

  • Revised project structure
  • Consistent formatting (with rustfmt)
  • Many UB fixes
  • Unit tests
  • More/updated documentation
  • Real rust examples (i.e. cargo run --example)
  • Many of the already opened PR's
  • Additional APIs
  • Updated and idiomatic linking to the core
  • Consolidation of API's (removal of historical API's)
  • Rust main thread handler
  • Type driven API's (ex. RegisterId)
  • Updated CI

emesare and others added 30 commits December 22, 2024 01:21
Moves a bunch of stuff out of src/types.rs that did not belong:
- Confidence
- Variable
- Function specific stuff

- Refactored InstructionInfo, see the msp430 and riscv examples.
- Renamed Function::from_raw to Function::ref_from_raw and fixed places where the ref was incremented twice
- Fixed FunctionRecognizer leaking functions (see above)
- Fixed some apis incorrectly returning Result where Option is clearer
- Started to move destructured types to the From trait for converting to an from ffi types, see Location for an example
- Started to remove bad module level imports (importing std modules like mem all over the place)
- Moved some wrapper handle types to named handle field (this improves readability), see CoreArchitecture for an example
- Removed some unintuitive getters, this is bad practice for Rust code, just access the field directly, see DataVariable for an example
- General code cleanup, purposely did not run rustfmt, that will be a single seperate commit
- Fixed invalid views being able to invoke UB when dealing with databases
- Cleaned up some helper code in dwarf_import
- Fixed inverted is_null checks causing crashes! Oops!
Still a WIP, I think branch info is still invalid, need to figure out the issue there.

- Fixed some invalid Ref lifetimes when constructing indirectly, see Array<DataVariable> for example
- Added some more comments
- Removed some "magic" functions like MLIL Function::ssa_variables

There are still a bunch of invalid lifetimes that aren't crashing us due to the usage of those API's not living long enough. But they ARE an issue.
Trying to comment more TODO's as I go along.

- Renamed llil::Function to llil::LowLevelILFunction for clarity and consistency
- Take base structures by ref in StructureBuilder::set_base_structures to prevent premature drops
- Added more raw to wrapper conversions
- Removed UB prone apis
- Getting rid of more core module references, use std!
- Removed improper Guard usage in array impls for wrapper types with no context
- Untangling the UB of the Label api, still only half done :/
- Misc formatting
- Made Logger ref counted
- Fixed leaking name of logger every time something was logged
- Fixed the last (hopefully) of the unresolved labels
- Simplified some code
componentArray was never freed
When linking you must depend on the -sys crate.

This is because linker arguments (what says where to find binaryninjacore) are NOT transitive. The top level application crate MUST provide it.

For more information see:
- rust-lang/cargo#9554 (comment)
- oxidecomputer/omicron#225
Use cargo to manage the git repo ref instead
This is where all shipped public plugins that are not arch/view/platform/lang will be at from now on

Originally they were in the rust workspace, meaning they all shared a Cargo.lock which is ill-advised.
- More clarification on plugin/executable requirements
- Made examples actually rust examples
- Add Display impl for InstructionTextToken
- Renamed feature "noexports" to "no_exports"
We don't register a compatible log sink so they will just get sent into the void
This is being done in the hopes of curbing the multi-thousand line files that will occur once we flesh out the tests
Still need to add support for running tests in ci
- Architecture id's are now typed accordingly
- Fix some clippy lints
- Make instruction index public in LLIL
- Removed useless helper functions
- LLIL expressions and instruction indexes are now typed accordingly
This should show binaryninjacore-sys alongside binaryninja crate
- Remove lazy_static dependency
- Remove hacky impl Debug for Type and just use the display impl
- Add more debug impls
- Reorder some top level namespace items
- Add first type test
- Added main thread handler api
- Register a headless main thread handler by default in initialization
- Refactor QualifiedName to be properly owned
- Loosened some type constraints on some apis involving QualifiedName
- Fixed some apis that were crashing due to incorrect param types
- Removed extern crate cruft for log crate
- Simplified headless initialization using more wrapper apis
- Fixed segments leaking because of no ref wrapper, see BinaryViewExt::segment_at
- Added rstest to manage headless init in unit tests
- Added some more unit tests
- Refactored demangler api to be more ergonomic
- Fixed minidump plugin not building
- Fixup usage of QualifiedName in plugins
- Make QualifiedName more usable
- Remove under_construction.png from the build.rs it has been removed
- Use InstructionIndex in more places
- Add missing PartialOrd and Ord impls for id types
@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

We need to add a doc file explaining the core linkage more in depth, and how to control the ABI version used independently of the rust API.

I added an example here https://github.com/emesare/simple_rust_binja/blob/master/Cargo.toml but obviously needs to be expanded on and explained.

We should also have a simple "best practices" that is a TLDR, but basically boils down to, please pin your code to a commit or branch, or something other than "dev".

@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

As for can we document what API's are going to be changing, I don't have an exhaustive list, but in the case of a user wanting to know if they should use the rust api for a certain task, over lets say, C++ or python, It would be nice to have.

For an example: User wants to use the flow graph API, however there is not a consumer of that necessarily (other than dwarfdump) so the API might change once there is a consumer.

The easiest way to determine if an API is subject to many changes is to see if there is a consumer. If there is no consumer we could document, either at the module level, or in some documentation that said API would be "unstable". I don't really like doing it the other way around, that is, saying that an API is stable.

@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

The structure grouper is failing for the pdb import plugin, I am not entirely sure what I did to cause that, but the effect in a real binary is palpable.

Test: https://github.com/Vector35/binaryninja-api/actions/runs/12780281192/job/35626277295

The issue in a real binary:
image

Left is before, and right is this branch

@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

We have a real issue with code coverage in the rust api, a large portion of the api is not used at all.

The solution is unit tests and more examples, which is what this PR is working towards. However measuring that is difficult, we might want to look into including some code coverage tool in CI.

@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

What is stopping this PR from being merged?

  1. Some tests are failing for pdb import
  2. Need to run through some binaries again to make sure there are no regressions performance-wise, this was done after some changes to the architecture api but has not been done again

@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

As for can we document what API's are going to be changing, I don't have an exhaustive list, but in the case of a user wanting to know if they should use the rust api for a certain task, over lets say, C++ or python, It would be nice to have.

For an example: User wants to use the flow graph API, however there is not a consumer of that necessarily (other than dwarfdump) so the API might change once there is a consumer.

The easiest way to determine if an API is subject to many changes is to see if there is a consumer. If there is no consumer we could document, either at the module level, or in some documentation that said API would be "unstable". I don't really like doing it the other way around, that is, saying that an API is stable.

For this I think both a little foreword in the README.md explaining what API's are actively being used, and therefor more reliable and better suited for consumers that are not comfortable with API's that weren't tested/or API's with the possibility of breaking changes. Aside from the README.md we might also want to add a warning on modules like MLIL and HLIL that they are subject to breaking changes.

The goal for this is to make it clear for those that do not want the API to change as frequently, if rust fits the bill, or if they should possibly use C++/Python api instead.

@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

Are there any more planned API changes for this PR?

No, I don't have anything else planned, and the changes that are occurring are on functions that previously had no visible consumer.

- Make workflow cloning explicit
- Add workflow tests
- Add missing property string list getting for settings
- Remove IntoActivityName (see #6257)
@emesare
Copy link
Member Author

emesare commented Jan 15, 2025

GitHub has no easy way to set deployment retention, this is insane behavior.

https://github.com/Vector35/binaryninja-api/deployments/testing

Every push to the rust code will trigger this, we might need to setup another workflow to run on some arbitrary schedule to delete old deployments.

This commit is half done
@emesare
Copy link
Member Author

emesare commented Jan 16, 2025

image

Left is this PR, right is current dev. Any match that is traversing the IL fails. (Because the IL traversal is borked)

At some point during the last few days, IL traversal has been borked. Need to fix that obviously before this is merged.

@emesare
Copy link
Member Author

emesare commented Jan 16, 2025

Next priority (due to the above) is to add LLIL unit tests. We need to have the visit_tree functions tested thoroughly to prevent regressions.

Most likely the WARP is not visiting the expressions, so first look at the visit_tree set of functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Rust API Issue needs changes to the Rust API Effort: High Issue should take > 1 month Impact: Medium Issue is impactful with a bad, or no, workaround Type: Enhancement Issue is a small enhancement to existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add binaryninjacore-sys to online docs
3 participants