-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search Process is Unoptimized #87
Comments
The way Dxbx scans is much faster than the algorithm linked above, since that still scans over the entire memory range when looking for the presence of one single symbol (which makes scan for all symbols O(n*m) cost). By using a trie data structure Dxbx doesn't need to check all memory addresses per symbol in the database, but instead, starting from each memory address, it walks the trie until it hits a leaf or a dead end (making this cost O(n*log(m)). Since the root of the trie is used frequently, it stays in cache, and since memory is scanned sequentially, those cache line fills are fast too. |
afaik, trie data structure will cause the existing groups to dismantle into their own individual. This will cause harder to track what's not being used and what's currently in our database. Keep in mind, we're dealing with over 400 - 600 symbols per library by the time we're done with the database. If there's a way to keep the existing groups intact, feel free to optimize the existing macro. |
In the respective discussion on Discord, I also recommended using an accelerator structure (although not necessarily multi-level like a tree) to search for all functions at the same time (so the XBE is only scanned as few times as necessary, and it remains in cache).
This is just a description of a trie. Please provide a link to relevant sections when talking about existing code. What are the leafs: Funtions? What are the links: XBE bytes? Which byte-index is looked at to find the next link? What happens on failure: will it go to the next byte, or can it skip bytes like Boyer-Moore search? Also, when / how does it deal with X-Refs? Multiple scans? Different tries to ensure correct prioritization?
There can be a preprocessor which builds an accelerator structure. Larger changes to the database shouldn't be necessary. If the syntax is simplified (as I had recommended in the past, turning it into an DSL) it could simply be pre-processed at compile using a small helper-script, too. The existing database should move into data files anyway. |
To prevent repeating myself, here a link that describes the Dxbx detection code in a bit more detail : http://dxbx-emu.com/2010/11/04/dxbx-symbol-detection/ And for people that want to read a little more on the background story, see this : https://ngemu.com/threads/dxbx-status-api-detection.107584/ And here's the obligatory Wikipedia article link : https://en.wikipedia.org/wiki/Radix_tree And for the lazy people that don't want to do a little searching for this, here a link to part of the Dxbx code that accesses this trie structure : https://github.com/PatrickvL/Dxbx/blob/master/Source/Delphi/src/uStoredTrieTypes.pas And here the code that builds the trie, based on pattern files: https://github.com/PatrickvL/Dxbx/blob/master/Source/Delphi/src/Tools/PatternTrieBuilder/uPatternsToTrie.pas Note, that this is 9 years old code, a lot has been learned since then, and remember that this was written in Delphi which has it's own unique strengths and weaknesses. |
Somehow keyword automatically marked this as closed. Reopened since it is not fully resolve. |
Base on @JayFoxRox's feedback about XbSymbolDatabase. The search process is unoptimized because of not using better algorithms to gain faster scan process.
Currently, the scan process check for OV then move on to next offset, +1. If I recall correctly, it also check the reference address first.
In my case, it is D3D8 library taking way much longer time in
.text
section than others.The text was updated successfully, but these errors were encountered: