
Runs slow. Anyone interested in improving performance? #43

Open
mrolle45 opened this issue Nov 27, 2017 · 2 comments

Comments

@mrolle45

I don't want to take the time right now to submit performance enhancements, but perhaps @moyix or some other person reading this note would like to do the work.
I find that a tremendous amount of time is spent on file reads, string concatenations, and substring operations. There are two ways to speed things up that I have seen; both would be simple to implement:

  1. In StreamFile class, cache the stream pages, so you only have to read them once from the file. Or better, if the platform supports mmap, just mmap the entire PDB file, create a buffer for it, and take a slice of the buffer for a stream page whenever you need it. In the non-mmap case, you could add a method to clear the cache, to be called, for instance, after parsing the entire stream.
  2. In StreamFile._read, see how many pages the request spans. Use the above cache / mmap to get slices of the individual pages. Return a single slice, a concatenation of two slices, or use cStringIO to assemble more than two. Using _read_pages is inefficient because you then have to take a slice of its result.
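A minimal sketch of both suggestions combined, assuming Python 3 (where `b"".join` replaces the Python 2 `cStringIO` idiom). `PageCache` and its method names are illustrative, not part of pdbparse's actual API:

```python
import mmap

class PageCache:
    """Illustrative page cache: mmap the whole file once, slice pages on demand."""

    def __init__(self, path, page_size):
        self.page_size = page_size
        with open(path, "rb") as f:
            # One mapping for the entire PDB; the mmap object keeps its own
            # handle, so the file object can be closed immediately.
            self._map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def page(self, index):
        # Slice of the mapping for one file page; no separate file read.
        start = index * self.page_size
        return self._map[start:start + self.page_size]

    def read(self, pages, offset, size):
        """Read `size` bytes at stream `offset`, where `pages` is the
        stream's ordered list of file-page indices (suggestion 2)."""
        first = offset // self.page_size
        last = (offset + size - 1) // self.page_size
        chunks = [self.page(pages[i]) for i in range(first, last + 1)]
        start = offset % self.page_size
        # One join and one slice, instead of reading and concatenating
        # every page of the stream as _read_pages does.
        return b"".join(chunks)[start:start + size]
```

The key point is that only the pages touched by the request are materialized, and the OS page cache backs the mapping, so repeated reads of the same region never hit the disk twice.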

I think this would eliminate most of the time spent in parsing a PDB as a whole. You could try profiling pdbparse with a large file, such as ntoskrnl.pdb.
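A quick recipe for that profiling run with the standard-library `cProfile`/`pstats` modules; the `workload` function here is a stand-in for the real call, `pdbparse.parse("ntoskrnl.pdb")`:

```python
import cProfile
import io
import pstats

def workload():
    """Stand-in for pdbparse.parse("ntoskrnl.pdb")."""
    # Lots of small concatenations, the kind of hot spot described above
    return b"".join(bytes([i % 256]) for i in range(10000))

profiler = cProfile.Profile()
profiler.runcall(workload)

# Sort by cumulative time so the expensive call chains surface first
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

With the real parse as the workload, the read/concatenation/slice costs described above should dominate the cumulative-time column.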

@ZhangShurong

@mrolle45 I tried mmap, but it's still very slow. Do you have any suggestions?

@moyix
Owner

moyix commented Aug 21, 2018

You should try profiling, but my guess is that some of the slowness is due to the use of Construct. One workaround is to only parse the streams you need for a particular task; you can see an example of this here:

import pdbparse

# Skip eager parsing of every stream, then load only the ones needed
pdb = pdbparse.parse(args[0], fast_load=True)
pdb.STREAM_TPI.load()
pdb.STREAM_DBI.load()
