I don't want to take the time right now to submit performance enhancements, but perhaps @moyix or some other person reading this note would like to do the work.
I find that a tremendous amount of time is spent on file reads, string concatenations, and substring operations. I see two ways to speed things up, both simple to implement:
1. In the StreamFile class, cache the stream pages so you only have to read them from the file once. Or better, if the platform supports mmap, mmap the entire PDB file, create a buffer for it, and take a slice of the buffer whenever you need a stream page. In the non-mmap case, you could add a method to clear the cache, to be called, for instance, after parsing the entire stream.
2. In StreamFile._read, determine how many pages the request spans, then use the cache / mmap above to get slices of the individual pages. Return the single slice, a concatenation of two slices, or a cStringIO assembly of more than two. Using _read_pages is inefficient because you then have to take a slice of its result. (A sketch combining both ideas follows.)
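Here is a minimal sketch of both ideas together, assuming Python 3; the class name, the pages argument, and the method names are illustrative, not pdbparse's actual StreamFile API:

```python
import mmap

class MmapStreamFile:
    """Sketch: mmap the whole PDB once and serve stream pages as
    zero-copy slices instead of re-reading them from the file."""

    def __init__(self, fp, pages, page_size=0x1000):
        # Map the entire file read-only; the OS page cache handles reuse.
        self._mm = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._mm)
        self.pages = pages          # file page numbers backing this stream
        self.page_size = page_size

    def get_page(self, n):
        # Zero-copy view of the n-th page of the stream.
        off = self.pages[n] * self.page_size
        return self._view[off:off + self.page_size]

    def read(self, offset, length):
        if length <= 0:
            return b""
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        start = offset % self.page_size
        if first == last:
            # Request fits in one page: a single slice, no concatenation.
            return self.get_page(first)[start:start + length].tobytes()
        # Request spans pages: join only the slices we need. This is
        # where the issue suggests cStringIO on Python 2; on Python 3,
        # b"".join over memoryviews does the same job.
        parts = [self.get_page(first)[start:]]
        for n in range(first + 1, last):
            parts.append(self.get_page(n))
        end = (offset + length - 1) % self.page_size + 1
        parts.append(self.get_page(last)[:end])
        return b"".join(parts)
```

In the non-mmap case, the same read logic would apply, with get_page backed by a dict of pages already read from the file plus the cache-clearing method suggested above.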
I think this would eliminate most of the time spent parsing a PDB as a whole. You could try profiling pdbparse with a large file, such as ntoskrnl.pdb.
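For example, something like this would show where the time goes (cProfile and pstats are in the standard library; the ntoskrnl.pdb path is just a placeholder for whatever large PDB you have on hand):

```python
import cProfile
import pstats
import pdbparse

# Time a full parse of a large PDB and print the 20 most expensive
# call sites by cumulative time.
cProfile.run('pdbparse.parse("ntoskrnl.pdb")', "parse.prof")
pstats.Stats("parse.prof").sort_stats("cumulative").print_stats(20)
```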
You should try profiling, but my guess is that some of the slowness is due to the use of Construct. One workaround is to only parse the streams you need for a particular task; you can see an example of this here:
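For instance, if your version of pdbparse supports the fast_load option, something along these lines should read the stream directory without parsing stream contents, then load a single stream on demand; treat the streams attribute and load() method as a sketch of the API rather than a guarantee:

```python
import pdbparse

# Read only the MSF headers and stream directory, skipping the
# (Construct-heavy) parsing of individual stream contents.
pdb = pdbparse.parse("ntoskrnl.pdb", fast_load=True)

# Parse just the stream you need; stream index 2 is the TPI
# (type info) stream in the PDB format.
pdb.streams[2].load()
```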