[Discussion] Approach to improve C++ performance #22
Mingyu, I definitely agree that the parser can be made faster. I have identified a handful of issues that, if addressed, would help with speed.
The cost of sending large syntax trees over MathLink is significant. I have been investigating using the new Compiler to create kernel Exprs directly, which would completely remove the cost of sending trees over MathLink.
The parser creates intermediate Node objects that must be released when they are finished being used. This is also a significant cost and I would like to look at some memory arena solution to free up entire trees at once instead of releasing all Nodes individually.
All of the logic for converting Concrete Syntax into Abstract Syntax is in top-level WL code. This is... not as fast as compiled C++, to say the least. I would like to move this logic into C++, but this would be a lot of work.
Currently, the parser is a recursive descent parser that uses the call stack (you can make it crash with a stack overflow right now if you try). It would be much better to use a table-based, LL(1)-ish approach. I'm not interested in transitioning to a bottom-up LR parser; the Pratt parser approach that is currently being used has worked well so far. WL actually has a relatively simple grammar that fits well with an "indexed by token" approach. I always say that it is harder to tokenize WL than to actually parse WL.

I'm sure there are other, smaller issues to fix up, but those are the big ones.
I have recently made good progress on this issue.
Hi Brenton, I've been doing some very simple runs to get a few results by recording `` CodeParser`Library`$ConcreteParseTime ``. In the attached plot (not reproduced here), the x-axis is the LoC of the input string, and the y-axis shows the time (in seconds) it takes to `Tokenize` and `CodeConcreteParse` the string. By extrapolation, a file of ~1000 LoC might take ~0.14 s to tokenize and ~0.16 s to parse (if I subtract the tokenization time from the total parse time). Of course, there will be overhead on the WL side, but the slope shows that the size of the file has the larger impact on the parsing process.

Given the grammar's complexity, I think it is challenging to keep pushing the parser faster, and you've done a great job. But I still have faith that the C++ implementation can be made more efficient, which is why I tested only the two library functions. From the commit history, I see you have a better way to profile it, so I wonder if you'd like to share some results about the bottlenecks you've found on the C++ side as of the latest master revision? And do you have any long-term plans for improving them, so that I might be able to contribute as well? Thanks so much.