Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Approach to improve C++ performance #22

Open
kenkangxgwe opened this issue Jul 28, 2020 · 2 comments
Open

[Discussion] Approach to improve C++ performance #22

kenkangxgwe opened this issue Jul 28, 2020 · 2 comments

Comments

@kenkangxgwe
Copy link
Contributor

Hi Brenton, I've been doing some very simple runs to get a few results by recording the CodeParser`Library`$ConcreteParseTime.

As shown below, the x-axis is the LoC of the input string, the y-axis shows the time (seconds) it takes to Tokenize and CodeConcreteParse the string. By estimation, a file of ~1000 LoC might take ~0.14 s to tokenize and ~0.16 s to parse (if I subtract the time to tokenize during code parsing. Of course, there will be overhead on the WL side, but the slope shows that the size of the file have larger impact to the parsing process.

  • Mathematica Version: 11.3
  • Paclet version: 1.0
  • Operating System: Windows 10

image

Given the grammar complexity, I think it challenging to push the parser even faster every time and you've done a great job. But I still have faith in the cpp impl that it can be more efficient and that's why I only tested the two library functions. From the history of the commits, I see you have some better way to profile it, so I wonder if you'd like to share some results about the bottleneck you've found in the cpp side regarding the latest master revision? And any plan that they can be improved in the long run, so that by any chance I can make some effort to it as well? Thanks so much.

@bostick
Copy link
Contributor

bostick commented Aug 3, 2020

Mingyu,

I definitely agree that the parser can be made faster.

I have characterized a handful of issues that would help with speed.

  1. MathLink

The cost of sending large syntax trees over MathLink is significant. I have been investigating using the new Compiler to create kernel Exprs directly completely remove the cost of sending trees over MathLink.

  1. releasing memory

The parser creates intermediate Node objects that must be released when they are finished being used. This is also a significant cost and I would like to look at some memory arena solution to free up entire trees at once instead of releasing all Nodes individually.

  1. Using top-level

All of the logic for converting Concrete Syntax into Abstract Syntax is in top-level WL code. This is... not as fast as compiled C++, to say the least.

I would like to move this logic into C++, but this would be a lot of work.

  1. Recursive descent

Currently, the parser is a recursive descent parser that uses the stack (you can make it crash with stack overflow if you try right now). It would be much better to use a table-base LL(1)-ish approach. I'm not interested in transitioning to a bottom-up LR parser. The Pratt parser approach that is currently being used has worked well so far. WL actually has a relatively simple grammar that fits well with a "indexed by token" approach. I always say that it is harder to tokenize WL than to actually parse WL.

I'm sure there are other, smaller issues to fix up, but those are the big issues.

@bostick
Copy link
Contributor

bostick commented Jun 23, 2022

I have recently made good progress on this issue.

  1. Starting with 13.1 (soon to be released), there is an option to create kernel exprs directly instead of using MathLink. This can be turned on in CMake with -DTRANSPORT=ExprLib

  2. I have put in a lot of work to minimize the time needed for freeing memory, though there is still room for improvement.

  3. Abstracting logic is still in top-level WL code, so no progress here yet.

  4. I have put a lot of work into allowing clang's [[clang::musttail]] annotation to be used to guarantee that tail calls are optimized. This can be specified in CMake with -DUSE_MUSTTAIL=ON. However, this is still experimental because there are mysterious crashes when building with Release and I am tracking this down and reporting to get fixed in Clang / LLVM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants