Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a parser for implementing parse_expr? #145

Closed
Krzmbrzl opened this issue Sep 19, 2023 · 14 comments · Fixed by #146
Closed

Using a parser for implementing parse_expr? #145

Krzmbrzl opened this issue Sep 19, 2023 · 14 comments · Fixed by #146

Comments

@Krzmbrzl
Copy link
Collaborator

Krzmbrzl commented Sep 19, 2023

Afaict the functionality behind sequant::parse_expr is currently based on sophisticated regular expressions. However, I was wondering whether it might not be more straight forward to instead use an actual parser for this purpose (see also https://stackoverflow.com/q/11905506)?

Such a parser could be simply auto-generated from a grammar file that I believe will be much more readable as a whole than the current regular expressions. Plus, making changes to such a grammar typically tends to be a lot easier than refactoring regular expressions.

I would recommend making use of https://github.com/yhirose/cpp-peglib, which doesn't need a separate generation step for the parser (as in: use an external tool to generate the C++ source files that constitute the parser) but instead compiles the parser in-place (pretty much the same as RegEx work).

Since I have to mess with the parsing logic when implementing #107 anyway, I am actually considering whether it might be less work to simply refactor the parsing into a PEG grammar rather than fully wrap my head around the regular expressions (and the hard-coded match group indices) 🤔

@bimalgaudel it seems to me that you were the one who (originally) wrote the parsing code. Are there specific reasons why you chose to use regular expressions over a parser?

As a nice bonus, using a parser would allow us to get proper error messages that can be displayed to the user if their input is invalid.

@Krzmbrzl
Copy link
Collaborator Author

Krzmbrzl commented Sep 19, 2023

To get an impression of how the definition could look like, here's a simplified grammar for SeQuant-like expressions that I have quickly thrown together (doesn't quite work yet, but should draw the picture)
Here's how the entire grammar for expression parsing would look like (not thoroughly tested but seems to work - try it yourself here):

# Parser for SeQuant expressions
Sum        <- Product (( '+' / '-' ) Product)*
Product    <- Group ('*'? Group)*
Group      <- '(' Sum ')' / Elementary
Elementary <- Tensor / Variable / Number
Tensor     <- Name IndexSpecs Symmetry?
IndexSpecs <- '^{' IndexList? '}_{' IndexList? '}' / '_{' IndexList? '}^{' IndexList? '}' / '{' IndexList? ';' IndexList? '}'
IndexList  <- Index (',' Index)*
Index      <- Name ( '<' Name (',' Name)* '>' )?
Symmetry   <- ':A' / ':S' / ':N'
Variable   <- Name
Name       <- < [a-zA-Z_] [a-zA-Z_0-9]* >
Number     <- < ('+' / '-')? Real ('/' Real)? >
Real       <- [0-9]+ ('.' [0-9]*)?

%whitespace <- [ \t]*

This would then only have to be amended with the corresponding actions that transform the parsed expression into actual SeQuant objects.

@Krzmbrzl
Copy link
Collaborator Author

To be clear here: This is me asking: Would it be okay if I rewrote the parse-logic to use this parser framework instead of the regular expressions?

@asadchev
Copy link
Contributor

asadchev commented Sep 19, 2023 via email

@bimalgaudel
Copy link
Member

bimalgaudel commented Sep 19, 2023

Great idea. Sure you can @Krzmbrzl but we want to minimize the external dependencies. I like the idea of Boost.Spirit. I was not aware of the existence of such libraries while implementing the parser. I agree the that the current grammar grammar impl. is unreadable.

@asadchev
Copy link
Contributor

asadchev commented Sep 19, 2023 via email

@Krzmbrzl
Copy link
Collaborator Author

Okay sweet - I'll start with Boost.Spirit tomorrow then!
While I think Boost.Spirit makes the grammar a little harder to read, I do understand the desire to minimize dependencies. Plus, it's not looking too bad in Spirit after all :)

@Krzmbrzl
Copy link
Collaborator Author

@asadchev if you point me to that code, I'll happily take inspiration from it - otherwise I'll simply start from scratch tomorrow. I have done quite a bit of compiler design and grammar writing in the past, so I expect this to be reasonably straight-forward 👀

@asadchev
Copy link
Contributor

asadchev commented Sep 19, 2023 via email

@evaleev
Copy link
Member

evaleev commented Sep 19, 2023

will also chime in ... my understanding is that Boost spirit parser is for developing DSLs embedded in C++ (hence subject to its grammar rules .. @asadchev correct me if I'm wrong). sequant::parse_expr parses "DSL" (latex) embedded into strings. @Krzmbrzl what is your goal? To develop a higher-level DSL embedded in C++ (I'd carefully plan to make sure that the grammar you want can be fit into C++) or a standalone DSL?

@asadchev
Copy link
Contributor

asadchev commented Sep 19, 2023 via email

@Krzmbrzl
Copy link
Collaborator Author

Krzmbrzl commented Sep 19, 2023

my understanding is that Boost spirit parser is for developing DSLs embedded in C++

No Boost.Spirit is a DSL to write down grammars within C++ syntax restrictions, which are then used to automatically create a parser that parses input based on that grammar.
So the goal is to replace the RegEx magic with a shorter and more concise grammar definition that is then able to do the parsing for us (thanks to the magic of Boost.Spirit)

@asadchev
Copy link
Contributor

asadchev commented Sep 19, 2023 via email

@Krzmbrzl
Copy link
Collaborator Author

Krzmbrzl commented Sep 19, 2023

@asadchev I am not but if that's where y'all steer the SeQuant development, I'd very much like to join, given that I'm going to start contributing to SeQuant quite a bit (if everything works out as planned)

@Krzmbrzl
Copy link
Collaborator Author

Alright - the new parser implementation is ready: #146
Using Boost Spirit instead of a tool that allows specifying the grammar as a regular string makes working with the grammar a lot harder and the readability is also slightly decreased (imo). Nonetheless, I think this is huge improvement over the RegEx-based version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants