Using a parser for implementing `parse_expr`? #145

Krzmbrzl · 2023-09-19T10:40:38Z

Afaict the functionality behind sequant::parse_expr is currently based on sophisticated regular expressions. However, I was wondering whether it might not be more straight forward to instead use an actual parser for this purpose (see also https://stackoverflow.com/q/11905506)?

Such a parser could be simply auto-generated from a grammar file that I believe will be much more readable as a whole than the current regular expressions. Plus, making changes to such a grammar typically tends to be a lot easier than refactoring regular expressions.

I would recommend making use of https://github.com/yhirose/cpp-peglib, which doesn't need a separate generation step for the parser (as in: use an external tool to generate the C++ source files that constitute the parser) but instead compiles the parser in-place (pretty much the same as RegEx work).

Since I have to mess with the parsing logic when implementing #107 anyway, I am actually considering whether it might be less work to simply refactor the parsing into a PEG grammar rather than fully wrap my head around the regular expressions (and the hard-coded match group indices) 🤔

@bimalgaudel it seems to me that you were the one who (originally) wrote the parsing code. Are there specific reasons why you chose to use regular expressions over a parser?

As a nice bonus, using a parser would allow us to get proper error messages that can be displayed to the user if their input is invalid.

The text was updated successfully, but these errors were encountered:

Krzmbrzl · 2023-09-19T10:50:21Z

To get an impression of how the definition could look like, here's a simplified grammar for SeQuant-like expressions that I have quickly thrown together (doesn't quite work yet, but should draw the picture)
Here's how the entire grammar for expression parsing would look like (not thoroughly tested but seems to work - try it yourself here):

# Parser for SeQuant expressions
Sum        <- Product (( '+' / '-' ) Product)*
Product    <- Group ('*'? Group)*
Group      <- '(' Sum ')' / Elementary
Elementary <- Tensor / Variable / Number
Tensor     <- Name IndexSpecs Symmetry?
IndexSpecs <- '^{' IndexList? '}_{' IndexList? '}' / '_{' IndexList? '}^{' IndexList? '}' / '{' IndexList? ';' IndexList? '}'
IndexList  <- Index (',' Index)*
Index      <- Name ( '<' Name (',' Name)* '>' )?
Symmetry   <- ':A' / ':S' / ':N'
Variable   <- Name
Name       <- < [a-zA-Z_] [a-zA-Z_0-9]* >
Number     <- < ('+' / '-')? Real ('/' Real)? >
Real       <- [0-9]+ ('.' [0-9]*)?

%whitespace <- [ \t]*

This would then only have to be amended with the corresponding actions that transform the parsed expression into actual SeQuant objects.

Krzmbrzl · 2023-09-19T13:27:21Z

To be clear here: This is me asking: Would it be okay if I rewrote the parse-logic to use this parser framework instead of the regular expressions?

asadchev · 2023-09-19T13:55:46Z

Gonna chime in. Sequant already depends on Boost, correct? Instead of adding another dep, consider using Boost.Spirit, a mature stable documented embedded parser DSL, header only

…

On Tue, Sep 19, 2023, 10:36 AM Robert Adam ***@***.***> wrote: To be clear here: This is me asking: Would it be okay if I rewrote the parse-logic to use this parser framework instead of the regular expressions? — Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABH62KDE4UNSKXW3NY6C34DX3GNUVANCNFSM6AAAAAA46C4WD4> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

bimalgaudel · 2023-09-19T14:37:32Z

Great idea. Sure you can @Krzmbrzl but we want to minimize the external dependencies. I like the idea of Boost.Spirit. I was not aware of the existence of such libraries while implementing the parser. I agree the that the current ~~grammar~~ grammar impl. is unreadable.

asadchev · 2023-09-19T14:56:22Z

if yall do end up using spirit i have bit of code to parse array/tensor expressions already

…

On Tue, Sep 19, 2023, 11:37 AM Bimal Gaudel ***@***.***> wrote: Great idea. Sure you can @Krzmbrzl <https://github.com/Krzmbrzl> but we want to minimize the external dependencies. I like the idea of Boost.Spirit. I was not aware of existince of such libraries while implementing the parser. I agree the that the current grammar is unreadable. — Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABH62KENUUMYA4QOHT7AHWTX3GU3NANCNFSM6AAAAAA46C4WD4> . You are receiving this because you commented.Message ID: ***@***.***>

Krzmbrzl · 2023-09-19T14:57:07Z

Okay sweet - I'll start with Boost.Spirit tomorrow then!
While I think Boost.Spirit makes the grammar a little harder to read, I do understand the desire to minimize dependencies. Plus, it's not looking too bad in Spirit after all :)

Krzmbrzl · 2023-09-19T14:58:46Z

@asadchev if you point me to that code, I'll happily take inspiration from it - otherwise I'll simply start from scratch tomorrow. I have done quite a bit of compiler design and grammar writing in the past, so I expect this to be reasonably straight-forward 👀

asadchev · 2023-09-19T15:37:22Z

https://github.com/asadchev/bluejay/blob/9e57895659b5ba6bfc4d15cbe075b8a9f86a5ef4/src/lib/bluejay/chem/integral/transform/expression.cc#L48C26-L48C26

…

On Tue, Sep 19, 2023, 11:58 AM Robert Adam ***@***.***> wrote: @asadchev <https://github.com/asadchev> if you point me to that code, I'll happily take inspiration from it - otherwise I'll simply start from scratch tomorrow. I have done quite a bit of compiler design and grammar writing in the past, so I expect this to be reasonably straight-forward 👀 — Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABH62KEH5JYWQ55LQBVI323X3GXLDANCNFSM6AAAAAA46C4WD4> . You are receiving this because you were mentioned.Message ID: ***@***.***>

evaleev · 2023-09-19T16:50:57Z

will also chime in ... my understanding is that Boost spirit parser is for developing DSLs embedded in C++ (hence subject to its grammar rules .. @asadchev correct me if I'm wrong). sequant::parse_expr parses "DSL" (latex) embedded into strings. @Krzmbrzl what is your goal? To develop a higher-level DSL embedded in C++ (I'd carefully plan to make sure that the grammar you want can be fit into C++) or a standalone DSL?

asadchev · 2023-09-19T16:59:50Z

Its for embedded parsers, as long as grammar can be written in EBNF form you can write rules. If yall using regex to do that Spirit will be much much shorter and much much more extensible and understandable.

…

On Tue, Sep 19, 2023, 1:51 PM Eduard Valeyev ***@***.***> wrote: will also chime in ... my understanding is that Boost spirit parser is for developing DSLs embedded in C++ (hence subject to its grammar rules .. @asadchev <https://github.com/asadchev> correct me if I'm wrong). sequant::parse_expr parses "DSL" (latex) embedded into strings. @Krzmbrzl <https://github.com/Krzmbrzl> what is your goal? To develop a higher-level DSL embedded in C++ (I'd carefully plan to make sure that the grammar you want can be fit into C++) or a standalone DSL? — Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABH62KCJZ3JKYQYWAHDSBZ3X3HEPZANCNFSM6AAAAAA46C4WD4> . You are receiving this because you were mentioned.Message ID: ***@***.***>

Krzmbrzl · 2023-09-19T17:00:04Z

my understanding is that Boost spirit parser is for developing DSLs embedded in C++

No Boost.Spirit is a DSL to write down grammars within C++ syntax restrictions, which are then used to automatically create a parser that parses input based on that grammar.
So the goal is to replace the RegEx magic with a shorter and more concise grammar definition that is then able to do the parsing for us (thanks to the magic of Boost.Spirit)

asadchev · 2023-09-19T17:05:05Z

Robert, are you on Valeev Slack?

…

On Tue, Sep 19, 2023, 2:00 PM Robert Adam ***@***.***> wrote: No Boost.Spirit *is* a DSL to write down grammars within C++ syntax restrictions, which are then used to automatically create a parser that parses input based on that grammar. So the goal is to replace the RegEx magic with a shorter and more concise grammar definition that is then able to do the parsing for us (thanks to the magic of Boost.Spirit) — Reply to this email directly, view it on GitHub <#145 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABH62KFC5OQSYYOQWQ2YEEDX3HFR5ANCNFSM6AAAAAA46C4WD4> . You are receiving this because you were mentioned.Message ID: ***@***.***>

Krzmbrzl · 2023-09-19T17:28:28Z

@asadchev I am not but if that's where y'all steer the SeQuant development, I'd very much like to join, given that I'm going to start contributing to SeQuant quite a bit (if everything works out as planned)

Krzmbrzl · 2023-09-27T09:56:48Z

Alright - the new parser implementation is ready: #146
Using Boost Spirit instead of a tool that allows specifying the grammar as a regular string makes working with the grammar a lot harder and the readability is also slightly decreased (imo). Nonetheless, I think this is huge improvement over the RegEx-based version.

Krzmbrzl mentioned this issue Sep 22, 2023

Replace expression parser #146

Merged

bimalgaudel closed this as completed in #146 Nov 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using a parser for implementing `parse_expr`? #145

Using a parser for implementing `parse_expr`? #145

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

Krzmbrzl commented Sep 19, 2023

asadchev commented Sep 19, 2023 via email

bimalgaudel commented Sep 19, 2023 •

edited

Loading

asadchev commented Sep 19, 2023 via email

Krzmbrzl commented Sep 19, 2023

Krzmbrzl commented Sep 19, 2023

asadchev commented Sep 19, 2023 via email

evaleev commented Sep 19, 2023

asadchev commented Sep 19, 2023 via email

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

asadchev commented Sep 19, 2023 via email

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

Krzmbrzl commented Sep 27, 2023

Using a parser for implementing parse_expr? #145

Using a parser for implementing parse_expr? #145

Comments

Krzmbrzl commented Sep 19, 2023 • edited Loading

Krzmbrzl commented Sep 19, 2023 • edited Loading

Krzmbrzl commented Sep 19, 2023

asadchev commented Sep 19, 2023 via email

bimalgaudel commented Sep 19, 2023 • edited Loading

asadchev commented Sep 19, 2023 via email

Krzmbrzl commented Sep 19, 2023

Krzmbrzl commented Sep 19, 2023

asadchev commented Sep 19, 2023 via email

evaleev commented Sep 19, 2023

asadchev commented Sep 19, 2023 via email

Krzmbrzl commented Sep 19, 2023 • edited Loading

asadchev commented Sep 19, 2023 via email

Krzmbrzl commented Sep 19, 2023 • edited Loading

Krzmbrzl commented Sep 27, 2023

Using a parser for implementing `parse_expr`? #145

Using a parser for implementing `parse_expr`? #145

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

bimalgaudel commented Sep 19, 2023 •

edited

Loading

Krzmbrzl commented Sep 19, 2023 •

edited

Loading

Krzmbrzl commented Sep 19, 2023 •

edited

Loading