Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add infix operator precedence rules #5273

Merged
merged 19 commits into from
Aug 24, 2024
Merged

Add infix operator precedence rules #5273

merged 19 commits into from
Aug 24, 2024

Conversation

runarorama
Copy link
Contributor

@runarorama runarorama commented Aug 11, 2024

Overview

This adds precedence rules for infix operators, so e.g. 7 == 1 + 2 * 3 will be parsed as 7 == (1 + (2 * 3)) instead of ((7 == 1) + 2) * 3. The term printer will also print these back without parentheses.

Previously infix operators would always parse left-associated with no precedence.

This change adds hard-coded rules for infix operators based on their unqualified names. The precedence levels are as follows (higher precedence binds tighter):

  1. || and |
  2. && and &
  3. ==, !==, !=, and ===
  4. <, >, >=, and <=
  5. + and -
  6. *, /, and %
  7. ^, ^^, and **

Operators other than this have no precedence, and are still simply parsed from left to right. Or rather: the default for any operator is that it has higher precedence than any operator to its right in the expression, and lower precedence than any operator to its left.

Examples:

  • true || false && true => true || (false && true)
  • 1 + 2 >= 3 + 4 => (1 + 2) >= (3 + 4)
  • 9 % 2 == 0 => (9 % 2) == 0
  • 0 == 9 % 2 => 0 == (9 % 2)
  • 2 * 10 $ 20 => (2 * 10) $ 20 ($ has no precedence rule)
  • 1 * 2 $ 3 * 4 $ 5 => (((1 * 2) $ 3) * 4) $ 5

Fixes #895, #3467, #3468

Implementation notes

When parsing an expression with an infix ("symboly") function application, the parser proceeds as follows:

  1. The expression is parsed left-associated (the old way).
  2. Starting at the leftmost infix application subexpression, see if the next operator has higher precedence. If so we "rotate" the expression to the right. E.g. in in a + b * c, we first parse (a + b) * c then rotate the parse tree to a + (b * c).
  3. Perform step 2 on the right-hand side of the expression after rotation, if necessary. For example, b above might itself be an infix application with lower precedence than *.
  4. Proceed to the next operator to the right in the original expression and repeat steps 2-4 until we reach the end.

When printing, the term printer prints a whole chain of infix operator applications without parentheses, e.g. 1 + 2 * 3 > 4 * 5 only if either:

  • The precedence of the operator in the subexpression is higher than the parent expression.
  • The operator in the subexpression has no precedence rule, appears in a left subexpression, and has the same unqualified name as the operator in the parent expression.

Otherwise parentheses are added around subexpressions.

This PR also changes the way precedence is implemented in the term printer. Instead of a number from -2 to 12, it's now a data type Precedence with a total order:

data Precedence
  = -- | The lowest precedence, used for top-level bindings
    Basement
  | -- | Used for terms that never need parentheses
    Bottom
  | -- | Type annotations
    Annotation
  | -- | A statement in a block
    Statement
  | -- | Control flow constructs like `if`, `match`, `case`
    Control
  | -- | Infix operators
    InfixOp InfixPrecedence
  | -- | Function application
    Application
  | -- | Prefix operators like `'`, `!`
    Prefix
  | -- | The highest precedence, used for let bindings and blocks
    Top
  deriving (Eq, Ord, Show)

data InfixPrecedence = Lowest | Level Int | Highest
  deriving (Eq, Ord, Show)

Interesting/controversial decisions

  • We print parentheses around infix applications when the operator has no precedence and doesn't have the same name as the operator to its right. We could omit parentheses here, but I thought code would be a bit more readable if the parens are present.
  • The list of operator names is entirely arbitrary, and there's an argument to be made that there are too many or too few operators in the list.
  • Equality has higher precedence than logical operators, following the example of Scala and Haskell, but this is potentially confusing for e.g. true || false == true which parses as true || (false == true).

Loose ends

There are no associativity rules included here, so all operators associate to the left as before.

Test coverage

Updated the roundtrip test transcripts to include a bunch of expressions with a mix of infix operators, to make sure the parser and printer agree.

@runarorama runarorama changed the title 1Added precedence rules to term parser Added precedence rules to term parser Aug 11, 2024
@runarorama runarorama changed the title Added precedence rules to term parser Add precedence rules to term parser Aug 11, 2024
@runarorama runarorama changed the title Add precedence rules to term parser Add precedence rules Aug 11, 2024
@runarorama runarorama changed the title Add precedence rules Add infix operator precedence rules Aug 20, 2024
@runarorama runarorama marked this pull request as ready for review August 20, 2024 17:54
@ChrisPenner
Copy link
Contributor

ChrisPenner commented Aug 20, 2024

My one concern is with && and ||; I get pretty easily confused with boolean expressions and always over-eagerly parenthesize them.

I think it would hurt readability if my carefully crafted (a || b) && (c || d) got reprinted into a || b && c || d, would putting those into the same precedence bucket to force parens on printing?

Or perhaps we can change the rules so it parses with the current precedence but still parenthesizes when printing just to avoid confusion?

TBH I think if I were designing just for me my preference would be to just throw a parse error Please add parens to disambiguate precedence of &&and|| in the expression: ... to ensure we parse things the way the user wanted

@sellout
Copy link
Contributor

sellout commented Aug 20, 2024

Operators other than this have no precedence, and are still simply parsed from left to right. Or rather: the default for any operator is that it has higher precedence than any operator to its right in the expression, and lower precedence than any operator to its left.

I’m trying to figure out if this is more intuitive from the user’s perspective than just giving un-precedenced operators the lowest precedence (one lower than the lowest precedenced op).

E.g., if I write the expression 3 + 4 ∘ 5 * 6, the “asymmetric” precedence rules would parse that into (3 + 4) ∘ 5) * 6, correct? And then it would be pretty-printed as (3 + 4 ∘ 5) * 6.

But with “lowest” precedence rules, 3 + 4 ∘ 5 * 6 would be parsed as (3 + 4) ∘ (5 * 6) and pretty-print back to 3 + 4 ∘ 5 * 6.

To get both to parse to (3 + 4) ∘ (5 * 6), I would need to write 3 + 4 ∘ (5 * 6) and then the asymmetric rules would also pretty-print back the input (3 + 4 ∘ (5 * 6)).

I feel like I have to think pretty hard about the asymmetric version of this, rather than “other operators have precedence -1”.

With lowest precedence, we still get a `o1` b `o2` c `o3` d parsing as ((a `o1` b) `o2` c) `o3` d, so I can’t think of what is gained by “the default for any operator is that it has higher precedence than any operator to its right in the expression, and lower precedence than any operator to its left.”

When printing, the term printer prints a whole chain of infix operator applications without parentheses, e.g. 1 + 2 * 3 > 4 * 5 only if either:

* The precedence of the operator in the subexpression is higher than the parent expression.

Should this be more like “The precedence of the operator in the subexpression is at least as high (if it’s on the left) or higher (if it’s on the right) than the parent expression.”? Granted, my phrasing is awkward, but in the current description, it implies that 3 + 4 + 5 would be pretty-printed as (3 + 4) + 5.

* Equality has higher precedence than logical operators, following the example of Scala and Haskell, but this is potentially confusing for e.g. `true || false == true` which parses as `true || (false == true)`.

I agree with this decision. Logically combining a lot of comparisons is a common pattern, and I think people expect this precedence.

@runarorama
Copy link
Contributor Author

Should this be more like “The precedence of the operator in the subexpression is at least as high (if it’s on the left) or higher (if it’s on the right) than the parent expression.”? Granted, my phrasing is awkward, but in the current description, it implies that 3 + 4 + 5 would be pretty-printed as (3 + 4) + 5.

+ has a precedence rule, so it will always be chained with other operators that have a precedence rule. But yes, the rule is as you describe. Whether it's > or >= depends on whether the subexpression appears on the left or the right.

@sellout
Copy link
Contributor

sellout commented Aug 20, 2024

My one concern is with && and ||; I get pretty easily confused with boolean expressions and always over-eagerly parenthesize them.

Yeah, I had this same thought, but I think making them the same precedence and not changing anything else would result in a worse situation (I think the pretty printer would print (a || b) && (c || d) as a || b && (c || d) and the unparenthesized printing would mean ((a || b) && c) || d).

We could apply the “extra parens” rules for un-precedenced operators to these as well, and get the more explicit groupings. However, they’re already the lowest precedence, so if the handling of un-precedenced operators is handled as I suggest in my previous comment, we could just remove the precedence for logical operators and get the behavior you want.

As for how they interact with the other operators with those precedences, I assume & and | are bitwise operators (although I couldn’t find any immediate proof on Share). In which case, they should have higher precedence than logical operators anyway (and probably also higher than comparison operators).

@sellout
Copy link
Contributor

sellout commented Aug 20, 2024

2. `==`, `!==`, `!=`, and `===`
3. `<`, `>`, `>=`, and `<=`

I’m a little surprised these are separate, but I think it makes sense. You’re unlikely to have an expression like
x < y == w <= z, but when you do, you’re likely to compare the booleans with equality rather than other comparison operators.

@sellout
Copy link
Contributor

sellout commented Aug 20, 2024

I also wonder if ++ should get a place somewhere above equality as well.

import Unison.Prelude

-- Precedence rules for infix operators.
-- Lower number means higher precedence (tighter binding).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn’t match the implementation – higher is higher (which is good).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is a defunct comment

Top
deriving (Eq, Ord, Show)

data InfixPrecedence = Lowest | Level Int | Highest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are Lowest and Highest only needed for pretty-printing the asymmetric semantics of un-precedenced operators?

If we switch to just giving them minimal precedence, can this become

data InfixPrecedence = Word8

If we still need some bounds-checking of infix precedences, could we get by with something like this?

lowestInfix, highestInfix :: Precedence
lowestInfix = InfixOp minBound
highestInfix = InfixOp maxBound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pretty-printer was kind of a rat's nest of hacks around precedences, where in certain cases it was using a precedence in between "infix application" and "function application", as well as between infix and "control structure". I think we could probably get away with just making this a word8, but doing it this way allowed me to leave the implementation largely unchanged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had that feeling as well. Seems reasonable to defer any simplification / cleanup.

Annotation -> Statement
Statement -> Control
Control -> InfixOp Lowest
InfixOp Lowest -> InfixOp (Level 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like Level Word (or even Word8) would be an improvement for the type, so that this case is definitely true.

Control -> InfixOp Lowest
InfixOp Lowest -> InfixOp (Level 0)
InfixOp (Level n) -> InfixOp (Level (n + 1))
InfixOp Highest -> Application
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a comment on this function that there’s a disconnect between InfixOp (Level n) and InfixOp Highest – or, (especially if the type is changed to Level Word8) have a case like

InfixOp (Level 255) -> InfixOp Highest

Copy link
Contributor

@ChrisPenner ChrisPenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏼

Copy link
Member

@pchiusano pchiusano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!! Appreciate all the round trip tests as well.

My take: let's merge this if @runarorama is happy with it and dogfood it a bit. We can always make more tweaks, nbd.

I might prefer that mixing && and || gets parens in the pretty-printer, even if parsing has && higher precedence than ||. But it's probably fine as and people will expect that precedence. A tooltip on && and || that mentions this relative precedence could help as well, cc @rlmark @hojberg.

One cool idea I had for later is that the pretty-printer could produce grouping information. Which is ignored when converting to text, but the HTML renderer could make use of it to allow highlighting of subexpressions and "expand selection to parent".

@runarorama runarorama merged commit e9ca76f into trunk Aug 24, 2024
35 checks passed
@runarorama runarorama deleted the precedence branch August 24, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operator precedence for && and || doesn't work
4 participants