Add infix operator precedence rules #5273

runarorama · 2024-08-11T02:31:45Z

Overview

This adds precedence rules for infix operators, so e.g. 7 == 1 + 2 * 3 will be parsed as 7 == (1 + (2 * 3)) instead of ((7 == 1) + 2) * 3. The term printer will also print these back without parentheses.

Previously infix operators would always parse left-associated with no precedence.

This change adds hard-coded rules for infix operators based on their unqualified names. The precedence levels are as follows (higher precedence binds tighter):

|| and |
&& and &
==, !==, !=, and ===
<, >, >=, and <=
+ and -
*, /, and %
^, ^^, and **

Operators other than this have no precedence, and are still simply parsed from left to right. Or rather: the default for any operator is that it has higher precedence than any operator to its right in the expression, and lower precedence than any operator to its left.

Examples:

true || false && true => true || (false && true)
1 + 2 >= 3 + 4 => (1 + 2) >= (3 + 4)
9 % 2 == 0 => (9 % 2) == 0
0 == 9 % 2 => 0 == (9 % 2)
2 * 10 $ 20 => (2 * 10) $ 20 ($ has no precedence rule)
1 * 2 $ 3 * 4 $ 5 => (((1 * 2) $ 3) * 4) $ 5

Fixes #895, #3467, #3468

Implementation notes

When parsing an expression with an infix ("symboly") function application, the parser proceeds as follows:

The expression is parsed left-associated (the old way).
Starting at the leftmost infix application subexpression, see if the next operator has higher precedence. If so we "rotate" the expression to the right. E.g. in in a + b * c, we first parse (a + b) * c then rotate the parse tree to a + (b * c).
Perform step 2 on the right-hand side of the expression after rotation, if necessary. For example, b above might itself be an infix application with lower precedence than *.
Proceed to the next operator to the right in the original expression and repeat steps 2-4 until we reach the end.

When printing, the term printer prints a whole chain of infix operator applications without parentheses, e.g. 1 + 2 * 3 > 4 * 5 only if either:

The precedence of the operator in the subexpression is higher than the parent expression.
The operator in the subexpression has no precedence rule, appears in a left subexpression, and has the same unqualified name as the operator in the parent expression.

Otherwise parentheses are added around subexpressions.

This PR also changes the way precedence is implemented in the term printer. Instead of a number from -2 to 12, it's now a data type Precedence with a total order:

data Precedence
  = -- | The lowest precedence, used for top-level bindings
    Basement
  | -- | Used for terms that never need parentheses
    Bottom
  | -- | Type annotations
    Annotation
  | -- | A statement in a block
    Statement
  | -- | Control flow constructs like `if`, `match`, `case`
    Control
  | -- | Infix operators
    InfixOp InfixPrecedence
  | -- | Function application
    Application
  | -- | Prefix operators like `'`, `!`
    Prefix
  | -- | The highest precedence, used for let bindings and blocks
    Top
  deriving (Eq, Ord, Show)

data InfixPrecedence = Lowest | Level Int | Highest
  deriving (Eq, Ord, Show)

Interesting/controversial decisions

We print parentheses around infix applications when the operator has no precedence and doesn't have the same name as the operator to its right. We could omit parentheses here, but I thought code would be a bit more readable if the parens are present.
The list of operator names is entirely arbitrary, and there's an argument to be made that there are too many or too few operators in the list.
Equality has higher precedence than logical operators, following the example of Scala and Haskell, but this is potentially confusing for e.g. true || false == true which parses as true || (false == true).

Loose ends

There are no associativity rules included here, so all operators associate to the left as before.

Test coverage

Updated the roundtrip test transcripts to include a bunch of expressions with a mix of infix operators, to make sure the parser and printer agree.

ChrisPenner · 2024-08-20T18:40:46Z

My one concern is with && and ||; I get pretty easily confused with boolean expressions and always over-eagerly parenthesize them.

I think it would hurt readability if my carefully crafted (a || b) && (c || d) got reprinted into a || b && c || d, would putting those into the same precedence bucket to force parens on printing?

Or perhaps we can change the rules so it parses with the current precedence but still parenthesizes when printing just to avoid confusion?

TBH I think if I were designing just for me my preference would be to just throw a parse error Please add parens to disambiguate precedence of &&and|| in the expression: ... to ensure we parse things the way the user wanted

…edence

sellout · 2024-08-20T20:05:45Z

Operators other than this have no precedence, and are still simply parsed from left to right. Or rather: the default for any operator is that it has higher precedence than any operator to its right in the expression, and lower precedence than any operator to its left.

I’m trying to figure out if this is more intuitive from the user’s perspective than just giving un-precedenced operators the lowest precedence (one lower than the lowest precedenced op).

E.g., if I write the expression 3 + 4 ∘ 5 * 6, the “asymmetric” precedence rules would parse that into (3 + 4) ∘ 5) * 6, correct? And then it would be pretty-printed as (3 + 4 ∘ 5) * 6.

But with “lowest” precedence rules, 3 + 4 ∘ 5 * 6 would be parsed as (3 + 4) ∘ (5 * 6) and pretty-print back to 3 + 4 ∘ 5 * 6.

To get both to parse to (3 + 4) ∘ (5 * 6), I would need to write 3 + 4 ∘ (5 * 6) and then the asymmetric rules would also pretty-print back the input (3 + 4 ∘ (5 * 6)).

I feel like I have to think pretty hard about the asymmetric version of this, rather than “other operators have precedence -1”.

With lowest precedence, we still get a `o1` b `o2` c `o3` d parsing as ((a `o1` b) `o2` c) `o3` d, so I can’t think of what is gained by “the default for any operator is that it has higher precedence than any operator to its right in the expression, and lower precedence than any operator to its left.”

When printing, the term printer prints a whole chain of infix operator applications without parentheses, e.g. 1 + 2 * 3 > 4 * 5 only if either:
* The precedence of the operator in the subexpression is higher than the parent expression.

Should this be more like “The precedence of the operator in the subexpression is at least as high (if it’s on the left) or higher (if it’s on the right) than the parent expression.”? Granted, my phrasing is awkward, but in the current description, it implies that 3 + 4 + 5 would be pretty-printed as (3 + 4) + 5.

* Equality has higher precedence than logical operators, following the example of Scala and Haskell, but this is potentially confusing for e.g. `true || false == true` which parses as `true || (false == true)`.

I agree with this decision. Logically combining a lot of comparisons is a common pattern, and I think people expect this precedence.

runarorama · 2024-08-20T20:16:36Z

Should this be more like “The precedence of the operator in the subexpression is at least as high (if it’s on the left) or higher (if it’s on the right) than the parent expression.”? Granted, my phrasing is awkward, but in the current description, it implies that 3 + 4 + 5 would be pretty-printed as (3 + 4) + 5.

+ has a precedence rule, so it will always be chained with other operators that have a precedence rule. But yes, the rule is as you describe. Whether it's > or >= depends on whether the subexpression appears on the left or the right.

sellout · 2024-08-20T20:24:00Z

My one concern is with && and ||; I get pretty easily confused with boolean expressions and always over-eagerly parenthesize them.

Yeah, I had this same thought, but I think making them the same precedence and not changing anything else would result in a worse situation (I think the pretty printer would print (a || b) && (c || d) as a || b && (c || d) and the unparenthesized printing would mean ((a || b) && c) || d).

We could apply the “extra parens” rules for un-precedenced operators to these as well, and get the more explicit groupings. However, they’re already the lowest precedence, so if the handling of un-precedenced operators is handled as I suggest in my previous comment, we could just remove the precedence for logical operators and get the behavior you want.

As for how they interact with the other operators with those precedences, I assume & and | are bitwise operators (although I couldn’t find any immediate proof on Share). In which case, they should have higher precedence than logical operators anyway (and probably also higher than comparison operators).

sellout · 2024-08-20T20:27:44Z

2. `==`, `!==`, `!=`, and `===`
3. `<`, `>`, `>=`, and `<=`

I’m a little surprised these are separate, but I think it makes sense. You’re unlikely to have an expression like
x < y == w <= z, but when you do, you’re likely to compare the booleans with equality rather than other comparison operators.

sellout · 2024-08-20T20:29:34Z

I also wonder if ++ should get a place somewhere above equality as well.

sellout · 2024-08-20T18:54:53Z

parser-typechecker/src/Unison/Syntax/Precedence.hs

+import Unison.Prelude
+
+-- Precedence rules for infix operators.
+-- Lower number means higher precedence (tighter binding).


This doesn’t match the implementation – higher is higher (which is good).

Good catch, this is a defunct comment

sellout · 2024-08-20T19:00:49Z

parser-typechecker/src/Unison/Syntax/Precedence.hs

+    Top
+  deriving (Eq, Ord, Show)
+
+data InfixPrecedence = Lowest | Level Int | Highest


Are Lowest and Highest only needed for pretty-printing the asymmetric semantics of un-precedenced operators?

If we switch to just giving them minimal precedence, can this become

data InfixPrecedence = Word8

If we still need some bounds-checking of infix precedences, could we get by with something like this?

lowestInfix, highestInfix :: Precedence lowestInfix = InfixOp minBound highestInfix = InfixOp maxBound

The pretty-printer was kind of a rat's nest of hacks around precedences, where in certain cases it was using a precedence in between "infix application" and "function application", as well as between infix and "control structure". I think we could probably get away with just making this a word8, but doing it this way allowed me to leave the implementation largely unchanged.

Yeah, I had that feeling as well. Seems reasonable to defer any simplification / cleanup.

sellout · 2024-08-20T19:25:35Z

parser-typechecker/src/Unison/Syntax/Precedence.hs

+  Annotation -> Statement
+  Statement -> Control
+  Control -> InfixOp Lowest
+  InfixOp Lowest -> InfixOp (Level 0)


I feel like Level Word (or even Word8) would be an improvement for the type, so that this case is definitely true.

sellout · 2024-08-20T19:30:01Z

parser-typechecker/src/Unison/Syntax/Precedence.hs

+  Control -> InfixOp Lowest
+  InfixOp Lowest -> InfixOp (Level 0)
+  InfixOp (Level n) -> InfixOp (Level (n + 1))
+  InfixOp Highest -> Application


Could use a comment on this function that there’s a disconnect between InfixOp (Level n) and InfixOp Highest – or, (especially if the type is changed to Level Word8) have a case like

InfixOp (Level 255) -> InfixOp Highest

ChrisPenner

LGTM 👍🏼

pchiusano

Nice!!! Appreciate all the round trip tests as well.

My take: let's merge this if @runarorama is happy with it and dogfood it a bit. We can always make more tweaks, nbd.

I might prefer that mixing && and || gets parens in the pretty-printer, even if parsing has && higher precedence than ||. But it's probably fine as and people will expect that precedence. A tooltip on && and || that mentions this relative precedence could help as well, cc @rlmark @hojberg.

One cool idea I had for later is that the pretty-printer could produce grouping information. Which is ignored when converting to text, but the HTML renderer could make use of it to allow highlighting of subexpressions and "expand selection to parent".

Added precedence rules to term parser

24b5aa5

runarorama changed the title ~~1Added precedence rules to term parser~~ Added precedence rules to term parser Aug 11, 2024

runarorama changed the title ~~Added precedence rules to term parser~~ Add precedence rules to term parser Aug 11, 2024

runarorama added 2 commits August 10, 2024 22:38

Cleanup

9d4e2eb

Add comments

8efd8d5

runarorama changed the title ~~Add precedence rules to term parser~~ Add precedence rules Aug 11, 2024

runarorama added 11 commits August 13, 2024 00:03

Proper algorithm for reassociation

7012cc4

Comments

8736c4a

Improve comments

b92dede

Modify infix printer

725ee3b

Adapt old infix printer to new rules

d43288a

Get rid of confusing precedence levels

1470085

Transcripts

2998137

transcripts

3e89640

More parens for do blocks

f48880f

Get rid of commented-out code

b64ac8b

Add roundtrip tests

3e40cb1

runarorama changed the title ~~Add precedence rules~~ Add infix operator precedence rules Aug 20, 2024

runarorama requested review from pchiusano, ChrisPenner and sellout August 20, 2024 17:54

runarorama marked this pull request as ready for review August 20, 2024 17:54

Add exponentiation operators

4c166f0

Merge branch 'trunk' of https://github.com/unisonweb/unison into prec…

da75484

…edence

sellout reviewed Aug 20, 2024

View reviewed changes

Simpler infix printer

c8414eb

ChrisPenner approved these changes Aug 21, 2024

View reviewed changes

pchiusano approved these changes Aug 21, 2024

View reviewed changes

sellout mentioned this pull request Aug 23, 2024

Either disable or properly support | in operators #1537

Open

runarorama added 2 commits August 23, 2024 15:51

Remove comments

eefacbf

More parens around do blocks

8a70414

runarorama merged commit e9ca76f into trunk Aug 24, 2024
35 checks passed

runarorama deleted the precedence branch August 24, 2024 01:21

ceedubs mentioned this pull request Oct 31, 2024

Arithmetic Precedence Is Wrong #3467

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add infix operator precedence rules #5273

Add infix operator precedence rules #5273

runarorama commented Aug 11, 2024 •

edited

Loading

ChrisPenner commented Aug 20, 2024 •

edited

Loading

sellout commented Aug 20, 2024

runarorama commented Aug 20, 2024

sellout commented Aug 20, 2024

sellout commented Aug 20, 2024

sellout commented Aug 20, 2024

sellout Aug 20, 2024

runarorama Aug 21, 2024

sellout Aug 20, 2024

runarorama Aug 21, 2024

sellout Aug 21, 2024

sellout Aug 20, 2024

sellout Aug 20, 2024

ChrisPenner left a comment

pchiusano left a comment

Add infix operator precedence rules #5273

Add infix operator precedence rules #5273

Conversation

runarorama commented Aug 11, 2024 • edited Loading

Overview

Implementation notes

Interesting/controversial decisions

Loose ends

Test coverage

ChrisPenner commented Aug 20, 2024 • edited Loading

sellout commented Aug 20, 2024

runarorama commented Aug 20, 2024

sellout commented Aug 20, 2024

sellout commented Aug 20, 2024

sellout commented Aug 20, 2024

sellout Aug 20, 2024

Choose a reason for hiding this comment

runarorama Aug 21, 2024

Choose a reason for hiding this comment

sellout Aug 20, 2024

Choose a reason for hiding this comment

runarorama Aug 21, 2024

Choose a reason for hiding this comment

sellout Aug 21, 2024

Choose a reason for hiding this comment

sellout Aug 20, 2024

Choose a reason for hiding this comment

sellout Aug 20, 2024

Choose a reason for hiding this comment

ChrisPenner left a comment

Choose a reason for hiding this comment

pchiusano left a comment

Choose a reason for hiding this comment

runarorama commented Aug 11, 2024 •

edited

Loading

ChrisPenner commented Aug 20, 2024 •

edited

Loading