Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine LR1 items with same productions #7

Open
osa1 opened this issue Jul 26, 2022 · 0 comments
Open

Combine LR1 items with same productions #7

osa1 opened this issue Jul 26, 2022 · 0 comments

Comments

@osa1
Copy link
Owner

osa1 commented Jul 26, 2022

LR1 items:

parsegen/src/lr1.rs

Lines 10 to 17 in f368fac

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
struct LR1Item {
non_terminal_idx: NonTerminalIdx,
production_idx: ProductionIdx,
cursor: usize,
// None => EOF
lookahead: Option<TerminalIdx>,
}

LR1 sets:

parsegen/src/lr1.rs

Lines 80 to 84 in f368fac

fn compute_lr1_closure<A>(
grammar: &Grammar<A>,
first_table: &FirstTable,
items: FxHashSet<LR1Item>,
) -> BTreeSet<LR1Item> {

This representation generates a lot of redundant (duplicate) info in states:

1571: {
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "("]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "bool"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "-"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "-."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "+"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "+."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "*."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "/."]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "="]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<>"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<="]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "<"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ">="]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ">"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ","]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, ";"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "id"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "int"]
  [LetExpr -> "let" "(" Binder CommaBinder1_Rev ")" "=" SeqExpr "in" SeqExpr|, "float"]
}

All of the items in this state have the same production. We can combine all of these items into a single one with a lookahead set:

struct LR1Item {
    non_terminal_idx: NonTerminalIdx,
    production_idx: ProductionIdx,
    cursor: usize,
    lookahead: FxHashSet<TerminalIdx>,
}

We can either add one more field for whether EOF is a valid lookahead, or we can assign EOF a TerminalIdx.

See #1 for optimizing terminal sets. (FxHashSet<TerminalIdx> in the code above)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant