Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalized advanced queries: Analyze C++ implementation #1046

Open
Tracked by #1041
fengelniederhammer opened this issue Jan 8, 2025 · 2 comments
Open
Tracked by #1041

Generalized advanced queries: Analyze C++ implementation #1046

fengelniederhammer opened this issue Jan 8, 2025 · 2 comments
Assignees

Comments

@fengelniederhammer
Copy link
Contributor

fengelniederhammer commented Jan 8, 2025

Think about whether we can implement the generalized advanced queries directly in SILO. LAPIS would then only forward the query string and SILO could parse it.

  • Are there parsers in C++ that are easy enough to use? SQL parsers? General parsers like Antlr?
  • Which options do we have?
  • How long would it take to implement it (to compare to implementation in LAPIS)?
@fengelniederhammer fengelniederhammer changed the title Analyze C++ implementation Generalized advanced queries: Analyze C++ implementation Jan 8, 2025
@fengelniederhammer
Copy link
Contributor Author

I tried https://github.com/pganalyze/libpg_query.

I checked out the repo locally and modified the "simple" example.

My modified version of the simple example
// Welcome to the easiest way to parse an SQL query :-)
// Compile the file like this:
//
// cc -I../ -L../ simple.c -lpg_query

#include <pg_query.h>
#include <stdio.h>
#include <stdlib.h>

size_t testCount = 6;
const char* tests[] = {
  "select * from x where country = 'Germany'\n"
  "  AND date BETWEEN '2024-07-08' AND '2025-01-06'\n"
  "  AND (\n"
  "    mutation = 'S:190T'\n"
  "    OR mutation = 'S:190S'\n"
  "  );",
  "SELECT * FROM x WHERE date BETWEEN '2024-07-08' AND '2025-01-06'\n"
  "  AND nextstrainClade <> '21K'\n"
  "  AND NOT (\n"
  "    mutation = 'S:494-' OR\n"
  "    mutation = 'S:490-'\n"
  "  )",
  "SELECT * FROM x WHERE n_of(3, 'at_least', mutation = 'S:346', mutation = 'S:356', mutation = 'S:444', mutation = 'S:445')",
  "SELECT * FROM x WHERE mutation = 'S:408S' AND maybe(mutation = 'S:213G' AND mutation = 'S:346K')",
  "SELECT * FROM x WHERE division ~ '^Aargau$'"
};

int main() {
  PgQueryParseResult result;
  size_t i;

  for (i = 0; i < testCount; i++) {
    result = pg_query_parse(tests[i]);

    printf("%s\n", tests[i]);
    printf("\n");
    printf("\n");
    if (result.error) {
      printf("error: %s at %d\n", result.error->message, result.error->cursorpos);
    } else {
      printf("%s\n", result.parse_tree);
    }
    printf("\n");
    printf("---------------------------------------------");
    printf("\n");

    pg_query_free_parse_result(result);
  }

  // Optional, this ensures all memory is freed upon program exit (useful when running Valgrind)
  pg_query_exit();

  return 0;
}

It was able to parse these queries:

Generic query with metadata and mutations

select * from x where country = 'Germany'
  AND date BETWEEN '2024-07-08' AND '2025-01-06'
  AND (
    mutation = 'S:190T'
    OR mutation = 'S:190S'
  );
Result
{
  "version": 170000,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 7
                  }
                },
                "location": 7
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "x",
                "inh": true,
                "relpersistence": "p",
                "location": 14
              }
            }
          ],
          "whereClause": {
            "BoolExpr": {
              "boolop": "AND_EXPR",
              "args": [
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "="
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "country"
                            }
                          }
                        ],
                        "location": 22
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "Germany"
                        },
                        "location": 32
                      }
                    },
                    "location": 30
                  }
                },
                {
                  "A_Expr": {
                    "kind": "AEXPR_BETWEEN",
                    "name": [
                      {
                        "String": {
                          "sval": "BETWEEN"
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "date"
                            }
                          }
                        ],
                        "location": 48
                      }
                    },
                    "rexpr": {
                      "List": {
                        "items": [
                          {
                            "A_Const": {
                              "sval": {
                                "sval": "2024-07-08"
                              },
                              "location": 61
                            }
                          },
                          {
                            "A_Const": {
                              "sval": {
                                "sval": "2025-01-06"
                              },
                              "location": 78
                            }
                          }
                        ]
                      }
                    },
                    "location": 53
                  }
                },
                {
                  "BoolExpr": {
                    "boolop": "OR_EXPR",
                    "args": [
                      {
                        "A_Expr": {
                          "kind": "AEXPR_OP",
                          "name": [
                            {
                              "String": {
                                "sval": "="
                              }
                            }
                          ],
                          "lexpr": {
                            "ColumnRef": {
                              "fields": [
                                {
                                  "String": {
                                    "sval": "mutation"
                                  }
                                }
                              ],
                              "location": 103
                            }
                          },
                          "rexpr": {
                            "A_Const": {
                              "sval": {
                                "sval": "S:190T"
                              },
                              "location": 114
                            }
                          },
                          "location": 112
                        }
                      },
                      {
                        "A_Expr": {
                          "kind": "AEXPR_OP",
                          "name": [
                            {
                              "String": {
                                "sval": "="
                              }
                            }
                          ],
                          "lexpr": {
                            "ColumnRef": {
                              "fields": [
                                {
                                  "String": {
                                    "sval": "mutation"
                                  }
                                }
                              ],
                              "location": 130
                            }
                          },
                          "rexpr": {
                            "A_Const": {
                              "sval": {
                                "sval": "S:190S"
                              },
                              "location": 141
                            }
                          },
                          "location": 139
                        }
                      }
                    ],
                    "location": 127
                  }
                }
              ],
              "location": 44
            }
          },
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      },
      "stmt_len": 153
    }
  ]
}

Another generic query with "not equal"

SELECT * FROM x WHERE date BETWEEN '2024-07-08' AND '2025-01-06'
  AND nextstrainClade <> '21K'
  AND NOT (
    mutation = 'S:494-' OR
    mutation = 'S:490-'
  )
Result
{
  "version": 170000,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 7
                  }
                },
                "location": 7
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "x",
                "inh": true,
                "relpersistence": "p",
                "location": 14
              }
            }
          ],
          "whereClause": {
            "BoolExpr": {
              "boolop": "AND_EXPR",
              "args": [
                {
                  "A_Expr": {
                    "kind": "AEXPR_BETWEEN",
                    "name": [
                      {
                        "String": {
                          "sval": "BETWEEN"
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "date"
                            }
                          }
                        ],
                        "location": 22
                      }
                    },
                    "rexpr": {
                      "List": {
                        "items": [
                          {
                            "A_Const": {
                              "sval": {
                                "sval": "2024-07-08"
                              },
                              "location": 35
                            }
                          },
                          {
                            "A_Const": {
                              "sval": {
                                "sval": "2025-01-06"
                              },
                              "location": 52
                            }
                          }
                        ]
                      }
                    },
                    "location": 27
                  }
                },
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "\u003c\u003e"
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "nextstrainclade"
                            }
                          }
                        ],
                        "location": 71
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "21K"
                        },
                        "location": 90
                      }
                    },
                    "location": 87
                  }
                },
                {
                  "BoolExpr": {
                    "boolop": "NOT_EXPR",
                    "args": [
                      {
                        "BoolExpr": {
                          "boolop": "OR_EXPR",
                          "args": [
                            {
                              "A_Expr": {
                                "kind": "AEXPR_OP",
                                "name": [
                                  {
                                    "String": {
                                      "sval": "="
                                    }
                                  }
                                ],
                                "lexpr": {
                                  "ColumnRef": {
                                    "fields": [
                                      {
                                        "String": {
                                          "sval": "mutation"
                                        }
                                      }
                                    ],
                                    "location": 112
                                  }
                                },
                                "rexpr": {
                                  "A_Const": {
                                    "sval": {
                                      "sval": "S:494-"
                                    },
                                    "location": 123
                                  }
                                },
                                "location": 121
                              }
                            },
                            {
                              "A_Expr": {
                                "kind": "AEXPR_OP",
                                "name": [
                                  {
                                    "String": {
                                      "sval": "="
                                    }
                                  }
                                ],
                                "lexpr": {
                                  "ColumnRef": {
                                    "fields": [
                                      {
                                        "String": {
                                          "sval": "mutation"
                                        }
                                      }
                                    ],
                                    "location": 139
                                  }
                                },
                                "rexpr": {
                                  "A_Const": {
                                    "sval": {
                                      "sval": "S:490-"
                                    },
                                    "location": 150
                                  }
                                },
                                "location": 148
                              }
                            }
                          ],
                          "location": 132
                        }
                      }
                    ],
                    "location": 102
                  }
                }
              ],
              "location": 67
            }
          },
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      }
    }
  ]
}

N-Of

SELECT * FROM x WHERE n_of(3, 'at_least', mutation = 'S:346', mutation = 'S:356', mutation = 'S:444', mutation = 'S:445')
Result
{
  "version": 170000,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 7
                  }
                },
                "location": 7
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "x",
                "inh": true,
                "relpersistence": "p",
                "location": 14
              }
            }
          ],
          "whereClause": {
            "FuncCall": {
              "funcname": [
                {
                  "String": {
                    "sval": "n_of"
                  }
                }
              ],
              "args": [
                {
                  "A_Const": {
                    "ival": {
                      "ival": 3
                    },
                    "location": 27
                  }
                },
                {
                  "A_Const": {
                    "sval": {
                      "sval": "at_least"
                    },
                    "location": 30
                  }
                },
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "="
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "mutation"
                            }
                          }
                        ],
                        "location": 42
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "S:346"
                        },
                        "location": 53
                      }
                    },
                    "location": 51
                  }
                },
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "="
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "mutation"
                            }
                          }
                        ],
                        "location": 62
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "S:356"
                        },
                        "location": 73
                      }
                    },
                    "location": 71
                  }
                },
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "="
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "mutation"
                            }
                          }
                        ],
                        "location": 82
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "S:444"
                        },
                        "location": 93
                      }
                    },
                    "location": 91
                  }
                },
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "="
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "mutation"
                            }
                          }
                        ],
                        "location": 102
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "S:445"
                        },
                        "location": 113
                      }
                    },
                    "location": 111
                  }
                }
              ],
              "funcformat": "COERCE_EXPLICIT_CALL",
              "location": 22
            }
          },
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      }
    }
  ]
}

Maybe

SELECT * FROM x WHERE mutation = 'S:408S' AND maybe(mutation = 'S:213G' AND mutation = 'S:346K')
Result
{
  "version": 170000,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 7
                  }
                },
                "location": 7
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "x",
                "inh": true,
                "relpersistence": "p",
                "location": 14
              }
            }
          ],
          "whereClause": {
            "BoolExpr": {
              "boolop": "AND_EXPR",
              "args": [
                {
                  "A_Expr": {
                    "kind": "AEXPR_OP",
                    "name": [
                      {
                        "String": {
                          "sval": "="
                        }
                      }
                    ],
                    "lexpr": {
                      "ColumnRef": {
                        "fields": [
                          {
                            "String": {
                              "sval": "mutation"
                            }
                          }
                        ],
                        "location": 22
                      }
                    },
                    "rexpr": {
                      "A_Const": {
                        "sval": {
                          "sval": "S:408S"
                        },
                        "location": 33
                      }
                    },
                    "location": 31
                  }
                },
                {
                  "FuncCall": {
                    "funcname": [
                      {
                        "String": {
                          "sval": "maybe"
                        }
                      }
                    ],
                    "args": [
                      {
                        "BoolExpr": {
                          "boolop": "AND_EXPR",
                          "args": [
                            {
                              "A_Expr": {
                                "kind": "AEXPR_OP",
                                "name": [
                                  {
                                    "String": {
                                      "sval": "="
                                    }
                                  }
                                ],
                                "lexpr": {
                                  "ColumnRef": {
                                    "fields": [
                                      {
                                        "String": {
                                          "sval": "mutation"
                                        }
                                      }
                                    ],
                                    "location": 52
                                  }
                                },
                                "rexpr": {
                                  "A_Const": {
                                    "sval": {
                                      "sval": "S:213G"
                                    },
                                    "location": 63
                                  }
                                },
                                "location": 61
                              }
                            },
                            {
                              "A_Expr": {
                                "kind": "AEXPR_OP",
                                "name": [
                                  {
                                    "String": {
                                      "sval": "="
                                    }
                                  }
                                ],
                                "lexpr": {
                                  "ColumnRef": {
                                    "fields": [
                                      {
                                        "String": {
                                          "sval": "mutation"
                                        }
                                      }
                                    ],
                                    "location": 76
                                  }
                                },
                                "rexpr": {
                                  "A_Const": {
                                    "sval": {
                                      "sval": "S:346K"
                                    },
                                    "location": 87
                                  }
                                },
                                "location": 85
                              }
                            }
                          ],
                          "location": 72
                        }
                      }
                    ],
                    "funcformat": "COERCE_EXPLICIT_CALL",
                    "location": 46
                  }
                }
              ],
              "location": 42
            }
          },
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      }
    }
  ]
}

String search (Regex)

SELECT * FROM x WHERE division ~ '^Aargau$'
Result
{
  "version": 170000,
  "stmts": [
    {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "ColumnRef": {
                    "fields": [
                      {
                        "A_Star": {}
                      }
                    ],
                    "location": 7
                  }
                },
                "location": 7
              }
            }
          ],
          "fromClause": [
            {
              "RangeVar": {
                "relname": "x",
                "inh": true,
                "relpersistence": "p",
                "location": 14
              }
            }
          ],
          "whereClause": {
            "A_Expr": {
              "kind": "AEXPR_OP",
              "name": [
                {
                  "String": {
                    "sval": "~"
                  }
                }
              ],
              "lexpr": {
                "ColumnRef": {
                  "fields": [
                    {
                      "String": {
                        "sval": "division"
                      }
                    }
                  ],
                  "location": 22
                }
              },
              "rexpr": {
                "A_Const": {
                  "sval": {
                    "sval": "^Aargau$"
                  },
                  "location": 33
                }
              },
              "location": 31
            }
          },
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      }
    }
  ]
}

@fengelniederhammer
Copy link
Contributor Author

The question remains: How would we install pg_query as a dependency? There is no Conan recipe.

@fengelniederhammer fengelniederhammer self-assigned this Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant