
AST-based SQLite driver #164

Draft: wants to merge 37 commits into develop
Conversation

JanJakes
Collaborator

This PR is part of the Advanced MySQL support project and builds on the previous work in Exhaustive MySQL parser.

This is a WIP, currently stacked on top of #157.

@JanJakes JanJakes mentioned this pull request Nov 14, 2024
@JanJakes
Collaborator Author

@adamziel Looking deeper into this, I think that translating specific AST subnodes is pretty straightforward; the open question is the driver/translator architecture, and I'm going back and forth between multiple options.

Splitting the driver & translator
I'm not sure whether having everything in one large class is a good idea, and I'm exploring ways to split the code a bit. That said, there is the large switch/case statement in the prototype that doesn't need to know much about the AST structure — it handles some nodes specifically and recursively passes many others back down to the translator. This has some advantages; for example, we don't need to worry much about where a specific grammar rule can appear.
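To make that concrete, here's a minimal sketch of the recursive switch/case shape, using a stand-in node class and an invented `backtickIdentifier` rule (the real WP_Parser_Node API and grammar rule names differ):

```php
// Stand-in node class for illustration; the real WP_Parser_Node differs.
class Demo_Node {
	public $rule;
	public $children; // Child Demo_Node instances or plain token strings.
	public function __construct( $rule, $children ) {
		$this->rule     = $rule;
		$this->children = $children;
	}
}

function demo_translate( $node ) {
	if ( is_string( $node ) ) {
		return $node; // Most tokens pass through unchanged.
	}
	switch ( $node->rule ) {
		case 'backtickIdentifier': // Invented rule name, for illustration only.
			// A node that needs MySQL-to-SQLite-specific handling.
			return '"' . trim( $node->children[0], '`' ) . '"';
		default:
			// The default case doesn't care where a rule can appear:
			// it just translates all children recursively.
			return implode( ' ', array_map( 'demo_translate', $node->children ) );
	}
}
```

The nice property is that only the `case` branches need MySQL-specific knowledge; everything else falls through the default.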

Splitting top-level constructs
That said, some kind of distinction makes a lot of sense to me, at least at the highest level. There is a big difference whether we're running a SELECT statement, or an UPDATE statement, or CREATE TABLE, etc. These top-level statements have pretty different semantics, they can return different things, and some special constructs make sense only in some cases (e.g., SQL_CALC_FOUND_ROWS works only with SELECT statements). Therefore, I'm pretty confident that on the top level, it makes sense to split these use cases a bit into separate classes.

To add a bit more context, even though the top-level statement rule can be re-used in a compound statement for stored procedures, stored functions, triggers, and events, it's probably good to distinguish that case from the top-level use case. E.g., an ALTER TABLE ... statement on the top-level may result in directly executing multiple SQLite queries, while when the same is passed as a procedure body, we may need to say it's unsupported.

Splitting further?
However, going a little bit deeper, any kind of distinction can be problematic. There are surely some grammar rules that can appear only in the context of others — tableElementList can never appear in the SELECT clause, for instance — but would it make any sense to implement such rules in CREATE/UPDATE translator class? Or, should there be one huge switch to handle all the grammar rules, with maybe the exception of the top-level ones, as suggested above?

Translating vs. interpreting
Another way of looking at organizing the code is the difference between translating MySQL syntax into SQLite syntax vs. emulating a MySQL statement with multiple different SQLite statements or PHP calls. We can't completely separate these two, but I think it would be nice to be clear about where we are just translating, and where some logic is actually executed. Again, it seems to me that the top-level vs. deeper level could help here — each top-level statement could be handled by a "handler" class that would interpret the query in full, and these handlers could use a generic "translator" within them, whose task would only be to translate some MySQL constructs into SQLite syntax.

class WP_SQLite_Create_Table_Translator {
	public function translate_create_table( WP_Parser_Node $ast ) {
		// CREATE TABLE ... LIKE ...
		// CREATE TABLE ... (LIKE ...)
		$has_like = $ast->get_descendant_token( WP_MySQL_Lexer::LIKE_SYMBOL );
		if ( $has_like ) {
			// Here we will need to run a query to get the table structure,
			// and then create the table:
			return $this->create_table_like( $ast );
		}

		// CREATE TABLE ... ( elements ) options
		// CREATE TABLE ... AS ...
		$elements_ast = $ast->get_child( 'tableElementList' );
		$has_as       = $ast->get_descendant_token( WP_MySQL_Lexer::AS_SYMBOL );
		if ( $elements_ast && $has_as ) {
			throw new \Exception( '"CREATE TABLE" with both column/constraint definitions and an "AS" clause is not supported' );
		}

		// SQLite supports both ( elements ) and "AS", so we can just rewrite
		// the query and run it. Rewriting can be done either partly here,
		// or fully in the translator:
		$query = $this->translator->translate( $ast );
		return $this->connection->query( $query );

		// Or, we rewrite the top-level `CREATE TABLE` here and nested nodes
		// in the translator:
		$query = new WP_SQLite_Expression(
			array_merge(
				array( WP_SQLite_Token_Factory::raw( 'CREATE TABLE' ) ),
				array(
					$this->translate(
						$ast->get_child( 'tableName' )->get_child()
					),
				),
				array( WP_SQLite_Token_Factory::raw( '(' ) ),
				$this->translator->translate( $ast->get_child( 'tableElementList' ) ),
				array( WP_SQLite_Token_Factory::raw( ')' ) )
			)
		);
		return $this->connection->query( $query );
	}
}

I'm not yet sure what the optimal solution is, but it feels like at least some separation of concerns would be useful. It would also be nice if we could solidify AST node translation by unit testing the translated fragments as well (not only via the end-to-end tests run against SQLite), as such tests would give immediate feedback and great diffs.

@adamziel
Collaborator

adamziel commented Nov 15, 2024

Is there any downside to starting with a single giant class and splitting once that no longer serves the purpose?

Separation of concerns is a useful principle but it cuts both ways. Take Enterprise FizzBuzz. Great separation of concerns, extremely unreadable codebase. Too many concerns were defined and separated.

Therefore, I propose starting with just a single concern: Running MySQL queries in SQLite. Why? Because we don't really need modular code here. I don't think we'll ever need to use, say, SELECT statement translation as a component in any other context – therefore is it really useful to separate it?

I only propose that approach as a starting point. Perhaps it will make things unnecessarily complex and we'll have to look for other ways to slice it. I'm just not too excited about preemptively decoupling things that may not need decoupling at all.

The HTML Tag processor, for example, is just a single class even though it tackles a complex problem. It's a huge advantage – most methods are private so we can adjust them without breaking BC, you never have to analyze a dependency graph or wonder where should a specific piece of logic go, you always have full access to the entire state, and you can extend it easily.

I've tried splitting that logic a few times in the current iteration of the plugin, and the outcome was always messy. Making a distinction between translating and interpreting is tempting, but it complicates the design. Perhaps things are different now that we have an AST — if so, then we should hit the wall really soon with the single-class approach.

@JanJakes
Collaborator Author

@adamziel Thanks for your insights! I did not consider that separation of concerns also creates new public APIs as a side effect. That's a very important point. Your proposal makes a lot of sense to me. Somehow, it seemed natural to me that the HTML Tag Processor, or the MySQL lexer, for instance, is handled by a single class, but I wasn't sure about the driver. I guess for parser-like logic, a single huge class is quite common, but in the case of the driver, I started to see too many different things (DB connection, query translation, query execution, etc.), and got myself caught overthinking it.

Is there any downside to starting with a single giant class and splitting once that no longer serves the purpose?

The only downside is that the giant class will have multiple responsibilities, and it can get a bit difficult to read. For instance, some state variables will make sense for SELECTs but not for INSERTs, etc. But we can mitigate that with good naming, docs, and keeping the state as simple as possible.

Separation of concerns is a useful principle but it cuts both ways. Take Enterprise FizzBuzz. Great separation of concerns, extremely unreadable codebase. Too many concerns were defined and separated.

Wow, what a codebase! 😂 And a nice demonstration of what over-architecting can cause.

Therefore, I propose starting with just a single concern: Running MySQL queries in SQLite.

💯

I only propose that approach as a starting point. Perhaps it will make things unnecessarily complex and we'll have to look for other ways to slice it. I'm just not too excited about preemptively decoupling things that may not need decoupling at all.

Makes sense!

It's a huge advantage – most methods are private so we can adjust them without breaking BC

This! 💯


Looking at the architecture, I'd also like to simplify the current draft with WP_SQLite_Expression and WP_SQLite_Query_Builder. I think maybe we could make it work without them.

The WP_SQLite_Query_Builder takes the expression and stringifies it, joining all strings with a space and wrapping sub-expressions with parentheses. It covers many use cases out of the box, but it's not really a definition of the SQLite SQL dialect or anything like that, and it's not very flexible — for instance, a list of column definitions is delimited by a comma, not a space, and maybe the current implementation would put spaces even into places where they aren't expected. Apart from that, it performs some operations on some tokens (quoting identifiers, for instance), which could be done by the driver itself.

For WP_SQLite_Expression, it may be possible to replace it within the driver with return types of the recursive translate calls — either we return a token, or we return an array (~ the expression). In any case, if we need to keep it for some reason, I'd consider renaming it to something like WP_SQLite_Node to avoid confusion with SQL language expressions or any hint that it maps to some SQLite construct.

I will quickly try out these simplifications to see if they could make sense.

Base automatically changed from recursive-descent-parser to develop November 20, 2024 07:39
$this->assertSame( null, $root->get_descendant_token( 123 ) );

// Test multiple descendant methods.
// @TODO: Consider breadth-first search vs depth-first search.
Collaborator Author

@adamziel I initially kept the logic from the first prototype (which was traversing descendants breadth-first), and added this note to look into it later, but now it seems to me it may be more useful and natural to do a depth-first search (which would return tokens in the order they appear in the original query).

I'm still not 100% sure which one we need more — usually, I just want to know "is there this node or that token somewhere in this subtree", and in that case, both searches work. But sometimes it may be useful to check for the next token or a few, for instance, and then get_descendant_tokens() would be more natural with a depth-first traversal. If really needed, we could also provide both. What do you think?
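For illustration, here's a depth-first collection over a stand-in tree (nested arrays as nodes, strings as tokens; not the actual WP_Parser_Node API), which yields tokens in source order:

```php
// Depth-first token collection: descend into each child before moving to
// the next sibling, so tokens come out in the order of the original query.
function demo_get_descendant_tokens_dfs( array $children ) {
	$tokens = array();
	foreach ( $children as $child ) {
		if ( is_array( $child ) ) {
			// A nested array stands in for a child node: recurse first.
			$tokens = array_merge( $tokens, demo_get_descendant_tokens_dfs( $child ) );
		} else {
			$tokens[] = $child;
		}
	}
	return $tokens;
}
```

A breadth-first variant would instead visit all immediate children before any grandchildren, so sibling subtrees would interleave in the output.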

@JanJakes
Collaborator Author

@adamziel Another thought regarding WP_Parser_Node: We now provide child and descendant methods. Should we also provide parent methods? It can be useful, but I think it has a major flaw. I'll explain:

Sometimes, it could be useful in the driver, if we want to translate some constructs in a way "if I'm in the context of createTable, do this, otherwise do that". This is a simple way to express some specific nuances, but it's not the only way — we could also handle it from the top within the createTable node instead, constructing the statement a bit more top-down (but it can be more verbose at times).

That said, I think there's a problem — adding parent references to WP_Parser_Node would create reference cycles. PHP's GC is based on reference counting, which frees an object as soon as nothing references it; circular parent-child references defeat that, so the trees would only be freed later by the cycle collector, with extra overhead. WeakReference was only introduced in PHP 7.4, so I guess currently, it doesn't make sense to do this. It also complicates the tree a bit and makes subtrees not "independent". There is the alternative of saving some context (a node stack) in the driver during the translation, for instance.
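For reference, this is roughly what a weak parent link would look like on PHP 7.4+ (illustrative class, not a proposal for the actual API):

```php
// A parent link held via WeakReference does not keep the parent alive,
// so no strong child -> parent -> child cycle is created.
class Demo_Tree_Node {
	private $parent_ref = null;

	public function set_parent( Demo_Tree_Node $parent ) {
		$this->parent_ref = WeakReference::create( $parent );
	}

	public function get_parent() {
		// Returns null once the parent has been garbage-collected.
		return $this->parent_ref ? $this->parent_ref->get() : null;
	}
}
```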

@adamziel
Collaborator

adamziel commented Nov 23, 2024

@JanJakes what would be a specific code example where a parent reference would help? I get "do this, do that," but what's "this" and "that"?

@adamziel
Collaborator

Also, the HTML and the XML APIs solve the same problem with the get_breadcrumbs() method, which gives you a stack of parent element names. Maybe the same would work here for AST nodes?

@JanJakes
Collaborator Author

@JanJakes what would be a specific code example where a parent reference would help? I get "do this, do that," but what's "this" and "that"?

@adamziel For now, I've run into the INTEGER PRIMARY KEY "without" vs "with" autoincrement — in the former case, I'd like to translate it as INT PRIMARY KEY while in the latter one as INTEGER PRIMARY KEY due to the "bug" that's preserved in SQLite for back compatibility. In the AST, it would look something like this:

createStatement
   |- createTable
     |- tableElementList
       |- tableElement          <-- Here I can reliably determine PRIMARY KEY
       |  |- columnDefinition
       |    |- fieldDefinition  <-- Here I can reliably determine AUTOINCREMENT
       |     |- dataType        <-- Here I want to know about PRIMARY KEY and AUTOINCREMENT
       |     |- columnAttribute <-- Here PRIMARY KEY and AUTOINCREMENT can be specified
       |     ...
       |- tableElement
         |- tableConstraintDef  <-- But PRIMARY KEY can also be specified here (AUTOINCREMENT not)

Without the tableConstraintDef option for primary keys, it would be simple enough to handle this case in fieldDefinition in a top-down manner. Something like:

case 'fieldDefinition':
	$has_primary_key   = ...;
	$has_autoincrement = ...;
	if ( $has_primary_key ) {
		$children       = $ast->get_children();
		$data_type_node = array_shift( $children );
		$data_type      = $this->translate( $data_type_node );
		if ( 'INTEGER' === $data_type ) {
			$data_type = $has_autoincrement ? 'INTEGER' : 'INT';
		}
		return $data_type . $this->translate_sequence( $children );
	}
	return $this->translate_sequence( $ast->get_children() );

Here, the $has_primary_key check can peek below for a descendant token, but it has no way to see the tableConstraintDef, which would be in a different tableElement. I would either need a reference to the parent tableElement to use it to search for a primary key, or I would need to save it somehow ($this->set_context( 'primary_key', ... )) and then read it below.


However, in the case of CREATE TABLE, I'm also wondering whether it may make more sense to fully parse the create table statement (into columns, constraints, etc.), and only then construct a query from that, like we do in the current translator. It's granular changes on specific AST nodes vs reading the full MySQL statement and generating a new SQLite statement from that. I don't know yet which of these options will be ultimately more convenient and reliable.

@adamziel
Collaborator

Two ideas I got to avoid a parent reference:

  • Could the driver or parser class keep an array like $child => $parent and expose a method such as get_parent_of?
  • The query execution process could keep track of the stack of nodes it entered
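Since PHP arrays can't use objects as keys, the first idea could be sketched with SplObjectStorage (all names here are invented for illustration):

```php
// Minimal node for illustration; the real WP_Parser_Node differs.
class Demo_N {
	public $children;
	public function __construct( $children = array() ) {
		$this->children = $children;
	}
}

// External child => parent index, kept outside the nodes themselves.
// Note it holds strong references to the nodes, so it should live only
// as long as the translation of a single query.
class Demo_Parent_Index {
	private $map;

	public function __construct() {
		$this->map = new SplObjectStorage();
	}

	public function index( $node ) {
		foreach ( $node->children as $child ) {
			if ( is_object( $child ) ) {
				$this->map[ $child ] = $node;
				$this->index( $child );
			}
		}
	}

	public function get_parent_of( $node ) {
		return isset( $this->map[ $node ] ) ? $this->map[ $node ] : null;
	}
}
```

The second idea (a stack of entered nodes) avoids even this index: the driver pushes each node on entry and pops on exit, so the current ancestors are always at hand.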

@adamziel
Collaborator

adamziel commented Nov 23, 2024

It's granular changes on specific AST nodes vs reading the full MySQL statement and generating a new SQLite statement from that. I don't know yet which of these options will be ultimately more convenient and reliable.

Are these actually different? You need to traverse the tree node by node in either case. Or would the latter mean doing one pass to build a flat index of, say, modifiers and options, and then doing the translation in the second pass? Because I don't think that index could be flat - we'd still need a recursive data structure to account for subqueries.

@JanJakes
Collaborator Author

It's granular changes on specific AST nodes vs reading the full MySQL statement and generating a new SQLite statement from that. I don't know yet which of these options will be ultimately more convenient and reliable.

Are these actually different? You need to traverse the tree node by node in either case. Or would the latter mean doing one pass to build a flat index of, say, modifiers and options, and then doing the translation in the second pass? Because I don't think that index could be flat - we'd still need a recursive data structure to account for subqueries.

@adamziel I think they are a bit different:

  1. The first approach with only the necessary granular changes often relies on the default case that just recursively translates all children and concatenates them with a space. At the moment, this often works thanks to the similarity of the two SQL dialects, so even tokens are reused (SELECT, parentheses, etc.). In a way, we're just taking the MySQL query and doing some adjustments, where needed. Over time, we'll probably cover the translation more and more on each node, and will rely a bit less on the shape of the original MySQL query.
  2. The second approach would be oriented more towards reading data and then constructing a whole new query, rather than doing only the necessary changes. The index of the elements could be flat, at least in this case. More generally, even with this approach, the recursion is still possible — if I know that I have a list of some definitions, I can still get them with $definition = $this->translate( ... ), in case they can contain expressions, subqueries, etc.

Looking at CREATE TABLE in particular, and the grammar branch that contains tableElementList (the last below),

createTable:
    TEMPORARY_SYMBOL? TABLE_SYMBOL ifNotExists? tableName (
        LIKE_SYMBOL tableRef
        | OPEN_PAR_SYMBOL LIKE_SYMBOL tableRef CLOSE_PAR_SYMBOL
        | (OPEN_PAR_SYMBOL tableElementList CLOSE_PAR_SYMBOL)? createTableOptions? partitionClause? duplicateAsQueryExpression?
    )
;

we would be able to get a list of all column and constraint definitions, organize them somehow, and then generate the new CREATE TABLE query. The question is whether we should do that, or if we should just go with option 1.


Here's maybe a good example for thinking about this:

A PRIMARY KEY constraint can be defined either within a particular column definition, or in a different table element, separately from the column definition. Now:

  1. With option 1, we'd keep the definition where it was in MySQL, not normalizing it in any way, and relying on the fact that both MySQL and SQLite support defining PRIMARY KEY in both places. When we would run into a specific constraint that can be defined in both places in MySQL, but only in a single one in SQLite, we'd just handle that specific case.
  2. With option 2, we would first create a list of all columns and constraints, and then generate a query from that. Inline constraints would be normalized to separate constraint definitions (or the other way round), etc.

As a consequence, option 1 fixes only the particular cases that SQLite supports differently, leaving the rest mostly to their similarities. If there is a specific keyword or modifier we need to ignore, it needs to be done explicitly.

With option 2, we would be picking and allow-listing the things we support, possibly ignoring any other ones. In many cases, I think option 2 may be a bit more verbose, unless there are many exceptions and edge cases that would have to be handled in option 1.

Ultimately, we can do a little bit of both — sometimes we can be more descriptive for a specific node, decomposing it and fully reconstructing it in an option 2 fashion, while still leveraging option 1 by translating some subnodes recursively. The decision may be needed on a case-by-case basis, but the question is what would be a useful guideline for deciding this.


And here's a particular example:

In MySQL, a PRIMARY KEY with AUTOINCREMENT can be defined in both of these ways:
a) CREATE TABLE t (id INT PRIMARY KEY AUTO_INCREMENT);
b) CREATE TABLE t (id INT AUTO_INCREMENT, PRIMARY KEY (id) );

While in SQLite, b) is a syntax error. It means that we have to rewrite it to something like a), because AUTOINCREMENT in SQLite can only appear in the column definition, and only after a PRIMARY KEY constraint. And here we're touching on the question — handling this particular case vs. reading and normalizing the full statement.


Another interesting example:

  • In MySQL, we can write both PRIMARY KEY AUTO_INCREMENT and AUTO_INCREMENT PRIMARY KEY.
  • In SQLite, it always needs to be PRIMARY KEY AUTOINCREMENT, the other one is a syntax error.

Going back to the two options, with option 1, we need to catch and handle this case, while with option 2, it would probably just work due to the full statement reading and normalization, at the likely cost of more verbosity.

Ultimately, it may also work that we only normalize the PRIMARY KEY & AUTOINCREMENT definitions, leaving the rest as-is. At the moment, I don't know yet if there will be more of such cases in this particular statement.

@JanJakes
Collaborator Author

🎄 Update

I'm finally here with an update, and I'm sorry for being overdue. It's a longer read and probably should've been split into multiple updates.

I'll start with a summary:

  1. 61% (71/116) of the original tests are passing. Before starting the INFORMATION_SCHEMA rewrite, it was 33% (38/116).
  2. A significant portion of some of the hardest statements is done — CREATE TABLE and ALTER TABLE operations.
  3. The basic forms of SHOW and DESCRIBE using the INFORMATION_SCHEMA are implemented.
  4. However, I think there's also still quite a bit of work left to get to 100%, especially to get everything correct and stable.
  5. The remaining failing tests are:
    • String matching and regexp support (LIKE, RLIKE, etc.) — 8 tests.
    • Date/time related columns — 5 tests.
    • SQL_CALC_FOUND_ROWS — 4 tests.
    • LEFT function — 4 tests.
    • SHOW TABLES with LIKE and FROM — 4 tests.
    • ON CONFLICT ... — 3 tests.
    • SELECT FROM DUAL — 2 tests.
    • Other things like CREATE TEMPORARY TABLE, HAVING COUNT variations, CURRENT_TIMESTAMP syntax, ON UPDATE, CAST as binary, etc.
  6. For more details, check out the WP_SQLite_Information_Schema_Builder.

The full list of failures as of today can be found here: https://gist.github.com/JanJakes/23e36446b24c88bee7ca81415d9f6059


Now, I'd like to share some more details about the progress, and the rabbit holes I encountered along the way.

Simple statements -> Basic DDL -> Information schema

A couple of weeks ago, I started to implement the new SQLite driver to translate MySQL statements to SQLite statements, using the new MySQL parser. I implemented the basics of a few simple statements, such as SELECT, INSERT, UPDATE, REPLACE, and DELETE, and then I switched my focus to DDL statements, such as CREATE TABLE and ALTER TABLE.

This turned out to bring some complexities, and after further discussions, and realizing that a pure MySQL -> SQLite DDL conversion is lossy, we decided to implement a subset of MySQL's INFORMATION_SCHEMA tables.

Implementing INFORMATION_SCHEMA tables

First, I decided to focus on the schema tables that are necessary to describe tables, columns, indexes, and constraints. I started by implementing the CREATE TABLE statement. This is essentially an AST to information schema query conversion. In its basics, it's a simple logic, but reaching high coverage and correctness turned out to be pretty extensive.

Here are some interesting issues I encountered:

  • Some columns depend on each other. Sometimes, a charset determines a collation (we need a full map of default collations for each charset), and sometimes a collation determines a charset (here, it's just its prefix). Column lengths and numeric precisions depend on data types, charset (we need a map of charset to maximum bytes per character), etc.
  • Some tables depend on each other. Adding or dropping a constraint may affect the nullability and some other attributes of a column as well, and this data needs to be synchronized. Maybe in the future, we could reduce the number of actual information schema tables, and use views to present the data (MySQL has a similar approach from version 8).
  • Some operations have non-trivial effects on the schema. Dropping a column will remove it from existing indexes, meaning that multi-column index records may need to be renumbered, and other column attributes may change.
  • Some information schema values require a lot of normalization. There are so many ways to write varchars, national varchars, and text fields (char varying, character varying, varchar, national varchar, nvarchar, nchar varchar, national char varying, nchar varying, long char varying, long character varying, long varchar, ...), and some keywords can change their meaning (long converts varchar to text, appending CHARSET BINARY seems to change varchar to varbinary, etc.).
  • For some values, we may need to implement an AST -> MySQL string conversion. More on that below.

I've either addressed or TODO-ed all the issues I encountered, but more extensive test cases for some of them are still needed. More details can be seen in WP_SQLite_Information_Schema_Builder, which implements all of the AST -> information schema logic.

MySQL AST -> string conversion

As I was working on the information schema builder, I realized I not only need to read the MySQL AST, but in some cases, it's also necessary to convert some nodes back to strings. A simple example of this is reading a table name from a node that has multiple nesting levels below (identifier -> pureIdentifier -> IDENTIFIER). A more complex example is a column definition with a DEFAULT(expr), in which case I need to save the serialized expression in the INFORMATION_SCHEMA.

I think there are multiple approaches we could consider:

  1. Maybe just walking the tree and joining all tokens with spaces would work.
  2. We could store the original whitespaces in the AST, and then join the values.
  3. We could store offsets in the original SQL input for each node, and then just cut out what we need.
  4. We could implement an AST -> SQL string printer, maybe a bit more involved than option 1.

Additionally, I often need a bit more than the original string. For identifiers, the raw value is not enough — I need to unquote it first (replace quoted sequences with characters, trim quotes), and it's a question of where such logic belongs.
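For the identifier case specifically, the unquoting logic is small and self-contained either way. MySQL identifiers are backtick-quoted, with a doubled backtick escaping a literal one (a sketch; the real helper may live elsewhere and have a different name):

```php
// Unquote a MySQL backtick-quoted identifier: strip the outer backticks
// and replace each escaped `` sequence with a literal backtick.
function demo_unquote_mysql_identifier( $value ) {
	if ( strlen( $value ) >= 2 && '`' === $value[0] && '`' === substr( $value, -1 ) ) {
		return str_replace( '``', '`', substr( $value, 1, -1 ) );
	}
	return $value; // Not quoted; return as-is.
}
```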

ALTER TABLE

The next topic after storing the CREATE TABLE information in the INFORMATION_SCHEMA was implementing support for the most common table-altering operations. For that, I refactored my initial draft into reusable methods, and realized that ALTER TABLE statements bring another set of challenges.

SQLite has limited support for ALTER TABLE statements, so I started by trying to map the statements that SQLite can handle to their SQLite counterparts, while falling back to a table-copying approach for the other ones. This turned out to be a rabbit hole:

  1. ALTER TABLE statements can contain multiple operations, and they often do. Therefore, I first needed to decide whether I had to copy the table, or whether I would be able to map a single MySQL statement to multiple SQLite ALTER TABLE statements.
  2. Determining which MySQL ALTER TABLE statements are directly supported by SQLite is not straightforward. It may seem that an operation, such as ADD COLUMN is simply supported, but the reality is that it might not be in all cases (e.g., with FIRST/AFTER ...).
  3. Implementing the "direct" plus "fallback to full copy" approach means doing many things twice.

For these reasons and after some experimenting, I decided to first focus on implementing the full table-copy ALTER TABLE algorithm, leaving the direct translation of some operations as an optimization for later. The approach is pretty straightforward:

  1. Reflect all ALTER TABLE operations in the INFORMATION_SCHEMA.
  2. Run the SQLite ALTER TABLE algorithm with full table copy as per the SQLite docs.

This turned out to work flawlessly and to be very helpful in the initial implementation phase. At the moment, implementing a new ALTER TABLE operation is as simple as reflecting its changes in the information schema. This is great for broadening statement support before focusing on optimizing some of the statements.

Finally, the ALTER TABLE algorithm recommended in the SQLite docs requires only one table copy, so in the new driver, we don't do two copies like we did in the old one.
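For reference, the shape of that table-copy procedure from the SQLite docs, demonstrated with PDO against an in-memory database (simplified; real code would also recreate indexes, triggers, and views, and would verify foreign keys with PRAGMA foreign_key_check before committing):

```php
$pdo = new PDO( 'sqlite::memory:' );
$pdo->exec( 'CREATE TABLE t (id INTEGER, name TEXT)' );
$pdo->exec( "INSERT INTO t VALUES (1, 'a'), (2, 'b')" );

// Foreign key enforcement must be toggled outside the transaction.
$pdo->exec( 'PRAGMA foreign_keys = OFF' );
$pdo->beginTransaction();

// 1. Create a new table with the desired (altered) schema.
$pdo->exec( 'CREATE TABLE t_new (id INTEGER PRIMARY KEY, name TEXT NOT NULL)' );
// 2. Copy the data across (the single table copy).
$pdo->exec( 'INSERT INTO t_new (id, name) SELECT id, name FROM t' );
// 3. Drop the old table and move the new one into its place.
$pdo->exec( 'DROP TABLE t' );
$pdo->exec( 'ALTER TABLE t_new RENAME TO t' );

$pdo->commit();
$pdo->exec( 'PRAGMA foreign_keys = ON' );
```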

Connecting the pieces

What turned out to be challenging in this first phase of implementing the INFORMATION_SCHEMA support was that until I reached some level of completeness, I couldn't see any progress in terms of passing tests, and it wasn't easy to estimate where exactly we were regarding compatibility with the existing SQLite translator.

This is because:

  1. The actual database operations can't be implemented before the data is stored in INFORMATION_SCHEMA.
  2. Saving only some data in the INFORMATION_SCHEMA means the tests will fail because the data is incomplete.
  3. Implementing the INFORMATION_SCHEMA builder and the CREATE/ALTER TABLE operations is not enough either. Most of the tests verify the results using SHOW and DESCRIBE statements.

Therefore, while working on the INFORMATION_SCHEMA, I didn't see any progress on the original SQL translation test suite (but I did write a new set of tests along the way). But the reward comes at the end — quickly prototyping SHOW and DESCRIBE statements with the INFORMATION_SCHEMA at hand turned out to be easy and straightforward. Suddenly, many tests started to pass, and for some others, this revealed incorrect assertions, as the old driver doesn't always have 100% correct data at hand.

Next steps

There is still a good amount of work left to reach 100% of the test suite for the current SQLite translator, but I think the current state provides a good foundation for many of the hard parts, and building on that will hopefully be easier. Beyond that, there are some MySQL features that can likely be ported from the old translator without much of a rewrite.

Additionally, the existing test suite may not fully cover everything that the current translator implements, so I think it would be good to look at multiple measures:

  1. The percentage of passing tests in the existing test suite.
  2. Checking the existing translator method by method and ensuring we implement all the operations.
  3. Running WordPress with the new driver.
  4. Examining the new driver using Playground Tester.

That said, maybe the path to get this merged, e.g., with a feature flag, is to focus on point 1, and do the rest in other PRs.

@adamziel
Collaborator

Monumental work here @JanJakes, and a tremendous update. Mad props.

That said, maybe the path to get this merged, e.g., with a feature flag, is to focus on point 1, and do the rest in other PRs.

I would love that! It feels like a good spot to park, do a round or a few rounds of review, and merge. Let's just rewire the CI tests to the old driver to avoid failing checks on trunk.

$this->pdo->query( 'PRAGMA encoding="UTF-8";' );

$valid_journal_modes = array( 'DELETE', 'TRUNCATE', 'PERSIST', 'MEMORY', 'WAL', 'OFF' );
if ( defined( 'SQLITE_JOURNAL_MODE' ) && in_array( SQLITE_JOURNAL_MODE, $valid_journal_modes, true ) ) {
Collaborator

How easy/hard would it be to avoid referencing any global constants and variables in this driver? Ideally, we could use it outside of WordPress as a drop-in PDO replacement, and it's not easy to see all the configuration options when we rely on those constants. Can we source this information from a constructor argument? If we need a bit of dedicated code to connect it with WordPress, we could have a separate function that glues all the WordPress-specific bits with new WP_SQLite_Driver()
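A hypothetical sketch of what that could look like (class and option names are invented, not the actual driver API):

```php
// Hypothetical sketch only: explicit configuration replacing the
// SQLITE_JOURNAL_MODE global constant. All names here are illustrative.
class WP_SQLite_Driver_Demo_Config {
	public $journal_mode;

	public function __construct( array $options = array() ) {
		$valid_journal_modes = array( 'DELETE', 'TRUNCATE', 'PERSIST', 'MEMORY', 'WAL', 'OFF' );
		// DELETE is SQLite's default journal mode.
		$mode = isset( $options['journal_mode'] ) ? $options['journal_mode'] : 'DELETE';
		if ( ! in_array( $mode, $valid_journal_modes, true ) ) {
			throw new InvalidArgumentException( 'Invalid journal mode: ' . $mode );
		}
		$this->journal_mode = $mode;
	}
}
```

A small WordPress-specific glue function could then read the constants and build this config, keeping the driver itself free of globals.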

Collaborator Author

Just noting that this is copy-pasted from the legacy translator. It will definitely be great to refactor this into an explicit set of config options 👍
