Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Implicit commit detection #1529

Open
steverukuts opened this issue Dec 5, 2024 · 4 comments
Open

Proposal: Implicit commit detection #1529

steverukuts opened this issue Dec 5, 2024 · 4 comments

Comments

@steverukuts
Copy link

steverukuts commented Dec 5, 2024

Overview

  1. MySQL silently commits your transaction when you manipulate a temporary table after starting a transaction.
  2. The MySQL protocol provides a way to detect this mistake, but I don't think this driver uses it.
  3. The proposed solution however is quite radical and deserves considerable debate.
  4. Perhaps opening up the query pipeline for extension could be an alternative move.

Problem statement

Let's take a look at an example of how you can corrupt your data by accident:

-- We set up an example database and table:
CREATE DATABASE `example`;
USE `example`;
CREATE TABLE `DemoData` (Value VARCHAR(20));

-- And now we start a transaction
BEGIN TRANSACTION;

-- Let's create some temporary storage:
CREATE TEMPORARY TABLE temp (value VARCHAR(20));

-- And now let's add an index on our temporary table:
CREATE INDEX temp_value ON temp (value);

-- Now let's add a sample row to DemoData (the real table, defined before we started the transaction)
INSERT INTO `DemoData` VALUES ('example value');

-- Change of plan, let's undo all that work. DemoData should be empty again, right?
ROLLBACK;

-- Surprise! You actually already committed your changes, and your rollback didn't work:
SELECT * FROM DemoData; -- returns one row!

The correct thing for the developer to do here is to set up the table outside of the transaction scope, but if they miss that, they will silently commit the transaction, and every subsequent command is run outside of the transaction.

To the developer moving from SQL Server to MySQL, it looks like you are within a transaction, and you may not discover the truth for many years. This is just an example from a real bug I have seen, but there are other ways to implicitly commit a transaction in this way.

Solution overview

Fortunately, the MySQL protocol provides a way for us to find this problem. The OkPayload can be used to detect this problem after executing a query. You can look at the ServerStatus bitfield, which contains an InTransaction flag.

The InTransaction flag will be in the OkPacket of every query in the transaction until the next query that is not in the transaction. To refer back to my earlier example:

CREATE TEMPORARY TABLE temp (value VARCHAR(20));   -- InTransaction
CREATE INDEX temp_value ON temp (value);           -- InTransaction (this statement causes an implicit commit)
INSERT INTO `DemoData` VALUES ('example value');   -- Not InTransaction

So, when the developer starts a transaction, the driver should check the OkPayload's ServerStatus flags for InTransaction, and if it finds that the transaction is no longer active despite not disposing of the transaction yet, an exception can be raised to notify the developer that the last command actually committed the transaction.

Example API usage:

connection.Execute("CREATE TABLE `DemoData` (Value VARCHAR(20));");  // not InTransaction, but we don't care as we aren't in a transaction scope

using (var txn = connection.BeginTransaction())
{
    txn.Execute("CREATE TEMPORARY TABLE temp (value VARCHAR(20));"); // InTransaction
    txn.Execute("CREATE INDEX temp_value ON temp (value);");         // InTransaction
    txn.Execute("INSERT INTO `DemoData` VALUES ('example value');"); // throws an exception - "the last statement caused an implicit commit!"
}

This behaviour should not be enabled by default. Even though it represents a bug in the developer's code, the unfortunate thing about receiving the transaction state in the OK packet means that the last INSERT statement was actually committed to the database, so enabling this behaviour as a default will definitely cause more problems than it solves.

For discussion

  1. Does the project team agree that this idea has merit?
  2. I am happy to implement this if desired but wanted to know that the work would have a chance of being merged.
  3. Is there an alternative way to implement this, perhaps via some kind of middleware or add-on? I did go looking for an alternative way to implement this without changing the driver itself but couldn't figure that out.
  4. Is there some server or client configuration that mitigates this mistake that I have somehow overlooked?
@bgrainger
Copy link
Member

Some off-the-cuff thoughts:

MySqlConnector already has "strict" transaction requirements, and IgnoreCommandTransaction could be used here. (Arguably, this is a bug in that implementation.) If the driver detects that the transaction was committed (due to the OK payload flag) the next command that specifies a (already-committed) MySqlCommand.Transaction should throw an exception. Users can opt out with the existing Ignore Command Transaction option.

A less invasive way might be to report the transaction status on a new property, similar to MySqlCommand.LastInsertedId. It would be up to clients who care to check this flag and throw an exception if it was false when it's expected to be true.

How would any of these approaches handle code like this:

command.Transaction = myTrans;
command.CommandText = """
    INSERT INTO `DemoData` VALUES ('first value');
    CREATE TEMPORARY TABLE temp (value VARCHAR(20));
    CREATE INDEX temp_value ON temp (value);
    INSERT INTO `DemoData` VALUES ('second value');
    """;
command.ExecuteNonQuery(); // transaction is committed halfway through

@bgrainger
Copy link
Member

From dotnet/runtime#95399:

SQL Server and PostgreSQL implementation of DbTransaction have a Boolean flag that indicates if the transaction is completed or not. This flag is named IsZombied in SqlTransaction and IsCompleted in NpgsqlTransaction.

That could be another way to represent this data (and there may be benefits to being consistent across ADO.NET providers).

@steverukuts
Copy link
Author

How would any of these approaches handle code like this:

In this case, you would indeed not realise there was a problem until the next command is executed, and you discover you are no longer in a transaction.

I think after reflecting on this a bit more, I'm questioning whether this is something that the driver should do, given you only find the transaction state after it's too late to report an error. Perhaps what instead I am looking for is the ability to build something like this into my application:

var command = new MySqlConnection();
command.QueryExecuting += e => 
{
    if (e.InTransaction && e.CommandText.Contains("DROP TABLE"))
        throw new InvalidOperationException("Do not manipulate tables when in a transaction");
}

I actually have a class that can parse queries and identify their keywords as part of my application, so I could just use this to ensure that developers don't manipulate a table inside a transaction, i.e. ALTER, CREATE INDEX and so on.

Aside from the logging extensions, I couldn't find a way to inject events or middleware into the driver without totally wrapping it in facade classes. I don't like using logging as an extension point here because it is for the entire app domain, and I might want to disable this for some connections but not others.

Furthermore, I could use QueryExecuting and QueryExecuted hooks or things like them to report APM events or to log queries if they take longer than an expected threshold, and probably other emergent use-cases.

We might even find a place in the library to add this as an optional middleware to opt in to this best practice. Another option could be something like ASP.NET's middleware:

var command = new MySqlConnection();
command.AddMiddleware<PreventManipulationOfTablesInTransaction>()

What are your thoughts on this? I'm happy to develop this myself if there would be appetite.

@bgrainger
Copy link
Member

Aside from the logging extensions, I couldn't find a way to inject events or middleware into the driver without totally wrapping it in facade classes. I don't like using logging as an extension point here because it is for the entire app domain, and I might want to disable this for some connections but not others.

You could use an ActivityListener to listen to MySqlConnector events. E.g., see this test:

.

Furthermore, I could use QueryExecuting and QueryExecuted hooks or things like them to report APM events or to log queries if they take longer than an expected threshold, and probably other emergent use-cases.

The built-in Activity support already supports the APM use case: #1036.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants