DRAFT: MDEV-34705: Storing binlog in InnoDB #3775
Draft: knielsen wants to merge 41 commits into 11.4 from knielsen_binlog_in_engine
…rite an InnoDB tablespace Signed-off-by: Kristian Nielsen <[email protected]>
… InnoDB tablespace The option --innodb-in-engine now causes InnoDB DML commits to include binlogging in the same mtr. Binlog group commit now skips binlogging to old file-based binlog and passes events to InnoDB instead. Many things unfinished still, like allocating new tablespaces when the first one is filled, writing large event groups out-of-band to not bloat the InnoDB commit record in the redo log and exceed max mtr size, writing DDL and all other events to the InnoDB binlog, skipping the creation of the old-style binlog, reading the new style binlog from InnoDB, etc. etc. Signed-off-by: Kristian Nielsen <[email protected]>
Only works for two tablespace files though. For the third, we need to implement closing the first one, so that the tablespace id can be reused. Signed-off-by: Kristian Nielsen <[email protected]>
Before creating the next binlog tablespace N+2, flush out and close the old binlog tablespace N, so that the new tablespace can re-use the tablespace id without conflict. Signed-off-by: Kristian Nielsen <[email protected]>
…plete) Signed-off-by: Kristian Nielsen <[email protected]>
…e to get a hton available Signed-off-by: Kristian Nielsen <[email protected]>
Initial code to read in the binlog dump thread events from InnoDB binlog. Signed-off-by: Kristian Nielsen <[email protected]>
Skip prepare step in InnoDB when it handles the binlog, but re-enable InnoDB fsync at commit. Signed-off-by: Kristian Nielsen <[email protected]>
… engine Signed-off-by: Kristian Nielsen <[email protected]>
…ground thread Signed-off-by: Kristian Nielsen <[email protected]>
…rx cache in-memory buffer Signed-off-by: Kristian Nielsen <[email protected]>
When (re-)starting the server, check for any existing binlog files. Open the last two found (if any), and find the position that was last written before the restart. Continue binlogging from that point rather than creating new binlog files. Signed-off-by: Kristian Nielsen <[email protected]>
Move rpl_gtid and rpl_binlog_state_base into separate rpl_gtid_base.h include that can be used from engines implementing the binlog interface. Signed-off-by: Kristian Nielsen <[email protected]>
Signed-off-by: Kristian Nielsen <[email protected]>
Signed-off-by: Kristian Nielsen <[email protected]>
…iterate() Signed-off-by: Kristian Nielsen <[email protected]>
Every N bytes (hardcoded at 64k for now, to become a configurable setting), write the binlog GTID state into the binlog tablespace. This allows a given GTID position to be found quickly, by binary search to the prior GTID state in the tablespace followed by a short linear scan from that point. The full binlog state is dumped at the start of the binlog file; remaining states dumped are differential states containing only the changed (domain_id, server_id) pairs, to save space if binlog space is large. This commit only implements the writing of the binlog state to the tablespace at regular intervals. The binary search is to be implemented in a subsequent commit. Signed-off-by: Kristian Nielsen <[email protected]>
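The full-plus-differential scheme described in this commit message can be illustrated with a minimal sketch. The type and function names (`GtidState`, `diff_state`, `apply_diff`) are hypothetical and are not taken from the patch; the sketch assumes the state maps each (domain_id, server_id) pair to its latest sequence number and that each differential state is relative to the full state at the start of the file.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical model of a binlog GTID state: (domain_id, server_id) -> seq_no.
typedef std::pair<uint32_t, uint32_t> GtidKey;
typedef std::map<GtidKey, uint64_t> GtidState;

// Build a differential state: only the pairs that are new or changed
// compared with the full state written at the start of the file.
GtidState diff_state(const GtidState &file_start, const GtidState &current) {
  GtidState diff;
  for (const auto &kv : current) {
    GtidState::const_iterator it = file_start.find(kv.first);
    if (it == file_start.end() || it->second != kv.second)
      diff[kv.first] = kv.second;
  }
  return diff;
}

// Reconstruct the state at a differential record by overlaying it on the
// full state from the start of the file.
GtidState apply_diff(GtidState file_start, const GtidState &diff) {
  for (const auto &kv : diff)
    file_start[kv.first] = kv.second;
  return file_start;
}
```

With many domains and servers but only a few active ones, each differential record stays small, while overlaying it on the file-start state still yields the complete state at that point.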
Re-write the logic for selecting between reading from buffer pool or file, and how to move to the next file, in a clean way. Handles a bunch of ToDo's, probably fixes a few bugs, and generally makes the code much more robust. Signed-off-by: Kristian Nielsen <[email protected]>
To restore the binlog state, after finding the position in the old binlog to continue from, read the full gtid state saved at the start of the binlog file as well as the most recent differential gtid state written shortly before the starting position. Then construct a binlog reader to read the remaining few events (if any), and update with any GTIDs read to obtain the final restored GTID binlog state. Signed-off-by: Kristian Nielsen <[email protected]>
Signed-off-by: Kristian Nielsen <[email protected]>
To find the target position, we first loop backwards over binlog files, reading the initial GTID state written at the start to find the file to start in. We then binary search on the differential GTID states written every --innodb-binlog-state-interval bytes. This patch makes only minimal changes to the dump thread code in sql_repl.cc to be able to send out binlog data to the client. Some re-factoring/cleanup should be done in a follow-up patch to more cleanly separate the two code paths, avoid a lot of if-statements and make the binlog-in-engine code path free of much of the cruft from the legacy binlog implementation. Signed-off-by: Kristian Nielsen <[email protected]>
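The binary-search step within one file can be sketched as follows. This is a simplified illustration, not the patch's `gtid_search` code: it assumes a single replication domain with monotonically increasing sequence numbers, and the function name `find_scan_start` and the in-memory vector of snapshot sequence numbers are hypothetical. Index 0 stands for the full state at the start of the file, and index i for the state written i intervals into the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// snap_seq[i] is the seq_no recorded in the i-th GTID state of the file
// (hypothetical simplification: one domain, non-decreasing values).
// Returns the index of the last state that is not past the target, which is
// where the linear scan for the exact position should begin.
size_t find_scan_start(const std::vector<uint64_t> &snap_seq, uint64_t target) {
  // The file was chosen so that its initial state is not past the target.
  assert(!snap_seq.empty() && snap_seq[0] <= target);
  std::vector<uint64_t>::const_iterator it =
      std::upper_bound(snap_seq.begin(), snap_seq.end(), target);
  return static_cast<size_t>(it - snap_seq.begin()) - 1;
}
```

Under these assumptions, the byte offset at which to start scanning would be the returned index multiplied by the state interval, so at most one interval's worth of data needs to be scanned linearly.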
Signed-off-by: Kristian Nielsen <[email protected]>
Only GTID slave connection is supported, at least for now. Signed-off-by: Kristian Nielsen <[email protected]>
We need two flags on the chunk type to fully identify how a record is split into chunks. A "CONT" flag which marks a continuation chunk (set on all but the first chunk). And a "LAST" flag which marks the end of a record (set only on the last chunk). Signed-off-by: Kristian Nielsen <[email protected]>
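A minimal sketch of how the two flags combine when a record is split. The flag bit values, the `Chunk` struct, and `split_record` are hypothetical names chosen for illustration; the actual chunk header layout in the patch may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit values; the patch may assign them differently.
const uint8_t CHUNK_CONT = 0x40;  // set on every chunk except the first
const uint8_t CHUNK_LAST = 0x80;  // set only on the final chunk of a record

struct Chunk {
  uint8_t flags;
  std::string payload;
};

// Split one record into chunks carrying at most max_payload bytes each.
std::vector<Chunk> split_record(const std::string &rec, size_t max_payload) {
  assert(max_payload > 0);
  std::vector<Chunk> out;
  size_t pos = 0;
  do {
    size_t n = std::min(max_payload, rec.size() - pos);
    uint8_t flags = 0;
    if (pos > 0)
      flags |= CHUNK_CONT;  // not the first chunk
    if (pos + n == rec.size())
      flags |= CHUNK_LAST;  // nothing follows
    Chunk c = {flags, rec.substr(pos, n)};
    out.push_back(c);
    pos += n;
  } while (pos < rec.size());
  return out;
}
```

A record that fits in one chunk gets only LAST; a three-chunk record gets no flags, then CONT, then CONT|LAST. With LAST alone, a reader landing mid-record could not tell a first chunk from a middle one; with CONT alone, it could not tell that a record had ended without looking at the next chunk.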
Introduce a class/interface chunk_data_base to encapsulate supplying the data to be written as a binlog record. For fsp_binlog_write_cache(), this is an IO_CACHE with two sections (main and gtid) that need to be binlogged in the opposite order. Separate the logic for page writing into a generic fsp_binlog_write_chunk() function which takes a chunk_data_base * as data source. This is in preparation for introducing other kinds of data to be written into the binlog, eg. out-of-band partial event group data. Signed-off-by: Kristian Nielsen <[email protected]>
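The separation between data source and page writer might look roughly like the sketch below. It is not the patch's `chunk_data_base` or `fsp_binlog_write_chunk()`: the interface `chunk_source`, its `next()` method, `two_section_source`, and `write_chunks` are hypothetical stand-ins, and strings replace the IO_CACHE and the tablespace pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the data-source interface.
struct chunk_source {
  virtual ~chunk_source() {}
  // Copy up to max_len bytes into out; return false once no data remains.
  virtual bool next(std::string &out, size_t max_len) = 0;
};

// A source holding a main and a gtid section that must be emitted gtid-first.
// For brevity the sections are concatenated up front; a real source would
// read from the cache in place.
class two_section_source : public chunk_source {
 public:
  two_section_source(const std::string &main_part, const std::string &gtid_part)
      : data_(gtid_part + main_part), pos_(0) {}

  bool next(std::string &out, size_t max_len) {
    if (pos_ >= data_.size())
      return false;
    size_t n = std::min(max_len, data_.size() - pos_);
    out.assign(data_, pos_, n);
    pos_ += n;
    return true;
  }

 private:
  std::string data_;
  size_t pos_;
};

// Generic writer: knows about page capacity, not about where data comes from.
std::vector<std::string> write_chunks(chunk_source &src, size_t page_payload) {
  assert(page_payload > 0);
  std::vector<std::string> pages;
  std::string buf;
  while (src.next(buf, page_payload))
    pages.push_back(buf);
  return pages;
}
```

Because the writer depends only on the abstract source, another kind of data (for example out-of-band partial event group data) could be supplied by adding a new source class without touching the page-writing loop.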
…eparate refactoring of end_event Signed-off-by: Kristian Nielsen <[email protected]>
…-band Signed-off-by: Kristian Nielsen <[email protected]>
…rd (untested) Signed-off-by: Kristian Nielsen <[email protected]>
…g_reader Signed-off-by: Kristian Nielsen <[email protected]>
With this commit, the out-of-band binlogging of large event groups in multiple smaller records interleaved with other event groups is now working. Instead of flushing the binlog cache to disk when it reaches @@binlog_cache_size, the cache is binlogged as an out-of-band record. Then at transaction commit, a commit record is written containing just the GTID and a link to the out-of-band data. To facilitate append-only operation, the binlogged records do not have a "next" pointer. Instead, they are written out as a forest of perfect binary trees, the leftmost leaf of one tree pointing to the root of the previous tree. This structure is used in the binlog reader to efficiently read out the event group data consecutively for the binlog dump thread, needing to maintain only O(log(N)) amount of memory during the reading. As part of this commit, the existing binlog reader code is refactored to be greatly improved, with a much cleaner explicit state machine and handling of chunk/page/file boundaries etc. Also fixes some bugs in the gtid_search::find_gtid_pos(). Signed-off-by: Kristian Nielsen <[email protected]>
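One way such an append-only forest can be built and read back is sketched below. This illustrates the general idea only, under assumptions of my own: the node layout (`OobNode` with an explicit `height` and a separate `prev` link), the merge rule, and the names `OobWriter` and `read_in_order` are hypothetical and may not match the on-disk format or code in the patch. Each appended node references only nodes written before it: if the two most recent trees have equal height, the new node becomes their parent; otherwise it is a new leaf that links to the root of the tree to its left.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct OobNode {
  int left;    // left child index, or -1 for a leaf
  int right;   // right child index, or -1 for a leaf
  int prev;    // leaves only: root of the tree to the left at write time, or -1
  int height;  // 0 for a leaf
};

struct OobWriter {
  std::vector<OobNode> nodes;  // append-only; index equals write order
  std::vector<int> roots;      // roots of the current forest, left to right

  int append() {
    int idx = static_cast<int>(nodes.size());
    size_t n = roots.size();
    OobNode node;
    if (n >= 2 && nodes[roots[n - 1]].height == nodes[roots[n - 2]].height) {
      // Two equal-height trees: the new node becomes their parent.
      node.left = roots[n - 2];
      node.right = roots[n - 1];
      node.prev = -1;
      node.height = nodes[node.left].height + 1;
      roots.pop_back();
      roots.pop_back();
    } else {
      // New single-node tree, linked to the tree on its left (if any).
      node.left = node.right = -1;
      node.prev = roots.empty() ? -1 : roots.back();
      node.height = 0;
    }
    nodes.push_back(node);
    roots.push_back(idx);
    return idx;
  }
};

// Given only the index of the last node, return all indices in write order.
// The working state (root list plus traversal stack) is O(log N); the result
// vector exists only to make the order checkable, a real reader would stream.
std::vector<int> read_in_order(const std::vector<OobNode> &nodes, int last) {
  std::vector<int> roots;  // collected right to left
  for (int r = last; r != -1;) {
    roots.push_back(r);
    int n = r;
    while (nodes[n].height > 0)
      n = nodes[n].left;  // walk down to the leftmost leaf
    r = nodes[n].prev;    // which links to the previous tree's root
  }
  std::vector<int> out;
  for (size_t i = roots.size(); i-- > 0;) {
    // Iterative post-order traversal: post-order equals write order.
    std::vector<std::pair<int, bool> > stack;
    stack.push_back(std::make_pair(roots[i], false));
    while (!stack.empty()) {
      std::pair<int, bool> top = stack.back();
      stack.pop_back();
      int n = top.first;
      if (top.second || nodes[n].height == 0) {
        out.push_back(n);
        continue;
      }
      stack.push_back(std::make_pair(n, true));
      stack.push_back(std::make_pair(nodes[n].right, false));
      stack.push_back(std::make_pair(nodes[n].left, false));
    }
  }
  return out;
}
```

In this sketch the commit record would only need to store the index of the last node. Non-leftmost leaves also carry a `prev` link, but it is redundant once they are merged under a parent, so the reader ignores it during the in-tree traversal.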
Move high-level binlog code to handler/handler0binlog.cc and low-level code to fsp/fsp0binlog.cc. Signed-off-by: Kristian Nielsen <[email protected]>
When starting to read data from a specific GTID position, the logic for skipping any initial, partial record that we start in the middle of, was incorrect. Signed-off-by: Kristian Nielsen <[email protected]>
…ew files Signed-off-by: Kristian Nielsen <[email protected]>
Add option --binlog-directory, used to place the binlogs outside the data directory (eg. to put them on different disk/file system). Disallow specifying the binlog name in --log-bin when --binlog-storage-engine is used, as the name is then not user configurable. A ToDo (not implemented in this commit) is to use the --binlog-directory value, if given, also for the legacy binlog implementation. Signed-off-by: Kristian Nielsen <[email protected]>
Signed-off-by: Kristian Nielsen <[email protected]>
No DELETE_DOMAIN_ID supported yet, will come in a later commit, after PURGE is implemented. Signed-off-by: Kristian Nielsen <[email protected]>
Signed-off-by: Kristian Nielsen <[email protected]>
…uite mostly pass Signed-off-by: Kristian Nielsen <[email protected]>
Enable binlog_in_engine as a default suite. Fix embedded and Windows build failures. Use sql_print_(error|warning) over ib::error() and ib::warn(). Use small_vector<> for the innodb_binlog_oob_reader instead of a custom implementation. Signed-off-by: Kristian Nielsen <[email protected]>
knielsen force-pushed the knielsen_binlog_in_engine branch from 2ef2e33 to aa64f73 (January 17, 2025 17:10)
Signed-off-by: Kristian Nielsen <[email protected]>
knielsen force-pushed the knielsen_binlog_in_engine branch from aa64f73 to 18932be (January 17, 2025 20:07)
Fix missing WORDS_BIGENDIAN define in ut0compr_int.cc. Fix misaligned read buffer for O_DIRECT. Fix wrong/missing update_binlog_end_pos() in binlog group commit. Fix race where active_binlog_file_no incremented too early. Fix wrong assertion when reader reaches the very start of (active+1). Signed-off-by: Kristian Nielsen <[email protected]>
cvicentiu added the MariaDB Foundation label (Pull requests created by MariaDB Foundation) on Jan 22, 2025
Draft pull request for work-in-progress on the MDEV-34705 binlog-in-engine feature.
A new option --binlog-storage-engine=ENGINE moves the binlog implementation into the storage engine, for supporting engines (currently only InnoDB).
InnoDB implements the binlog files as a new type of tablespace, and uses its redo log to make the binlog crash-safe without the overhead and complexity of two-phase commit.