Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Archive Rails database migrations #243

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

Holmes98
Copy link
Member

@Holmes98 Holmes98 commented Jan 5, 2024

This modifies schema.rb to match the production database, and merges all existing migrations into a single migration.
The new migration contains the contents of schema.rb, and also this execute statement which isn't supported by the current schema.rb.

See c979d38 for a comparison of SQL schema dumps (the "migrate" dumps were taken after db:migrate:reset, while the "schema" dumps were taken after db:schema:load). Note that there are still a few differences between the production DB and this branch; those could be addressed separately.

@Holmes98 Holmes98 force-pushed the archive-migrations branch from ef505aa to cbf5331 Compare January 5, 2024 11:52
@coveralls
Copy link

coveralls commented Jan 5, 2024

Coverage Status

coverage: 37.131%. remained the same
when pulling 95957e4 on archive-migrations
into 57b299d on master.

Copy link
Contributor

@tom93 tom93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will be merged after #225 and the related changes?

We should link to some page explaining the idea, I think stackoverflow is the source; maybe also collectiveidea.com or naturaily.com?

Nit: I think it would be nicer to use the oldest existing migration ID (20110819224757) as the ID for load_schema, rather than the newest ID. (When we archive more migrations we'll keep the migration ID, so the only consistent approach is to pick the oldest.)

(Dare we go even further and assign a brand new ID? E.g. 00000000000000_load_schema.rb. Advantage is that it stands out and makes it easier to archive migrations -- mv db/migrate/202* db/archive. Disadvantage is that we have to use some trick to mark the migration as "up" on prod without actually running it. If we do this, we should definitely add a check that aborts/skips the migration if the tables already exist -- should be enough to just check for one well-known table.)

I think we'll also need to go over a DB dump and look for any data created by migrations (and move it to db/seeds.rb). See Discord for a relevant comment.

I saw you had def change earlier but replaced it with def up, I assume the only issue is the CREATE INDEX. Do we want to make it reversible?

I'm happy to commit & push my suggestions.

Comment on lines +3 to +5
# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"

Copy link
Contributor

@tom93 tom93 Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we delete these lines? (Just saying this because migrations don't usually call enable_extension)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure but I'd prefer to leave it unless it's causing issues. Note that the squasher gem also includes it.

Comment on lines +6 to +7
create_table "contest_relations", force: :cascade do |t|
t.integer "user_id"
Copy link
Contributor

@tom93 tom93 Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It would be nice if these lines were naturally indented the same as in schema.rb (so they can be copy pasted directly without re-indenting). How about this hack:

class LoadSchema < ActiveRecord::Migration
  def up
    # Load the schema from a snapshot of db/schema.rb (see below)
    load_schema

    # Statements that aren't supported by db/schema.rb
    execute "CREATE UNIQUE INDEX index_users_on_username ON users (lower(username))"
  end

  def down
    ...
  end
end

# Snapshot of db/schema.rb.
# We define this instance method outside of the class, so that its body will
# have the same indentation as the original code from db/schema.rb.
LoadSchema.send(:define_method, :load_schema) do
  create_table "contest_relations", force: :cascade do |t|
    ...
  end
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a big deal; indenting is trivial, and you can use diff -b to ignore whitespace changes (to verify that it matches). On the other hand, I do like that execute statement is visually separated from the rest of the schema. Feel free to change it.

Copy link
Contributor

@tom93 tom93 Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we please pick a different directory structure? I know "archive" is suggested on stackoverflow, but I'd prefer e.g. db/archived_migrations (from another answer on same page).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Member Author

@Holmes98 Holmes98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it would be nicer to use the oldest existing migration ID (20110819224757) as the ID for load_schema, rather than the newest ID. (When we archive more migrations we'll keep the migration ID, so the only consistent approach is to pick the oldest.)

I think the newest ID is preferable, because:

  1. It indicates that the migration "contains" all the other archived migrations up to that particular ID.
  2. Most examples I've seen seem to use the newest ID.
  3. If anyone has an existing db setup that isn't fully migrated up to this version, running db:migrate will appear to succeed, but it won't perform this migration, which could cause confusion. This would have the same issue in the future if reuse we use a separate ID. If we use the newest ID, the migration will run for anyone that isn't up to date; note that this will cause existing tables to be dropped – I think this is acceptable, otherwise we could maybe add something like this (maybe also safer for prod):
if ActiveRecord::Migrator.current_version > 0
  raise StandardError, "The database schema is out of date; running this migration will cause irreversible data loss!"
end

Note that it's trivial to rename whenever we want to archive more migrations, and I think git should preserve file history as long as the schema doesn't change too much.

I think we'll also need to go over a DB dump and look for any data created by migrations (and move it to db/seeds.rb). See Discord for a relevant comment.

Good point, I've checked and the only case was the system mailer settings. I'll create a separate PR for that; would like to squash this one.

I saw you had def change earlier but replaced it with def up, I assume the only issue is the CREATE INDEX. Do we want to make it reversible?

I don't think so; rollbacking this migration would cause all tables to be dropped, which probably isn't desirable (at least on prod). It already raised an error when attempting to rollback under def change, but I changed to be explicit, just to be safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Comment on lines +6 to +7
create_table "contest_relations", force: :cascade do |t|
t.integer "user_id"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a big deal; indenting is trivial, and you can use diff -b to ignore whitespace changes (to verify that it matches). On the other hand, I do like that execute statement is visually separated from the rest of the schema. Feel free to change it.

Comment on lines +3 to +5
# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure but I'd prefer to leave it unless it's causing issues. Note that the squasher gem also includes it.

@bagedevimo
Copy link
Contributor

Two questions:

  • Not that i'm against this, but what's the goal - be good to have a "why" in a commit message
  • Why keep the archived migrations? They're in Git? Not sure we'll ever need the files?

@Holmes98
Copy link
Member Author

Holmes98 commented Jan 6, 2024

The main goal is to resync schema.rb with the production database; it is currently desynced due to differences in limits that were discussed in #225. Archiving the migrations provides other benefits, but also makes this easier because we can (mostly) just insert the production schema into a new migration.

Why keep the archived migrations? They're in Git? Not sure we'll ever need the files?

I think it's more convenient to have all the archived migrations in master, so they can be searched through/referenced easily. If we remove them and then later add more migrations, and then squash/remove them again, we'll end up in a state where you can't easily view the full migration history without checking out multiple commits.

Holmes98 added a commit that referenced this pull request Jan 6, 2024
These are currently created during migrations, but we plan to
remove this behaviour in #243. The values are left blank, as
they need to be set manually anyway.
@bagedevimo
Copy link
Contributor

The main goal is to resync schema.rb with the production database; it is currently desynced due to differences in limits that were discussed in #225. Archiving the migrations provides other benefits, but also makes this easier because we can (mostly) just insert the production schema into a new migration.

I think I'd prefer to fix the outdated migrations than have a mega-migration - also though missing migrations don't cause an error as such, it is a a bit weird. And fixing the older migrations to have limits is not a big job.

I think it's more convenient to have all the archived migrations in master, so they can be searched through/referenced easily. If we remove them and then later add more migrations, and then squash/remove them again, we'll end up in a state where you can't easily view the full migration history without checking out multiple commits.

I think if it's more convenient to keep the migrations around, then why bother having the archive / rollup.

I'm not very committed to this particular topic, but I don't really see the point in both having a rollup migration AND keeping the legacy migrations around. Very happy to be outvoted though :)

@Holmes98
Copy link
Member Author

Holmes98 commented Jan 6, 2024

I think I'd prefer to fix the outdated migrations than have a mega-migration - also though missing migrations don't cause an error as such, it is a a bit weird. And fixing the older migrations to have limits is not a big job.

It's documented in the rails guide, so I don't really have an issue with it. Fixing the older migrations isn't necessarily difficult but I think it's inconvenient to have to do it every time the defaults change.

I think if it's more convenient to keep the migrations around, then why bother having the archive / rollup.

I'm not really fussed about this; would be okay with deleting them, but the way I see it, it's somewhat similar to commenting code. Sure, you could just go through the git history, but it's more convenient to have it in an easily accessible location. Would also be fine with just keeping them in a separate branch.

@bagedevimo
Copy link
Contributor

It's documented in the rails guide, so I don't really have an issue with it. Fixing the older migrations isn't necessarily difficult but I think it's inconvenient to have to do it every time the defaults change.

TIL it's documented behavior.. FWIW, they did fix this in Rails 5 - migrations have a version from Rails 5 onwards, and they apply the defaults from their version. https://www.bigbinary.com/blog/migrations-are-versioned-in-rails-5

I'm not really fussed about this; would be okay with deleting them, but the way I see it, it's somewhat similar to commenting code. Sure, you could just go through the git history, but it's more convenient to have it in an easily accessible location. Would also be fine with just keeping them in a separate branch.

Haha, interesting, I'm also in camp "just delete the code, don't comment it unless you know you want existing code with work with the commented code and it's coming back real soon". I'm not really that fussed either. Branch is fine with me. Keeping them is fine-enough, I don't care very deeply about this so happy for Tom and you to make a call. Just wanted to drop some thoughts.

@Holmes98
Copy link
Member Author

Holmes98 commented Jan 6, 2024

FWIW, they did fix this in Rails 5 - migrations have a version from Rails 5 onwards, and they apply the defaults from their version.

Good point, I wasn't aware of that. In that case, I don't mind if we just fix the older migrations.

@tom93
Copy link
Contributor

tom93 commented Jan 7, 2024

The main goal is to resync schema.rb with the production database

Oh, I misunderstood. I don't really like using the archive process to change the schema, it feels too sneaky. I'd prefer to explicitly add limit: 255 to the old migrations (the implementation work is done -- branch schema-limit-255).

So if we want the final database to have limit: 255, my preferred process would be:

  1. Explicitly add limit: 255 to the old migrations
  2. (Optional) Archive the old migrations

If we want the final database to not have limit: 255 (which is my preference), the process would be:

  1. Make sure we are still enforcing sensible limits in the model validators (we don't want 1,000-character usernames)
  2. (Optional) Explicitly add limit: 255 to the old migrations
  3. Add a new migration that changes the column types to remove the limit
  4. (Optional) If we want, we can later revert commits 1 and/or 2 (they are not required after prod is migrated)
  5. (Optional) Archive the old migrations

For the finer points, here are my preferences and reasoning:

  • [weak] Remove limit: 255, for the same reasons Rails removed the limits and for consistency with new columns
  • [weak] Archive old migrations (using the load_schema trick), because maintaining old migrations can be a pain. Rail's versioned migrations should make this easier, but we also had other issues (e.g. because migrations used models) and the solutions are awkward.
  • Keep the archived migrations in a directory (don't delete them), because they may be useful (e.g. as examples to draw upon) and having to look through the git history makes it harder.

P.S. Re missing migrations and "NO FILE", we can get rid of those lines using a fairly simple hack (manually remove those versions from the schema_migrations table). But I think it's more "truthful" to keep those lines.

@Holmes98
Copy link
Member Author

Holmes98 commented Jan 7, 2024

Oh, I misunderstood. I don't really like using the archive process to change the schema, it feels too sneaky.

I'm okay with it because the schema/migrations already "changed" inadvertently when we updated from Rails 4.1 to 4.2, so this is really just changing it back to what it was before, to match production.

Anyway, I basically agree with the rest of your points, except that I have no particular preference on whether we should remove the limits or not. It would be nice to be follow the defaults and be consistent, but if migrations are versioned from Rails 5+ then that's basically the opposite (unless we do a similar thing every time the defaults change, which seems like a pain).

In either case, I think we should explicitly add limit: 255 to the current schema first, since that's what's currently on production.

@tom93
Copy link
Contributor

tom93 commented Jan 7, 2024

If we do use the archived migration to make the change, then I'd like it to be done as two separate and well-documented commits (happy to do that myself): 1. Modify db/schema.rb to match prod, then 2. Archive db/schema.rb.

Re changing defaults and the future, I feel versioned migrations give us the option to either keep the old behaviour or manually switch to the new default (we can decide on a case-by-case basis depending on effort and so on).

Since it looks like nobody has a strong preference re limit: 255, shall we just do a vote?
👍 - restore limit: 255 (Rails <= 4.1 behaviour)
👎 - remove limit: 255 (Rails >= 4.2 behaviour)
(Or I could first put together a draft PR to see what a migration to remove limit: 255 from prod might look like)

@Holmes98
Copy link
Member Author

Holmes98 commented Jan 7, 2024

If we do use the archived migration to make the change, then I'd like it to be done as two separate and well-documented commits (happy to do that myself): 1. Modify db/schema.rb to match prod, then 2. Archive db/schema.rb.

Sure. To clarify, I'm also fine with using your schema-limit-255 branch if you prefer that.

Feel free to just create a separate PR to remove the limits afterwards; I don't have any objections.

(Optional) If we want, we can later revert commits 1 and/or 2 (they are not required after prod is migrated)

Also, I don't think there's much point in doing this; if we want to clean up the migrations we may as well just archive everything.

@tom93
Copy link
Contributor

tom93 commented Jan 7, 2024

So can I propose the following:

  1. Explicitly add limit: 255 to the old migrations (branch schema-limit-255)
  2. [subject to vote] Add a new migration that changes the column types to remove the limit
  3. Archive the old migrations

(If we do 2, then I prefer to do it before 3, because it's silly for the schema snapshot to be littered with limit: 255 only to have the next migration remove them.)

@Holmes98
Copy link
Member Author

Holmes98 commented Jan 7, 2024

Sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants