Backfill teams CE in a single transaction #4985
@@ -5,10 +5,7 @@ defmodule Plausible.DataMigration.BackfillTeams do

   import Ecto.Query

   alias Plausible.Teams

-  @repo Plausible.DataMigration.PostgresRepo
Self-hosters (by default) only have one DB. So it doesn't seem necessary to start a separate pool with a different URL, especially since the writes were going to the default DB anyway.
Maybe a dynamic repo can be used in the future if there is a need to run all DB ops through a specific, data-migration-only repo. But since the dynamic repo is stored in the process dictionary, it would need extra care in multi-process migrations, like the original script that used Task.async_stream.
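To illustrate the process-dictionary caveat: a value stored with Process.put/2 in the calling process is not visible inside tasks started by Task.async_stream, so each task would have to set it again. Below is a minimal sketch in plain Elixir with no Ecto involved. The module name and the :dynamic_repo key are hypothetical stand-ins for whatever per-process state is in play.

```elixir
defmodule ProcessDictDemo do
  # Hypothetical key; stands in for per-process state such as a dynamic repo.
  @key :dynamic_repo

  # Returns what each task sees under @key when nothing is passed explicitly.
  def without_propagation(items) do
    Process.put(@key, :migration_repo)

    items
    |> Task.async_stream(fn _item -> Process.get(@key) end)
    |> Enum.map(fn {:ok, seen} -> seen end)
  end

  # Captures the value in the caller and sets it again inside every task.
  def with_propagation(items) do
    Process.put(@key, :migration_repo)
    repo = Process.get(@key)

    items
    |> Task.async_stream(fn _item ->
      Process.put(@key, repo)
      Process.get(@key)
    end)
    |> Enum.map(fn {:ok, seen} -> seen end)
  end
end
```

With Ecto, the analogous step inside each task would be a call to the repo's put_dynamic_repo/1.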

  def up do
    if Plausible.ce?() do
      Plausible.DataMigration.BackfillTeams.run(dry_run?: false)
This way there are fewer manual steps for self-hosters.
Here's the output from my local instance:
15:28:50.421 [info] == Running 20250117122435 Plausible.Repo.Migrations.BackfillTeams.up/0 forward
15:28:50.438 [debug] QUERY OK source="teams" db=5.2ms
SELECT t0."id", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."grace_period", t0."inserted_at", t0."updated_at" FROM "teams" AS t0 LEFT OUTER JOIN "team_memberships" AS t1 ON t1."team_id" = t0."id" LEFT OUTER JOIN "sites" AS s2 ON s2."team_id" = t0."id" WHERE (t1."id" IS NULL) AND (s2."id" IS NULL) []
[2025-01-17 12:28:50Z] Found 0 orphaned teams...
[2025-01-17 12:28:50Z] Deleted orphaned teams
15:28:50.448 [debug] QUERY OK source="sites" db=4.8ms
SELECT s0."id", s0."domain", s0."timezone", s0."public", s0."locked", s0."stats_start_date", s0."native_stats_start_at", s0."allowed_event_props", s0."conversions_enabled", s0."props_enabled", s0."funnels_enabled", s0."ingest_rate_limit_scale_seconds", s0."ingest_rate_limit_threshold", s0."domain_changed_from", s0."domain_changed_at", s0."imported_data", s0."team_id", s0."installation_meta", s0."inserted_at", s0."updated_at", u2."id", u2."email", u2."password_hash", u2."name", u2."last_seen", u2."theme", u2."email_verified", u2."previous_email", u2."notes", u2."totp_enabled", u2."totp_secret", u2."totp_token", u2."totp_last_used_at", u2."inserted_at", u2."updated_at" FROM "sites" AS s0 INNER JOIN "site_memberships" AS s1 ON s1."site_id" = s0."id" INNER JOIN "users" AS u2 ON u2."id" = s1."user_id" WHERE (s1."role" = 'owner') AND (s0."team_id" IS NULL) []
[2025-01-17 12:28:50Z] Found 1 sites without teams...
[2025-01-17 12:28:50Z] Teams about to be created: 1
[2025-01-17 12:28:50Z] Max sites: 1
15:28:50.453 [debug] QUERY OK source="team_memberships" db=0.4ms
SELECT t1."id", t1."name", t1."trial_expiry_date", t1."accept_traffic_until", t1."allow_next_upgrade_override", t1."grace_period", t1."inserted_at", t1."updated_at" FROM "team_memberships" AS t0 INNER JOIN "teams" AS t1 ON t1."id" = t0."team_id" WHERE ((t0."user_id" = $1) AND (t0."role" = 'owner')) ORDER BY t1."id" [1]
15:28:50.458 [debug] QUERY OK source="teams" db=0.9ms
INSERT INTO "teams" ("name","trial_expiry_date","accept_traffic_until","inserted_at","updated_at","allow_next_upgrade_override") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "id" ["My Team", ~D[2125-01-17], ~D[2125-01-31], ~N[2024-11-28 08:18:53], ~N[2024-11-28 08:18:53], false]
15:28:50.460 [debug] QUERY OK source="team_memberships" db=1.3ms
INSERT INTO "team_memberships" ("role","inserted_at","updated_at","team_id","user_id") VALUES ($1,$2,$3,$4,$5) ON CONFLICT (user_id) WHERE role != 'guest' DO NOTHING RETURNING "id" [:owner, ~N[2024-11-28 08:18:53], ~N[2024-11-28 08:18:53], 1, 1]
15:28:50.461 [debug] QUERY OK source="teams" db=0.7ms
UPDATE "teams" SET "trial_expiry_date" = $1, "updated_at" = $2 WHERE "id" = $3 [nil, ~N[2024-11-28 08:18:53], 1]
.
15:28:50.466 [debug] QUERY OK source="sites" db=3.2ms
UPDATE "sites" AS s0 SET "team_id" = $1 WHERE (s0."id" = ANY($2)) [1, [1]]
[2025-01-17 12:28:50Z] Backfilled 1 teams.
[2025-01-17 12:28:50Z] Found 1 users on trial without team...
15:28:50.468 [debug] QUERY OK source="users" db=0.3ms
SELECT u0."id", u0."email", u0."password_hash", u0."name", u0."last_seen", u0."theme", u0."email_verified", u0."previous_email", u0."notes", u0."totp_enabled", u0."totp_secret", u0."totp_token", u0."totp_last_used_at", u0."inserted_at", u0."updated_at" FROM "users" AS u0 WHERE (NOT (u0."trial_expiry_date" IS NULL)) AND (NOT (exists((SELECT st0."id", st0."role", st0."user_id", st0."team_id", st0."inserted_at", st0."updated_at" FROM "team_memberships" AS st0 WHERE (st0."role" = 'owner') AND (st0."user_id" = u0."id"))))) []
15:28:50.469 [debug] QUERY OK source="team_memberships" db=0.3ms
SELECT t1."id", t1."name", t1."trial_expiry_date", t1."accept_traffic_until", t1."allow_next_upgrade_override", t1."grace_period", t1."inserted_at", t1."updated_at" FROM "team_memberships" AS t0 INNER JOIN "teams" AS t1 ON t1."id" = t0."team_id" WHERE ((t0."user_id" = $1) AND (t0."role" = 'owner')) ORDER BY t1."id" [2]
15:28:50.469 [debug] QUERY OK source="teams" db=0.2ms
INSERT INTO "teams" ("name","trial_expiry_date","accept_traffic_until","inserted_at","updated_at","allow_next_upgrade_override") VALUES ($1,$2,$3,$4,$5,$6) RETURNING "id" ["My Team", ~D[2125-01-17], ~D[2125-01-31], ~N[2024-11-28 08:18:56], ~N[2024-11-28 08:18:56], false]
[2025-01-17 12:28:50Z] Created teams for all users on trial without a team.
15:28:50.469 [debug] QUERY OK source="team_memberships" db=0.3ms
INSERT INTO "team_memberships" ("role","inserted_at","updated_at","team_id","user_id") VALUES ($1,$2,$3,$4,$5) ON CONFLICT (user_id) WHERE role != 'guest' DO NOTHING RETURNING "id" [:owner, ~N[2024-11-28 08:18:56], ~N[2024-11-28 08:18:56], 2, 2]
[2025-01-17 12:28:50Z] Found 0 guest memberships with mismatched team to remove...
15:28:50.471 [debug] QUERY OK source="guest_memberships" db=1.7ms
SELECT g0."id", g0."role", g0."team_membership_id", g0."site_id", g0."inserted_at", g0."updated_at" FROM "guest_memberships" AS g0 INNER JOIN "team_memberships" AS t1 ON t1."id" = g0."team_membership_id" INNER JOIN "sites" AS s2 ON s2."id" = g0."site_id" WHERE (t1."team_id" != s2."team_id") []
[2025-01-17 12:28:50Z] Pruning guest team memberships for 0 teams...
15:28:50.472 [debug] QUERY OK source="guest_memberships" db=0.3ms
DELETE FROM "guest_memberships" AS g0 USING "team_memberships" AS t1 WHERE (t1."id" = g0."team_membership_id") AND (g0."id" = ANY($1)) RETURNING t1."team_id" [[]]
[2025-01-17 12:28:50Z] Guest memberships with mismatched team cleared.
15:28:50.473 [debug] QUERY OK source="teams" db=0.2ms
SELECT t0."id", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."grace_period", t0."inserted_at", t0."updated_at" FROM "teams" AS t0 WHERE (t0."id" = ANY($1)) [[]]
[2025-01-17 12:28:50Z] Found 0 guest memberships to remove...
15:28:50.473 [debug] QUERY OK source="guest_memberships" db=0.4ms
SELECT g0."id", g0."role", g0."team_membership_id", g0."site_id", g0."inserted_at", g0."updated_at" FROM "guest_memberships" AS g0 INNER JOIN "team_memberships" AS t1 ON t1."id" = g0."team_membership_id" WHERE (NOT (exists((SELECT 1 FROM "site_memberships" AS ss0 WHERE (ss0."site_id" = g0."site_id") AND (ss0."user_id" = t1."user_id") AND (ss0."role" != 'owner'))))) []
[2025-01-17 12:28:50Z] Pruning guest team memberships for 0 teams...
15:28:50.474 [debug] QUERY OK source="guest_memberships" db=0.3ms
DELETE FROM "guest_memberships" AS g0 USING "team_memberships" AS t1 WHERE (t1."id" = g0."team_membership_id") AND (g0."id" = ANY($1)) RETURNING t1."team_id" [[]]
[2025-01-17 12:28:50Z] Guest memberships cleared.
15:28:50.474 [debug] QUERY OK source="teams" db=0.2ms
SELECT t0."id", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."grace_period", t0."inserted_at", t0."updated_at" FROM "teams" AS t0 WHERE (t0."id" = ANY($1)) [[]]
[2025-01-17 12:28:50Z] Found 1 site memberships without guest membership...
15:28:50.476 [debug] QUERY OK source="site_memberships" db=0.9ms
SELECT u3."id", u3."email", u3."password_hash", u3."name", u3."last_seen", u3."theme", u3."email_verified", u3."previous_email", u3."notes", u3."totp_enabled", u3."totp_secret", u3."totp_token", u3."totp_last_used_at", u3."inserted_at", u3."updated_at", s1."id", s1."domain", s1."timezone", s1."public", s1."locked", s1."stats_start_date", s1."native_stats_start_at", s1."allowed_event_props", s1."conversions_enabled", s1."props_enabled", s1."funnels_enabled", s1."ingest_rate_limit_scale_seconds", s1."ingest_rate_limit_threshold", s1."domain_changed_from", s1."domain_changed_at", s1."imported_data", s1."team_id", s1."installation_meta", s1."inserted_at", s1."updated_at", t2."id", t2."name", t2."trial_expiry_date", t2."accept_traffic_until", t2."allow_next_upgrade_override", t2."grace_period", t2."inserted_at", t2."updated_at", s0."inserted_at", s0."updated_at", s0."role" FROM "site_memberships" AS s0 INNER JOIN "sites" AS s1 ON s1."id" = s0."site_id" INNER JOIN "teams" AS t2 ON t2."id" = s1."team_id" INNER JOIN "users" AS u3 ON u3."id" = s0."user_id" WHERE (s0."role" != 'owner') AND (NOT (exists((SELECT 1 FROM "guest_memberships" AS sg0 INNER JOIN "team_memberships" AS st1 ON st1."id" = sg0."team_membership_id" WHERE (sg0."site_id" = s0."site_id") AND (st1."user_id" = s0."user_id"))))) []
[2025-01-17 12:28:50Z] Team memberships to be created: 1
[2025-01-17 12:28:50Z] Max guest memberships: 1
15:28:50.479 [debug] QUERY OK source="team_memberships" db=0.3ms
INSERT INTO "team_memberships" AS t0 ("role","inserted_at","updated_at","team_id","user_id") VALUES ($1,$2,$3,$4,$5) ON CONFLICT ("team_id","user_id") DO UPDATE SET "updated_at" = $6 RETURNING "id" [:guest, ~N[2024-11-28 08:18:56], ~N[2024-11-28 08:18:56], 1, 2, ~N[2024-11-28 08:18:56]]
.
15:28:50.480 [debug] QUERY OK source="guest_memberships" db=0.5ms
INSERT INTO "guest_memberships" ("role","inserted_at","updated_at","site_id","team_membership_id") VALUES ($1,$2,$3,$4,$5) RETURNING "id" [:viewer, ~N[2024-11-28 08:18:56], ~N[2024-11-28 08:18:56], 1, 3]
[2025-01-17 12:28:50Z] Backfilled missing guest memberships.
[2025-01-17 12:28:50Z] Found 0 guest memberships with role out of sync...
15:28:50.481 [debug] QUERY OK source="site_memberships" db=0.4ms
SELECT g2."id", g2."role", g2."team_membership_id", g2."site_id", g2."inserted_at", g2."updated_at", s0."role" FROM "site_memberships" AS s0 INNER JOIN "team_memberships" AS t1 ON t1."user_id" = s0."user_id" INNER JOIN "guest_memberships" AS g2 ON g2."site_id" = s0."site_id" WHERE (t1."role" = 'guest') AND (((g2."role" = 'viewer') AND (s0."role" = 'admin')) OR ((g2."role" = 'editor') AND (s0."role" = 'viewer'))) []
[2025-01-17 12:28:50Z] All guest memberships are up to date now.
[2025-01-17 12:28:50Z] Found 0 guest invitations to remove...
15:28:50.486 [debug] QUERY OK source="guest_invitations" db=3.8ms
SELECT g0."id", g0."invitation_id", g0."role", g0."site_id", g0."team_invitation_id", g0."inserted_at", g0."updated_at" FROM "guest_invitations" AS g0 INNER JOIN "team_invitations" AS t1 ON t1."id" = g0."team_invitation_id" WHERE (NOT (exists((SELECT TRUE FROM "invitations" AS si0 WHERE (si0."site_id" = g0."site_id") AND (si0."email" = t1."email") AND (((si0."role" = 'viewer') AND (g0."role" = 'viewer')) OR ((si0."role" = 'admin') AND (g0."role" = 'editor'))))))) []
[2025-01-17 12:28:50Z] Pruning guest team invitations for 0 teams...
15:28:50.487 [debug] QUERY OK source="guest_invitations" db=0.3ms
DELETE FROM "guest_invitations" AS g0 USING "team_invitations" AS t1 WHERE (t1."id" = g0."team_invitation_id") AND (g0."id" = ANY($1)) RETURNING t1."team_id" [[]]
[2025-01-17 12:28:50Z] Guest invitations cleared.
15:28:50.487 [debug] QUERY OK source="teams" db=0.2ms
SELECT t0."id", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."grace_period", t0."inserted_at", t0."updated_at" FROM "teams" AS t0 WHERE (t0."id" = ANY($1)) [[]]
[2025-01-17 12:28:50Z] Found 0 site invitations without guest invitation...
15:28:50.489 [debug] QUERY OK source="invitations" db=0.7ms
SELECT i0."inserted_at", i0."updated_at", i0."role", i0."invitation_id", i0."email", s1."id", s1."domain", s1."timezone", s1."public", s1."locked", s1."stats_start_date", s1."native_stats_start_at", s1."allowed_event_props", s1."conversions_enabled", s1."props_enabled", s1."funnels_enabled", s1."ingest_rate_limit_scale_seconds", s1."ingest_rate_limit_threshold", s1."domain_changed_from", s1."domain_changed_at", s1."imported_data", s1."team_id", s1."installation_meta", s1."inserted_at", s1."updated_at", t2."id", t2."name", t2."trial_expiry_date", t2."accept_traffic_until", t2."allow_next_upgrade_override", t2."grace_period", t2."inserted_at", t2."updated_at", u3."id", u3."email", u3."password_hash", u3."name", u3."last_seen", u3."theme", u3."email_verified", u3."previous_email", u3."notes", u3."totp_enabled", u3."totp_secret", u3."totp_token", u3."totp_last_used_at", u3."inserted_at", u3."updated_at" FROM "invitations" AS i0 INNER JOIN "sites" AS s1 ON i0."site_id" = s1."id" INNER JOIN "teams" AS t2 ON t2."id" = s1."team_id" INNER JOIN "users" AS u3 ON u3."id" = i0."inviter_id" WHERE (i0."role" != 'owner') AND (NOT (exists((SELECT 1 FROM "guest_invitations" AS sg0 INNER JOIN "team_invitations" AS st1 ON st1."id" = sg0."team_invitation_id" WHERE (sg0."site_id" = i0."site_id") AND (st1."email" = i0."email"))))) []
[2025-01-17 12:28:50Z] Backfilled missing guest invitations.
[2025-01-17 12:28:50Z] Found 0 guest invitations with role out of sync...
15:28:50.490 [debug] QUERY OK source="invitations" db=0.3ms
SELECT g2."id", g2."invitation_id", g2."role", g2."site_id", g2."team_invitation_id", g2."inserted_at", g2."updated_at", i0."role", i0."invitation_id" FROM "invitations" AS i0 INNER JOIN "team_invitations" AS t1 ON t1."email" = i0."email" INNER JOIN "guest_invitations" AS g2 ON (g2."team_invitation_id" = t1."id") AND (g2."site_id" = i0."site_id") WHERE (t1."role" = 'guest') AND ((((g2."role" = 'viewer') AND (i0."role" = 'admin')) OR ((g2."role" = 'editor') AND (i0."role" = 'viewer'))) OR g2."invitation_id" IS DISTINCT FROM i0."invitation_id") []
[2025-01-17 12:28:50Z] All guest invitations are up to date now.
[2025-01-17 12:28:50Z] Found 0 site transfers to remove...
15:28:50.491 [debug] QUERY OK source="team_site_transfers" db=0.7ms
SELECT t0."id", t0."transfer_id", t0."email", t0."transfer_guests", t0."site_id", t0."initiator_id", t0."destination_team_id", t0."inserted_at", t0."updated_at" FROM "team_site_transfers" AS t0 WHERE (NOT (exists((SELECT TRUE FROM "invitations" AS si0 WHERE (si0."site_id" = t0."site_id") AND (si0."email" = t0."email") AND (si0."role" = 'owner'))))) []
[2025-01-17 12:28:50Z] Site transfers cleared.
15:28:50.492 [debug] QUERY OK source="team_site_transfers" db=0.2ms
DELETE FROM "team_site_transfers" AS t0 WHERE (t0."id" = ANY($1)) [[]]
[2025-01-17 12:28:50Z] Found 0 ownership transfers without site transfer...
15:28:50.493 [debug] QUERY OK source="invitations" db=0.5ms
SELECT i0."email", i0."role", i0."invitation_id", i0."inserted_at", i0."updated_at", s1."id", s1."domain", s1."timezone", s1."public", s1."locked", s1."stats_start_date", s1."native_stats_start_at", s1."allowed_event_props", s1."conversions_enabled", s1."props_enabled", s1."funnels_enabled", s1."ingest_rate_limit_scale_seconds", s1."ingest_rate_limit_threshold", s1."domain_changed_from", s1."domain_changed_at", s1."imported_data", s1."team_id", s1."installation_meta", s1."inserted_at", s1."updated_at", u2."id", u2."email", u2."password_hash", u2."name", u2."last_seen", u2."theme", u2."email_verified", u2."previous_email", u2."notes", u2."totp_enabled", u2."totp_secret", u2."totp_token", u2."totp_last_used_at", u2."inserted_at", u2."updated_at" FROM "invitations" AS i0 INNER JOIN "sites" AS s1 ON s1."id" = i0."site_id" INNER JOIN "users" AS u2 ON u2."id" = i0."inviter_id" WHERE (i0."role" = 'owner') AND (NOT (exists((SELECT 1 FROM "team_site_transfers" AS st0 WHERE (st0."site_id" = i0."site_id") AND (st0."email" = i0."email"))))) []
[2025-01-17 12:28:50Z] Backfilled missing site transfers.
[2025-01-17 12:28:50Z] All data are up to date now!
15:28:50.493 [info] == Migrated 20250117122435 in 0.0s
-    @repo.start(db_url, pool_size: 2 * @max_concurrency)

-    backfill(dry_run?)
+    Repo.transaction(fn -> backfill(dry_run?) end, timeout: :infinity)
I still keep it in a transaction just in case someone finds a way to run it outside of the migration.
      )
    end,
    timeout: :infinity,
    max_concurrency: @max_concurrency
This was probably meant for Task.async_stream. If this PR is rejected, we would need to move it down, next to the second timeout: :infinity.
🤦 😂 well spotted ❤️
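For reference, max_concurrency and timeout are both options accepted by Task.async_stream. Below is a minimal sketch of where they go. The module, the concurrency value, and the work function are made up for illustration and are not taken from the script.

```elixir
defmodule AsyncStreamDemo do
  # Hypothetical value; the script's actual @max_concurrency is not shown here.
  @max_concurrency 4

  # Doubles each item in parallel, with at most @max_concurrency tasks at once.
  def double_all(items) do
    items
    |> Task.async_stream(
      fn item -> item * 2 end,
      max_concurrency: @max_concurrency,
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Results come back in input order by default, so the final list lines up with the items passed in.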
    end)
  end

  defp translate_role("admin"), do: :editor
  defp translate_role("viewer"), do: :viewer

  defp log(msg) do
    IO.puts("[#{NaiveDateTime.utc_now(:second)}] #{msg}")
Self-hosters on non-UTC instances might get confused by these timestamps, which look local but are actually UTC. DateTime timestamps have a Z suffix that makes it clear they're UTC.
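A small sketch of the difference when the two structs are interpolated into a log line. Fixed timestamps are used so the output is reproducible, and the module and function names are hypothetical.

```elixir
defmodule TimestampDemo do
  # Formats a log line the same way for either struct via string interpolation.
  def line(timestamp, msg), do: "[#{timestamp}] #{msg}"
end

# Fixed values instead of utc_now/1 so the printed lines are stable.
naive = ~N[2025-01-17 12:28:50]
utc = ~U[2025-01-17 12:28:50Z]

IO.puts(TimestampDemo.line(naive, "Found 0 orphaned teams..."))
# prints "[2025-01-17 12:28:50] Found 0 orphaned teams..."

IO.puts(TimestampDemo.line(utc, "Found 0 orphaned teams..."))
# prints "[2025-01-17 12:28:50Z] Found 0 orphaned teams..."
```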
@@ -122,7 +109,7 @@ defmodule Plausible.DataMigration.BackfillTeams do
     log("Pruning guest team memberships for #{length(team_ids_to_prune)} teams...")

     from(t in Teams.Team, where: t.id in ^team_ids_to_prune)
-    |> @repo.all(timeout: :infinity)
+    |> Repo.all(timeout: :infinity)
     |> Enum.each(fn team ->
       Plausible.Teams.Memberships.prune_guests(team)
There is still the issue of using Ecto schemas and app code (like this Plausible.Teams.Memberships.prune_guests), which might change in the future and make the migration misbehave. But I don't see any easy fix for this, so I'm leaving it as is.
I don't think it should change drastically enough (if ever) to break things here. We should be good.
b43976b to 63b4a13
All the changes make sense to me 👍
BTW, about the CI failure: I might be a bit out of the loop, but are we going to push this backfill and migration (and the subsequent release) in sync with the master branch, or will we continue with cherry-picked changes?
I think backfill will go into the current master, yes.
Context: #4925
I'm going to leave some comments inline.