FS corruption after hibernation #250
Comments
First issue is actually exactly why I added |
FYI, to prevent the same problem in systemd initrd, you need your service to have |
The BTRFS script provided in README.org has some problems that could result in filesystem corruption or data loss.
Filesystem corruption.
This script runs both during the normal boot and when resuming from hibernation.
Mounting BTRFS right before resuming is equivalent to mounting the same filesystem twice at the same time (without proper safeguards).
I had my filesystem corrupted and turned into an unmountable/unbootable state because of this.
The solution: instead of `postDeviceCommands`, use `postResumeCommands`. See the `stage-1-init.sh` source for the difference.
Also, see "How to fix the FS corruption after hibernation" below.
Wiping the wrong subvolumes.
The function `delete_subvolume_recursively` might delete all the subvolumes on a disk (not just those under a particular subdirectory of `/old_roots`).
This can occur if `old_roots` and its children are plain directories rather than subvolumes, which can happen after restoring disk contents from a backup.
The reason is that the `-o` option in `btrfs subvolume list` prints the subvolumes of the given parent subvolume, not the subvolumes of the given directory.
To illustrate:
In this case, `delete_subvolume_recursively /old_roots/2024-11-20_05:35:10` will happily delete `/root`, `/nix`, and `/persistent`.
Possible solutions:
1. Use the `--recursive` flag instead of the awkward, unreliable bash function. However, this flag was added only recently (v6.12, 2024-11-29) and is not available in NixOS 24.11, which ships v6.11.
2. Use `rm -rf`. Starting from kernel v4.17, `rm` can be used to delete subvolumes (see the BTRFS changelog). It might be slower than `btrfs subvolume delete`.
3. Drop the `-o` flag and filter subvolumes on our side using `awk` or bash string manipulation.

Combining the first two solutions:
`find -mtime +30` filters based on the boot time, not the "last used" time.
Therefore, if you reboot after an uptime of more than one month, you'll end up with an empty `/old_roots`.
The same happens if your system clock gets reset (e.g., the CMOS battery is dead).
Solutions:
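One possible approach, offered here as an assumption rather than as the script's actual fix, is to prune by count instead of by age: always keep the newest N old roots, so neither a long uptime nor a reset clock can empty `/old_roots`. The helper name `list_stale_roots` is hypothetical, and the sketch assumes the directory names sort chronologically.

```shell
# Print the entries of directory $1 that fall outside the newest $2, one per line.
# Assumes names sort chronologically (zero-padded timestamps).
# `list_stale_roots` is a hypothetical helper name.
list_stale_roots() {
    dir=$1
    keep=$2
    total=$(ls -1 "$dir" | wc -l)
    stale=$((total - keep))
    # Nothing to prune when there are no more than $keep entries.
    [ "$stale" -gt 0 ] || return 0
    ls -1 "$dir" | LC_ALL=C sort | head -n "$stale"
}
```

Each printed name could then be passed to whatever deletion routine is in use. With this scheme the directory never drops below N entries, whatever the timestamps say.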
(not a corruption/data loss, but a slight inconvenience)
I'm not sure why the date format is `+%Y-%m-%-d_%H:%M:%S` with `%-d` and not with `%d`.
This makes lexical sorting unusable, e.g.: `2025-01-1`, `2025-01-20`, `2025-01-3`.
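To make the sorting difference concrete, here is a small sketch comparing the two formats. The function name `snapshot_name` is hypothetical; the only change from the format quoted above is `%d` in place of `%-d`.

```shell
# Zero-padded timestamp, suitable as a snapshot directory name.
# `snapshot_name` is a hypothetical helper name.
snapshot_name() {
    date "+%Y-%m-%d_%H:%M:%S"
}

# Lexical order with unpadded days: 1, 20, 3 (not chronological).
printf '%s\n' 2025-01-1 2025-01-20 2025-01-3 | LC_ALL=C sort

# Lexical order with padded days: 01, 03, 20 (chronological).
printf '%s\n' 2025-01-01 2025-01-20 2025-01-03 | LC_ALL=C sort
```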
How to fix the FS corruption after hibernation
In my case, the situation was similar to this SO thread, with these errors in `dmesg` (citing from SO):
What did help:
1. Mount the filesystem in `ro` mode, copy the most important data to another disk, and then recreate the filesystem from scratch. If you are lucky, it might work until you stumble upon a corrupted file. If not, see the next step.
2. Use `btrfs restore` to copy the data, then recreate the FS from scratch.

What did not help: pretty much everything else.
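The two recovery steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested recovery procedure: `rescue_copy` and its three arguments (the damaged device, a mount point, and a backup directory on a different disk) are hypothetical names, and falling back automatically from the first step to the second is a choice made for this sketch.

```shell
# Hypothetical helper: $1 = damaged btrfs device, $2 = mount point,
# $3 = backup directory on a different, healthy disk.
rescue_copy() {
    device=$1
    mnt=$2
    backup=$3
    # Step 1: read-only mount, then copy what can still be read.
    if mount -o ro "$device" "$mnt"; then
        cp -a "$mnt"/. "$backup"/ && return 0
    fi
    # Step 2: the mount or the copy failed; read files from the unmounted device.
    umount "$mnt" 2>/dev/null || true
    btrfs restore "$device" "$backup"
}
```

Recreating the filesystem afterwards is left out on purpose, since that step destroys whatever is still on the device.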
After copying the data back, make sure that `/old_roots` contains only subvolumes, otherwise the script will wipe your drive (see "Wiping the wrong subvolumes" above).
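One way to perform that check, sketched under the assumption that `btrfs subvolume show` exits with an error for a path that is not a subvolume. The helper name `find_plain_dirs` is hypothetical.

```shell
# Print every entry of the given directory that is NOT a btrfs subvolume.
# `find_plain_dirs` is a hypothetical helper name.
find_plain_dirs() {
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue   # empty directory: the glob did not match
        # `btrfs subvolume show` fails when the path is not a subvolume.
        btrfs subvolume show "$entry" >/dev/null 2>&1 || echo "$entry"
    done
}
```

Running `find_plain_dirs /old_roots` before the next reboot and getting no output suggests that every entry is a subvolume; any path it prints should be dealt with before the cleanup script runs again.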