-
Problems have been discovered that could have allowed the creation of incomplete trees or commits, for example
bup save
orbup get
could have created saves with missing data. This should no longer be possible, but any existing incomplete trees might also be re-used bybup get
(for example), and so represent a continuing hazard. Note that if you've never usedbup gc
orbup get
, then we don't currently believe your repositories could have been affected.You can detect whether you've been affected by running
bup-validate-object-links(1)
. If it doesn't report any broken links (asno HASH for PARENT_HASH
), then you can stop here, the repository should be fine. You can also runbup midx -af
first, which may speed up the validation.If it does report broken links, then you should run
bup gc --ignore missing
to completion before making any further additions to the repository. But first, if you have other repositories that might still contain the missing objects, then you may want to try to retrieve them. See "repopulate missing objects" below for details.bup gc --ignore-missing
will eliminate some of the hazards and report at least one of the paths to each missing object.If
gc
doesn't report any broken paths (missing objects), then you can stop here, the repository should be fine.If
gc
does report broken paths, then you should clear the related indexes, e.g.bup index --clear
orbup on HOST index --clear
, etc.If you don't rely on
bup get
(e.g. if you onlysave
) then clearing the index(es) should ensure that new saves will be complete (though existing broken saves will remain structurally broken for now).If you do rely on
bup get
, and ifgc
reports broken paths, then there's not yet an easy way to ensureget
won't continue to re-use incomplete trees when building new saves. For now, you could start a new repository (and save the old one for future repairs).You can try to repopulate missing objects from the source (or some other) repository. To do so, you can collect a list of missing objects via
bup validate-object-links
:bup validate-object-links | tee validate-out grep -E '^no ' validate-out | cut -d' ' -f 2 | sort -u > missing-objects sed -e 's/^/--unnamed git:/' missing-objects > unnamed-objects
and then try to retrieve the missing objects from another repository via
bup get
. For example, perhaps:xargs bup get --source repo --unnamed --ignore-missing < unnamed-objects
or
xargs bup on HOST get --unnamed --ignore-missing < unnamed-objects
After that, you can run
bup validate-object-links
to see whether you were able to fix all of the broken references (i.e. whether it still reports missing objects).If you have enough missing objects, it's possible xargs might split the argument list between
--unnamed
and its argument, causingget
to fail. If so, you can just specify an even numbered value forxargs -n
, for examplexargs -n 64 bup get ...
. On most systems, you can choose a much largern
.If you would just like to validate some saves, you can now run
bup validate-ref-links SAVE...
which should be much more efficient than attempting a restore or joining the saves to /dev/null.We're also working on a command that will repair the structure of any existing broken trees so that commands like restore will still be able to work with them.
See issue/missing-objects.md for a detailed explanation of the problem. If you have pandoc and graphviz dot installed, this will be rendered to issue/missing-objects.html which you can open in a browser, or you can find it here.
-
bup validate-object-links
has been added. This command scans the objects in the repository and reports any "broken links" it finds, i.e. any links from a tree or commit in the repository to an object that doesn't exist. -
bup validate-ref-links
has been added. This command traverses repository references (e.g. saves) and logs paths to missing objects, i.e. references from a tree or commit to an object that doesn't exist in the repository. At the moment, it will report at least one path to each missing object; it does not attempt to find all of the paths. -
bup gc
now provides--ignore-missing
which allows agc
operation to continue after encountering objects that are missing from the repository. -
bup join
now reports the path to any missing object it encounters.
-
bup gc
should no longer risk leaving the repository with incomplete tree or commit objects -- trees or commits with references to objects that are no longer in the repository.Previously this could happen because the collection was probabilistic with respect to all object types, and so it could leave (completely orphaned) vestigial commits or trees that referred to objects that had been removed. It could also do this if the
--threshold
caused it to keep a parent in one "live enough" pack, while discarding a descendant in a pack that doesn't cross the threshold.These objects can cause serious trouble because they can be re-used as-is (without noticing that they are incomplete) by other commands like
bup get
. -
bup get
should no longer be able to leave the repository with incomplete trees or commits if it's interrupted at the wrong time. Previously it fetched objects "top down", and so if it was interrupted after the parent tree/commit was written to the repository, but before all the children were written, then the repository would be left with an incomplete tree. -
bup
should always ignore midx files that refer to missing indexes. Previously it might not notice when objects had disappeared (viagc
) which could, in particular, cause remote/client operations like a remote save to decide that the repository already contained data that it did not. -
bup midx
--auto
and--force
now delete midx files that refer to missing indexes. -
bup gc
should no longer throw bloom close-related exceptions when interrupted.
- Graphviz
dot
is optional, but must be available in order to render the figures referred to by issue/missing-objects.md.
Greg Troxel, Johannes Berg, and Rob Browning