Allows patterns like "svol.2019*".
Needs @exclude_vf to be a global variable, which is ok as this only
holds command-line args (and global context for archive_exclude).
Note that adding the same in "create backups" (action run) and
macro_delete() is not a good idea, as this has weird implications on
the "forced: last" snapshot/backup pair.
Note that any action (e.g. btrfs send -p) on a subvolume with unsafe
characters will still fail with:
ERROR: Unsafe command [...] (offending string: "/tmp/btrfs_pool/svol with spaces")
Features:
- get mounted filesystems from /proc/self/mountinfo
- fetch subvolumes using "btrfs subvolume list" (fast, needs root)
- filter and print subvolumes below mount point
Add run_cmd option stream_buffer_sink, which handles stream_buffer,
rate_limit as well as --progress.
For rate limiting, run "mbuffer" (on the target host) in combination
with stream_buffer and --progress, instead of running "pv" (on the
source host).
Reasons:
- mbuffer limits the read rate: For remote targets, we want a stream
buffer in front of the rsh command pipe, before decompression.
- For local targets, this can be combined with --process.
- Combined stream_buffer and rate_limit: less commands in pipe.
Further changes:
- always set mbuffer -v1 option (never show warnings)
- restrict raw_target_block_size to "kmgKMG": compatibility to
stream_compress and rate_limit options, simplicity.
- use mbuffer blocksize option where applicable
Use $resolve_droot instead of $droot for calls to get_best_correlated
(probably missed commit), same as $resolve_sroot.
Fixes possible regression of:
514e69243a btrbk: add "incremental_resolve" configuration option
For raw targets, get_best_parent() dies as VINFO_MOUNTPOINT is not
defined on raw vinfo.
Fixes regression of:
d64e237e94 btrbk: get_best_parent: consider all parent/child relations
It's not uncommon to have a large intact parent-chain on targets
(e.g. target_preserve_min=all).
If this is the case, performance drops a bit on "btrbk archive".
Note that we could limit the search depth in get_best_parent() for
some performance improvements, as this only affects extra clones.
Perl hates recursions, and dies if recursion depth = 100:
Deep recursion on subroutine "main::_push_related_children"
Unfortunately this happens before the implemented abort condition
(distance=256).
Fixed by re-implementing get_related_readonly_nodes() non-recursive.
Refs: https://github.com/digint/btrbk/issues/279
With this, previous snapshots (far relations) are still listed when
restoring a snapshot.
Example (S = source subvolume, readwrite):
After 3 snapshots:
A->S, B->S, C->S
Restore B: `btrfs subvol delete S; btrfs subvol snapshot B S'`
A->S, B->S, C->S, S'->B
Previous implementation would show now snapshots for S', as no
snapshot has parent_uuid=S'.
New implementation shows A, B, C as snapshots for S', as orphaned
siblings (A, B, C pointing to deleted S) are also related.
Makes sure that if, for whatever reason, a subvolume having correct
btrbk name scheme does NOT share any extents with previous snapshots
is never used as parent.
Note that if a related parent is found, the unrelated closest
older/newer (by btrbk timestamp) subvolumes are still added as clone
sources.
Preferences for parent (and required clone sources):
1. closest older in snapdir (by btrbk timestamp), related
2. closest older related (by cgen)
3. closest newer related (by cgen)
4. closest older in snapdir (by btrbk timestamp)
5. closest newer in snapdir (by btrbk timestamp)
Note: prefering 1 over 2 helps keeping parent-chain within droot on
target (assuming that btrfs always uses correlated parent on
btrfs-receive).
This will e.g. add a clone source on "btrbk resume", if both older AND
newer snapshot/backup pairs exists.
Also makes sure that the closest older btrbk snapshot is always added
as clone source, even if another related subvolume has newer cgen.
Old implementation was missing last readonly parent in chain, as well
as orphaned siblings.
Also sort all by cgen, not by distance, then cgen.
Also skip self.
Allowed values for "incremental_resolve":
- "mountpoint" (default): Use parents in the filesystem tree below
mount points of source `<volume-directory>/<snapshot-dir>` and
target `<target-directory>`.
- "directory": Use parents strictly below source/target
directories. Useful when restricting access, e.g. when using
ssh_filter_btrbk.sh.
- "_all_accessible" (experimental): Use parents from all mount points.
Note that using "_all_accessible" causes btrfs-progs to fail:
- btrfs send -p: "ERROR: not on mount point: /path/to/mountpoint"
- btrfs receive: "ERROR: parent subvol is not reachable from inside the root subvol"
see also: https://github.com/kdave/btrfs-progs/issues/96
Build check hash within btr_tree node instead per URL. This makes it
aware of shared btr_tree (different hostname:port pointing to same
btrfs filesystem).
Common virtual machine setups have multiple volume sections with same
host, but distinct port numbers for each machine.
- make caches dependent on MACHINE_ID instead of HOST
- append port number to URL
- add MACHINE_ID to vinfo
- use MACHINE_ID where applicable
This even works if virtual machines share the same btrfs filesystems:
If a equal UUID is found on distinct machines, btr_tree() will return
the already present tree, in order to be consistent after node
injections.
Setting the ssh port directly in the "volume" / "target" config lines
adds the possibility to have a create a unique "hostname:port"
identifier (preparatory for MACHINE_ID to distinguish virtual machines
on same host with different ports.)
When called from another script, we dont want the help message printed
on errors. E.g. when running something like:
btrbk list snapshots -q filter_which_does_not_match
When configuring "target" in a global (or "volume") context, and
overriding target_preserve_min in "subvolume" section, the scheduler
has undefined behavior (mixing up the "min" values).
Fixed by returning a copy of the preserve hash in
config_preserve_hash().
It is possible that the subvolume path is not accessible by the user
calling btrbk. When resolving mount points, "readlink" is used on the
path, which also needs to be wrapped with "sudo".
The FORCE_PRESERVE information is set on the node, and was lost for
"latest common target" as get_receive_targets() returned vinfo without
node information.
fixes regression: 6c502cb btrbk: search complete target tree for correlated subvolumes
Since we consider all accessible subvolumes in get_related_subvolumes,
checking for equal BTRBK_BASENAME and empty SUBVOL_DIR does not work
when checking for same btrbk file name scheme.
fixes regression: b37ef84e36 (btrbk: always read mountpoints; include all snapshots from mountpoint as candidates for best common parent)
Improve error handling in btrfs_send_receive: on error, always try to
read the target subvolume and only delete it automatically if it is
garbled (read/write, no received_uuid).
This is especially important if the target subvolume was already
present before send/receive.
Reverts: 4c4afe77 btrbk: skip target metadata test if send/receive has errors
If btrfs_subvolume_show($vol, rootid => 5) fails, there are no
"received_uuid" and no "gen" keys in the root node.
Fixes: 0acbf74c57 (btrbk: add btrfs_subvolume_list_complete: fetch all subvolumes with all flags)
As we allow <url> to be specified as "<hostname>:<directory>", an URL
"ssh://my.host" (without trailing slash) was parsed as hostname="ssh",
directory="/my.host".
Wrapper, returns complete list of all subvolumes (including btrfs
root, id=5) with all flags. Requires three calls to btrfs-progs.
Adaptions and cleanup in btr_tree().
Btrfs root subvolume (id=5) have no UUID and cannot be backed
up. Abort if "subvolume ." is configured on btrfs root, e.g.:
volume /path/to/btrfs_root
subvolume .
Note that the UUID for btrfs root (id=5) is not always present:
- btrfs-progs < 4.12 does not support rootid lookup
- UUID can be missing if filesystem was created with btrfs-progs < 4.16
Still we need to always read it, as the whole tree is cached and we
don't know if it will be used.
Filesystems created with btrfs-progs < 4.16 have valid UUID, while
others have not [1]. Validate output of "btrfs subvolume show", and
provide uuid for btrfs root (id=5) only if it is valid.
[1]: 0a0a03554a: btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of, FS_TREE
Many people use whitespace even in mountpoints, silently ignore
(loglevel=info) non-parseable btrfs mountpoints.
Btrbk does not support file names with whitespace or special
characters by design, and specifying such mountpoints in the
configuration file fails anyway.
Instead of passing snapshot_dir all over the place, use a separate
vinfo for the snapshot directory, accessible by vinfo_snapshot_root().
As it is initialized separately by vinfo_init_root(), it can be on a
different mountpoint.
This also allows us to use different semantics for snapshot_dir in the
future, as it does not need to be relative to the volume directory.
Dropped readin of subvolid and realpath by btrfs_subvolume_show(), we
now always read /proc/self/mounts (and call readlink).
When picking the best common parent in get_best_parent(), we want to
list as many snapshots as possible. For now, we list all from the
mountpoint of snaproot ($sroot/<snapshot_dir>), due to a bug in
btrfs-progs [1]. Also added code (commented out) to list snapshots
from all known mountpoints.
[1] https://github.com/kdave/btrfs-progs/issues/96
- move matching for correlated subvolumes from get_receive_targets
into new function _receive_target_nodes
- add lookup tables in btr_tree (RECEIVED_UUID_HASH, UUID_HASH),
allowing for faster matching in _receive_target_nodes
- add vinfo_resolved() for mapping nodes to vinfo
- rename get_latest_common to get_best_parent (while moving some
functionality to new function get_related)
- cleanup
In the scheduler, a month (or year) does not start at the first day,
but at the first `preserve_day_of_week`. Make sure that all days
before `preserve_day_of_week` in a month get delta_months+1.
Example (corner case):
- `preserve_day_of_week sunday`
- `target_preserve *m`
- no backups in 2018-02
- backup with timestamp 2018-03-01 (which is a thursday)
- backup with timestamp 2018-03-04 (which is a sunday)
Without this patch, because there are no sunday backups in 2018-02,
the first backup is considered a weekly (+4d after sunday), and as
such "first weekly of month 2018-03", and the second one is discarded.
With this patch, the first item is considered "first weekly of month
2018-02", and the second gets "first weekly of month 2018-03".
NOTE: This change may result in (previously preserved) backups to be
deleted!
Snapshots and backups having no exact time information (created with
"timestamp_format=short") are set to 00:00, which would be regarded as
"previous day" if preserve_hour_of_day is greater than 0. Fix this by
ignoring preserve_hour_of_day in this case.
Introduces the new config option "preserve_hour_of_day" to specify
after what time backups should be considered as dailies.
Based on pull request #204, with changes:
- calculation of weekly backups
- change format of preserve_matrix
Suppress "Option redefined" warning for snapshot_name config option,
which has hardcoded (computed) default already set when checking.
fix regression: 0ebe2ea2e1
Similar to ABORTED=USER_SKIP (active commandline filter), archives
having ABORTED=ARCHIVE_EXCLUDE_SKIP (active archive_exclude
configuration) do not cause exit status 10 and are hidden from
transaction log.