Build check hash within btr_tree node instead of per URL. This makes it
aware of shared btr_tree (different hostname:port pointing to same
btrfs filesystem).
Common virtual machine setups have multiple volume sections with the same
host, but distinct port numbers for each machine.
- make caches dependent on MACHINE_ID instead of HOST
- append port number to URL
- add MACHINE_ID to vinfo
- use MACHINE_ID where applicable
This even works if virtual machines share the same btrfs filesystems:
If an equal UUID is found on distinct machines, btr_tree() will return
the already present tree, in order to be consistent after node
injections.
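A minimal sketch of the caching idea above, written in Python rather than btrbk's Perl. The names `machine_id`, `TreeCache` and `get_or_create`, as well as the "LOCAL" id for local volumes, are hypothetical; the sketch only illustrates keying by "hostname:port" while sharing one tree object per UUID.

```python
from typing import Dict, Optional


def machine_id(host: Optional[str], port: Optional[int] = None) -> str:
    """Return an identifier that distinguishes machines sharing one host."""
    if host is None:
        return "LOCAL"  # assumption: all local volumes share a single id
    return f"{host}:{port}" if port is not None else host


class TreeCache:
    """Keep one tree object per UUID, regardless of the machine it was seen on."""

    def __init__(self) -> None:
        self._by_uuid: Dict[str, dict] = {}

    def get_or_create(self, fs_uuid: str, mid: str) -> dict:
        tree = self._by_uuid.get(fs_uuid)
        if tree is None:
            tree = {"uuid": fs_uuid, "machines": set()}
            self._by_uuid[fs_uuid] = tree
        # Record which machines reach this tree; the tree itself is shared.
        tree["machines"].add(mid)
        return tree
```

Returning the already present object means that a node injected through one machine is visible through the other as well.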
Setting the ssh port directly in the "volume" / "target" config lines
makes it possible to create a unique "hostname:port"
identifier (preparatory for MACHINE_ID to distinguish virtual machines
on the same host with different ports).
When called from another script, we don't want the help message printed
on errors. E.g. when running something like:
btrbk list snapshots -q filter_which_does_not_match
When configuring "target" in a global (or "volume") context, and
overriding target_preserve_min in "subvolume" section, the scheduler
has undefined behavior (mixing up the "min" values).
Fixed by returning a copy of the preserve hash in
config_preserve_hash().
It is possible that the subvolume path is not accessible by the user
calling btrbk. When resolving mount points, "readlink" is used on the
path, which also needs to be wrapped with "sudo".
The FORCE_PRESERVE information is set on the node, and was lost for
"latest common target" as get_receive_targets() returned vinfo without
node information.
fixes regression: 6c502cb btrbk: search complete target tree for correlated subvolumes
Since we consider all accessible subvolumes in get_related_subvolumes,
checking for equal BTRBK_BASENAME and empty SUBVOL_DIR does not work
when checking for same btrbk file name scheme.
fixes regression: b37ef84e36 (btrbk: always read mountpoints; include all snapshots from mountpoint as candidates for best common parent)
Improve error handling in btrfs_send_receive: on error, always try to
read the target subvolume and only delete it automatically if it is
garbled (read/write, no received_uuid).
This is especially important if the target subvolume was already
present before send/receive.
Reverts: 4c4afe77 btrbk: skip target metadata test if send/receive has errors
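The "garbled" test described above can be sketched as a small predicate (Python, not btrbk's Perl). The `detail` mapping and its keys `readonly` and `received_uuid` are hypothetical, and treating an unreadable subvolume as "not garbled" is an assumption of this sketch.

```python
from typing import Mapping, Optional


def is_garbled(detail: Optional[Mapping[str, object]]) -> bool:
    """True if a received subvolume looks incomplete.

    `detail` is a hypothetical dict of subvolume properties, or None if
    the subvolume could not be read at all.
    """
    if detail is None:
        return False  # assumption: unreadable means "do not guess, leave it alone"
    read_write = not detail.get("readonly", False)
    no_received_uuid = not detail.get("received_uuid")
    return read_write and no_received_uuid
```

A subvolume that existed before the transfer is read-only and carries a received_uuid, so it is never classified as garbled here.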
If btrfs_subvolume_show($vol, rootid => 5) fails, there are no
"received_uuid" and no "gen" keys in the root node.
Fixes: 0acbf74c57 (btrbk: add btrfs_subvolume_list_complete: fetch all subvolumes with all flags)
As we allow <url> to be specified as "<hostname>:<directory>", a URL
"ssh://my.host" (without trailing slash) was parsed as hostname="ssh",
directory="/my.host".
Wrapper, returns complete list of all subvolumes (including btrfs
root, id=5) with all flags. Requires three calls to btrfs-progs.
Adaptions and cleanup in btr_tree().
The btrfs root subvolume (id=5) has no UUID and cannot be backed
up. Abort if "subvolume ." is configured on btrfs root, e.g.:
volume /path/to/btrfs_root
subvolume .
Note that the UUID for btrfs root (id=5) is not always present:
- btrfs-progs < 4.12 does not support rootid lookup
- UUID can be missing if filesystem was created with btrfs-progs < 4.16
Still we need to always read it, as the whole tree is cached and we
don't know if it will be used.
Filesystems created with btrfs-progs >= 4.16 have a valid UUID, while
others do not [1]. Validate output of "btrfs subvolume show", and
provide uuid for btrfs root (id=5) only if it is valid.
[1]: 0a0a03554a: btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
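A validation step of this kind could look like the following Python sketch. The function name `valid_uuid_or_none` is hypothetical, and the assumption that a missing UUID shows up as a non-UUID placeholder string (such as "-") is not taken from the text above.

```python
import re
from typing import Optional

_UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def valid_uuid_or_none(value: Optional[str]) -> Optional[str]:
    """Return the UUID if it is well-formed, otherwise None."""
    if value is None:
        return None
    value = value.strip().lower()
    return value if _UUID_RE.match(value) else None
```

Callers then store the key only when the result is not None, so a root node without UUID never ends up in a UUID-based lookup.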
Many people use whitespace even in mountpoints; silently ignore
(loglevel=info) non-parseable btrfs mountpoints.
Btrbk does not support file names with whitespace or special
characters by design, and specifying such mountpoints in the
configuration file fails anyway.
Instead of passing snapshot_dir all over the place, use a separate
vinfo for the snapshot directory, accessible by vinfo_snapshot_root().
As it is initialized separately by vinfo_init_root(), it can be on a
different mountpoint.
This also allows us to use different semantics for snapshot_dir in the
future, as it does not need to be relative to the volume directory.
Dropped reading of subvolid and realpath by btrfs_subvolume_show(); we
now always read /proc/self/mounts (and call readlink).
When picking the best common parent in get_best_parent(), we want to
list as many snapshots as possible. For now, we list all from the
mountpoint of snaproot ($sroot/<snapshot_dir>), due to a bug in
btrfs-progs [1]. Also added code (commented out) to list snapshots
from all known mountpoints.
[1] https://github.com/kdave/btrfs-progs/issues/96
- move matching for correlated subvolumes from get_receive_targets
into new function _receive_target_nodes
- add lookup tables in btr_tree (RECEIVED_UUID_HASH, UUID_HASH),
allowing for faster matching in _receive_target_nodes
- add vinfo_resolved() for mapping nodes to vinfo
- rename get_latest_common to get_best_parent (while moving some
functionality to new function get_related)
- cleanup
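The lookup tables mentioned above can be pictured with this Python sketch. Only the table names UUID_HASH and RECEIVED_UUID_HASH come from the list; the node layout and the functions `build_lookup` and `receive_targets` are hypothetical, and only the simplest correlation (target received_uuid equals source uuid) is shown.

```python
from collections import defaultdict
from typing import Dict, List


def build_lookup(nodes: List[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Index nodes by uuid and by received_uuid for constant-time lookup."""
    uuid_hash: Dict[str, List[dict]] = defaultdict(list)
    received_uuid_hash: Dict[str, List[dict]] = defaultdict(list)
    for node in nodes:
        if node.get("uuid"):
            uuid_hash[node["uuid"]].append(node)
        if node.get("received_uuid"):
            received_uuid_hash[node["received_uuid"]].append(node)
    return {"UUID_HASH": uuid_hash, "RECEIVED_UUID_HASH": received_uuid_hash}


def receive_targets(source: dict, target_index: dict) -> List[dict]:
    """Nodes in the target tree that were received from `source`."""
    return list(target_index["RECEIVED_UUID_HASH"].get(source["uuid"], []))
```

Building the index once per tree replaces a scan over all nodes for every source subvolume.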
In the scheduler, a month (or year) does not start at the first day,
but at the first `preserve_day_of_week`. Make sure that all days
before `preserve_day_of_week` in a month get delta_months+1.
Example (corner case):
- `preserve_day_of_week sunday`
- `target_preserve *m`
- no backups in 2018-02
- backup with timestamp 2018-03-01 (which is a thursday)
- backup with timestamp 2018-03-04 (which is a sunday)
Without this patch, because there are no sunday backups in 2018-02,
the first backup is considered a weekly (+4d after sunday), and as
such "first weekly of month 2018-03", and the second one is discarded.
With this patch, the first item is considered "first weekly of month
2018-02", and the second gets "first weekly of month 2018-03".
NOTE: This change may result in (previously preserved) backups being
deleted!
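The month assignment from the example can be reproduced with a short Python sketch. The function `schedule_month` is hypothetical and returns the (year, month) a date counts towards, instead of btrbk's delta_months value; weekdays use Python's numbering (Monday=0, Sunday=6).

```python
import datetime as dt
from typing import Tuple


def schedule_month(day: dt.date, preserve_dow: int) -> Tuple[int, int]:
    """Return (year, month) that `day` belongs to for scheduling.

    preserve_dow uses date.weekday() numbering (Monday=0 ... Sunday=6).
    """
    first = day.replace(day=1)
    # Day of month of the first preserve_dow in this calendar month.
    first_dow_day = 1 + (preserve_dow - first.weekday()) % 7
    if day.day >= first_dow_day:
        return day.year, day.month
    # Days before the first preserve_dow still belong to the previous month.
    prev = first - dt.timedelta(days=1)
    return prev.year, prev.month
```

For `preserve_day_of_week sunday`, 2018-03-01 maps to 2018-02 and 2018-03-04 maps to 2018-03, matching the corner case above.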
Snapshots and backups having no exact time information (created with
"timestamp_format=short") are set to 00:00, which would be regarded as
"previous day" if preserve_hour_of_day is greater than 0. Fix this by
ignoring preserve_hour_of_day in this case.
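As a sketch of this rule (Python, with a hypothetical function `schedule_day` and a hypothetical `has_time` flag), the day a backup counts towards can be computed by shifting the timestamp back by preserve_hour_of_day, except when the timestamp carries no time information:

```python
import datetime as dt


def schedule_day(ts: dt.datetime, hour_of_day: int, has_time: bool) -> dt.date:
    """Return the day a backup counts towards.

    has_time is False for timestamps without time information, e.g. those
    created with timestamp_format=short (parsed as 00:00).
    """
    if not has_time:
        # Ignore hour_of_day: shifting 00:00 would land on the previous day.
        return ts.date()
    return (ts - dt.timedelta(hours=hour_of_day)).date()
```

With an hour of day of 6, a backup taken at 05:00 counts towards the previous day, while a short-format backup of the same date stays on its own day.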
Introduces the new config option "preserve_hour_of_day" to specify
after what time backups should be considered as dailies.
Based on pull request #204, with changes:
- calculation of weekly backups
- change format of preserve_matrix
Suppress "Option redefined" warning for snapshot_name config option,
which has hardcoded (computed) default already set when checking.
fix regression: 0ebe2ea2e1
Similar to ABORTED=USER_SKIP (active commandline filter), archives
having ABORTED=ARCHIVE_EXCLUDE_SKIP (active archive_exclude
configuration) do not cause exit status 10 and are hidden from
transaction log.
While $vol->{URL} can contain "//" if volume="/" (intentionally, this
is an assembled path), the filter statements are sanitized using
check_url(). This means we need to match the filter statement against
check_url($vol->{URL}). Same applies to subvol.
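The mismatch can be illustrated with a Python sketch. Here `sanitize` is a hypothetical stand-in for check_url() that handles local paths only (it collapses repeated slashes and strips a trailing slash); the real function is assumed to do more.

```python
import re


def sanitize(path: str) -> str:
    """Collapse repeated slashes and drop a trailing slash (except for "/")."""
    path = re.sub(r"/+", "/", path)
    return path if path == "/" else path.rstrip("/")


def filter_matches(filter_stmt: str, assembled_url: str) -> bool:
    """Compare both sides in sanitized form."""
    return sanitize(filter_stmt) == sanitize(assembled_url)
```

An assembled path such as "//home" (volume "/" joined with "home") only equals the filter "/home" once both sides have been sanitized.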
Btrfs does not destroy qgroups when subvolumes are deleted (see
https://bugzilla.kernel.org/show_bug.cgi?id=91751). As a workaround
for this, btrbk can be configured to always destroy the corresponding
default qgroup "0/<subvol-id>" whenever a subvolume (snapshot, backup
or archive) is deleted.
Added configuration options:
- snapshot_qgroup_destroy
- target_qgroup_destroy
- archive_qgroup_destroy
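The default qgroup id is derived from the subvolume id, so the cleanup command can be assembled as in this Python sketch. The helper `qgroup_destroy_cmd` is hypothetical; it only builds the argument list for "btrfs qgroup destroy <qgroupid> <path>" and makes no statement about when btrbk runs it.

```python
from typing import List


def qgroup_destroy_cmd(subvol_id: int, path: str) -> List[str]:
    """Build the argument list for destroying a subvolume's default qgroup."""
    if subvol_id < 0:
        raise ValueError("subvolume id must be non-negative")
    # Level 0 qgroups are the per-subvolume defaults: "0/<subvol-id>".
    return ["btrfs", "qgroup", "destroy", f"0/{subvol_id}", path]
```

Passing the command as a list avoids shell quoting issues when it is handed to a process runner.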
When doing a batch delete (multiple deletes with one call to "btrfs
subvolume delete"), we want to know which subvolumes have failed. For
this, we need to parse the error output.
On any parsing failure, we assume that nothing has been deleted, and
warn accordingly (forward compatibility).
Example:
Manually create a key:
# KEYFILE=/some/secure/place/btrbk.key
# dd if=/dev/urandom bs=1 count=32 | od -x -A n | tr -d "[:space:]" > $KEYFILE
btrbk.conf:
volume /mnt/btr_pool
incremental no
raw_target_encrypt openssl_enc
openssl_ciphername aes-256-cbc
openssl_iv_size 16 # NOTE: set to "no" if no IV is needed by the selected cipher
openssl_keyfile /some/secure/place/btrbk.key
subvolume home
target raw ssh://cloud.example.com/backup
While taint mode [1] is a nice feature of perl, e.g. it disallows
using variables (such as filenames from the config file) which were
not validated in system() commands, it also treats $PATH as insecure
(which it inherently is, as perl cannot know who messed around with it).
[1] perlsec(1): http://perldoc.perl.org/perlsec.html
[2] perlrun(1): http://perldoc.perl.org/perlrun.html
Note that btrbk still does all taint checks, and can be run in taint
mode:
- by executing `perl -T /usr/sbin/btrbk`,
- or by changing the hashbang to: `#!/usr/bin/perl -T`.
Although FATAL warnings are discouraged in perl and may break forward
compatibility [1], we still use it as btrbk is usually run as root and
we really want perl to die on programmatic errors.
[1] "perldoc warnings"
The "duration" column in the transaction log has proven to be
confusing to some users, especially on errors (e.g. "send-receive
ERROR 27" in issue #177). As it's not really necessary (duration can
be computed from the corresponding "starting" log entry), it's now
being dropped.
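Computing the duration from the log afterwards is straightforward, as this Python sketch suggests. The entry layout (timestamp, status, target) is a simplified, hypothetical model of the transaction log, and the function `durations` is not part of btrbk.

```python
import datetime as dt
from typing import Dict, List, Tuple

# Hypothetical, simplified log entry: (timestamp, status, target url).
Entry = Tuple[dt.datetime, str, str]


def durations(entries: List[Entry]) -> Dict[str, dt.timedelta]:
    """Pair each "starting" entry with the next entry for the same target."""
    started: Dict[str, dt.datetime] = {}
    result: Dict[str, dt.timedelta] = {}
    for when, status, target in entries:
        if status == "starting":
            started[target] = when
        elif target in started:
            result[target] = when - started.pop(target)
    return result
```

Entries without a preceding "starting" line are skipped, so an incomplete log yields fewer results rather than wrong ones.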
As of btrfs-progs-v4.12, the "btrfs subvolume show" command does not
print the full (absolute, resolved) path anymore [1]. Instead, it prints
the relative path to btrfs root (or "/" if it is the root).
The impact for btrbk is that we cannot fill our realpath_cache in
btrfs_subvolume_show() anymore. This is not fatal, but has the
following consequences:
- The "check for duplicate snapshot locations" may now miss
subvolumes specified by symlinks.
- If multiple "volume" sections point to the same subvolume (e.g. if
specified using symlinks) an additional "btrfs subvolume list" is
called. Note that the subvolume will still be recognized as
identical, and the btr_tree will not be rebuilt.
[1] btrfs-progs commit: b7df24aa5cddc4802b9938f56372b73869775cd9
This becomes important when using an old backup disk as source.
In terms of btrfs send/receive, all subvolumes matching "uuid /
received_uuid" are valid backups.
Merged (amend) from pull request: #116
Verified by Axel Burri <axel@tty0.ch>
We set "--no-random-seed-file" because one of the btrbk
design principles is to not create any files unasked. Without this
option, gpg creates ~/.gnupg/random_seed, and as such depends on
$HOME being set correctly (think of running from cron). From the
gpg2(1) man page:
--no-random-seed-file GnuPG uses a file to store its
internal random pool over invocations. This makes random
generation faster; however sometimes write operations are not
desired. This option can be used to achieve that with the cost
of slower random generation.
We use "dd" instead of shell redirections, as it is common to have
special filesystems (like NFS, SMB, FUSE) mounted on the raw target
path. By using "dd" we make sure to write in reasonably large blocks
(default=128K), which is not always the case when using redirections
(e.g. "gpg > outfile" writes in 8K blocks).
Another approach would be to always pipe through "cat", which uses
st_blksize from fstat(2) (with a minimum of 128K) to determine the
block size.
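The effect of writing in large blocks can be sketched with a plain copy loop (Python). The function `copy_in_blocks` is hypothetical and is not how btrbk writes its targets; it only shows the idea of bounding each write by a fixed block size such as the 128K mentioned above.

```python
from typing import BinaryIO

BLOCK_SIZE = 128 * 1024  # the 128K block size mentioned above


def copy_in_blocks(src: BinaryIO, dst: BinaryIO, block_size: int = BLOCK_SIZE) -> int:
    """Copy src to dst, passing at most block_size bytes to each write call."""
    total = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:  # empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

Whether each call reaches the filesystem as one write depends on how `dst` is opened and buffered, which this sketch leaves to the caller.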
- add sophisticated stream compression in run_cmd
- add special "compress" cmd_pipe item
- add special "redirect" cmd_pipe item:
use shell redirection instead of troublesome "dd of=".
- disable ssh_compression if stream_compression is set