- move matching for correlated subvolumes from get_receive_targets
into new function _receive_target_nodes
- add lookup tables in btr_tree (RECEIVED_UUID_HASH, UUID_HASH),
allowing for faster matching in _receive_target_nodes (see the sketch
after this list)
- add vinfo_resolved() for mapping nodes to vinfo
- rename get_latest_common to get_best_parent (while moving some
functionality to new function get_related)
- cleanup
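A minimal sketch of the lookup table idea (variable and field names
are assumptions; in particular, btrbk may need to keep multiple nodes
per received_uuid):

  my (%uuid_hash, %received_uuid_hash);
  foreach my $node (@all_nodes) {
      $uuid_hash{$node->{uuid}} = $node if $node->{uuid};
      push @{$received_uuid_hash{$node->{received_uuid}}}, $node
          if $node->{received_uuid} && ($node->{received_uuid} ne '-');
  }
  # matching receive targets is now a hash lookup instead of a tree walk:
  my @candidates = @{$received_uuid_hash{$snapshot->{uuid}} // []};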
In the scheduler, a month (or year) does not start at the first day,
but at the first `preserve_day_of_week`. Make sure that all days
before the first `preserve_day_of_week` of a month get delta_months+1.
Example (corner case):
- `preserve_day_of_week sunday`
- `target_preserve *m`
- no backups in 2018-02
- backup with timestamp 2018-03-01 (which is a thursday)
- backup with timestamp 2018-03-04 (which is a sunday)
Without this patch, because there are no sunday backups in 2018-02,
the first backup is considered a weekly (+4d after sunday), and as
such "first weekly of month 2018-03", and the second one is discarded.
With this patch, the first item is considered "first weekly of month
2018-02", and the second gets "first weekly of month 2018-03".
NOTE: This change may result in (previously preserved) backups being
deleted!
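A hedged sketch of the adjusted delta calculation, assuming
preserve_day_of_week=sunday and Time::Piece objects (helper and
variable names are made up for illustration):

  use Time::Piece;

  # day-of-month of the first sunday in the month containing $t
  sub first_sunday_dom {
      my ($t) = @_;
      my $first = Time::Piece->strptime($t->strftime("%Y-%m-01"), "%Y-%m-%d");
      return 1 + ((7 - $first->day_of_week) % 7);  # day_of_week: 0=sunday
  }

  my $delta_months = ($ref->year * 12 + $ref->mon)
                   - ($t->year   * 12 + $t->mon);
  # days before the first sunday belong to the previous month:
  $delta_months += 1 if $t->mday < first_sunday_dom($t);

In the corner case above, 2018-03-01 (mday 1, before the first sunday
on the 4th) gets delta_months+1 and thus counts towards 2018-02.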
Snapshots and backups having no exact time information (created with
"timestamp_format=short") are set to 00:00, which would be regarded as
"previous day" if preserve_hour_of_day is greater than 0. Fix this by
ignoring preserve_hour_of_day in this case.
Introduces the new config option "preserve_hour_of_day" to specify
after what time backups should be considered as dailies.
Based on pull request #204, with changes:
- calculation of weekly backups
- change format of preserve_matrix
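Hypothetical btrbk.conf usage (the value 2 is only an example):

  # backups made before 02:00 count towards the previous day
  preserve_hour_of_day  2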
Suppress "Option redefined" warning for snapshot_name config option,
which has hardcoded (computed) default already set when checking.
fix regression: 0ebe2ea2e1
Similar to ABORTED=USER_SKIP (active commandline filter), archives
having ABORTED=ARCHIVE_EXCLUDE_SKIP (active archive_exclude
configuration) do not cause exit status 10 and are hidden from
the transaction log.
While $vol->{URL} can contain "//" if volume="/" (intentionally, this
is an assembled path), the filter statements are sanitized using
check_url(). This means we need to match the filter statement against
check_url($vol->{URL}). The same applies to subvolume URLs.
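In code, the comparison boils down to something like this (a sketch;
check_url() is the sanitizer mentioned above):

  # volume="/" yields an assembled URL like "ssh://host//home", while
  # the filter statement was sanitized to "ssh://host/home"; sanitize
  # both sides before comparing:
  my $matched = (check_url($vol->{URL}) eq $filter_url);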
Btrfs does not destroy qgroups when subvolumes are deleted (see
https://bugzilla.kernel.org/show_bug.cgi?id=91751). As a workaround
for this, btrbk can be configured to always destroy the corresponding
default qgroup "0/<subvol-id>" whenever a subvolume (snapshot, backup
or archive) is deleted.
Added configuration options:
- snapshot_qgroup_destroy
- target_qgroup_destroy
- archive_qgroup_destroy
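Hypothetical usage, together with the manual equivalent (paths and
the subvolume id 257 are made up for illustration):

  btrbk.conf:
    snapshot_qgroup_destroy yes
    target_qgroup_destroy yes

  # what btrbk then effectively runs on snapshot deletion:
  # btrfs subvolume delete /mnt/btr_pool/_btrbk_snap/home.20180304
  # btrfs qgroup destroy 0/257 /mnt/btr_pool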
When doing a batch delete (multiple deletes with one call to "btrfs
subvolume delete"), we want to know which subvolumes have failed. For
this, we need to parse the error output.
On any parsing failure, we assume that nothing has been deleted, and
warn accordingly (forward compatibility).
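A minimal sketch of this logic (the exact btrfs-progs error format is
an assumption here, which is precisely why the paranoid fallback
exists):

  my (@failed, $parse_error);
  foreach my $line (split(/\n/, $stderr)) {
      next unless ($line =~ /\S/);
      if ($line =~ /^ERROR:.*: (.+)$/) {
          push @failed, $1;     # failed subvolume from the error message
      } else {
          $parse_error = 1;     # unknown output: assume nothing was
          last;                 # deleted, and warn accordingly
      }
  }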
Example:
Manually create a key:

  # KEYFILE=/some/secure/place/btrbk.key
  # dd if=/dev/urandom bs=1 count=32 | od -x -A n | tr -d "[:space:]" > $KEYFILE

btrbk.conf:

  volume /mnt/btr_pool
    incremental no
    raw_target_encrypt openssl_enc
    openssl_ciphername aes-256-cbc
    openssl_iv_size 16  # NOTE: set to "no" if no IV is needed by the selected cipher
    openssl_keyfile /some/secure/place/btrbk.key
    subvolume home
      target raw ssh://cloud.example.com/backup
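For reference, a manual restore could look roughly like this (a
sketch: the raw backup file name and the IV source are assumptions):

  # openssl enc -d -aes-256-cbc -K $(cat $KEYFILE) -iv <iv> \
  #     -in <raw_backup_file> | btrfs receive /mnt/restore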
While taint mode [1] is a nice feature of perl (e.g. it disallows
using unvalidated variables, such as filenames from the config file,
in system() commands), it also treats $PATH as insecure (which it
inherently is, as perl cannot know who messed around with it).
[1] perlsec(1): http://perldoc.perl.org/perlsec.html
[2] perlrun(1): http://perldoc.perl.org/perlrun.html
Note that btrbk still does all taint checks, and can be run in taint
mode:
- by executing `perl -T /usr/sbin/btrbk`,
- or by changing the hashbang to: `#!/usr/bin/perl -T`.
Although FATAL warnings are discouraged in perl and may break forward
compatibility [1], we still use them, as btrbk is usually run as root
and we really want perl to die on programmatic errors.
[1] "perldoc warnings"
The "duration" column in the transaction log has proven to be
confusing to some users, especially on errors (e.g. "send-receive
ERROR 27" in issue #177). As it's not really necessary (duration can
be computed from the corresponding "starting" log entry), it's now
being dropped.
As of btrfs-progs-v4.12, the "btrfs subvolume show" command no longer
prints the full (absolute, resolved) path [1]. Instead, it prints the
path relative to the btrfs root (or "/" if it is the root).
The impact for btrbk is that we cannot fill our realpath_cache in
btrfs_subvolume_show() anymore. This is not fatal, but has the
following consequences:
- The "check for duplicate snapshot locations" may now miss
subvolumes specified by symlinks.
- If multiple "volume" sections point to the same subvolume (e.g. if
specified using symlinks) an additional "btrfs subvolume list" is
called. Note that the subvolume will still be recognized as
identical, and the btr_tree will not be rebuilt.
[1] btrfs-progs commit: b7df24aa5cddc4802b9938f56372b73869775cd9
This becomes important when using an old backup disk as a source.
In terms of btrfs send/receive, all subvolumes matching "uuid /
received_uuid" are valid backups.
Merged (amend) from pull request: #116
Verified by Axel Burri <axel@tty0.ch>
We set "--no-random-seed-file" because one of the btrbk
design principles is to not create any files unasked. Enabling
"--no-random-seed-file" creates ~/.gnupg/random_seed, and as
such depends on $HOME to be set correctly (think on running in
cron). From gpg2(1) man page:
  --no-random-seed-file
      GnuPG uses a file to store its internal random pool over
      invocations. This makes random generation faster; however
      sometimes write operations are not desired. This option can be
      used to achieve that with the cost of slower random generation.
We use "dd" instead of shell redirections, as it is common to have
special filesystems (like NFS, SMB, FUSE) mounted on the raw target
path. By using "dd" we make sure to write in reasonably large blocks
(default=128K), which is not always the case when using redirections
(e.g. "gpg > outfile" writes in 8K blocks).
Another approach would be to always pipe through "cat", which uses
st_blksize from fstat(2) (with a minimum of 128K) to determine the
block size.
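A sketch of the resulting pipeline (paths and block size are
illustrative):

  # gpg --encrypt ... | dd of=/mnt/nfs/backup/home.raw bs=128K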
- add sophisticated stream compression in run_cmd
- add special "compress" cmd_pipe item
- add special "redirect" cmd_pipe item:
use shell redirection instead of troublesome "dd of=".
- disable ssh_compression if stream_compression is set
We already perform compression before gpg, such that compressing in
gpg is just a waste of time. Interestingly, it seems gpg does not try
to recompress gzipped input streams, as for the default gzip
compression this patch does not change performance. However, it is
necessary for the upcoming lz4 compression to show its real benefit.
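Hypothetically, in btrbk.conf terms (the option name is an
assumption), with gpg compression disabled on the command line:

  stream_compress lz4

  # gpg is then invoked with e.g.:
  # gpg --batch --compress-algo none --encrypt ...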
Add configuration option transaction_syslog, which can be set to a short
name of a syslog facility, like user or local5. Most of the ones besides
localX do not really make sense, but whatever, let the user decide.
The only output relevant for syslog is the logging generated inside
sub action, so it is easy to hijack all messages there and also send
them to syslog if needed.
All output is done via print_formatted, which expects a file handle.
So we abuse a file handle opened on a string, changing as little code
as possible for this feature.
Since syslog already adds the timestamps for us, I added a syslog
formatting pattern, which is very similar to tlog, omitting the
timestamp.
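A minimal sketch of the file-handle-to-string trick combined with
Sys::Syslog (the print_formatted signature and other names are
assumptions):

  use Sys::Syslog qw(openlog syslog closelog);

  my $buf = "";
  open(my $fh, '>', \$buf);           # in-memory file handle
  print_formatted($fh, @messages);    # unchanged code path
  close($fh);

  openlog("btrbk", "pid", "local5");  # facility from transaction_syslog
  syslog('info', '%s', $_) foreach (split(/\n/, $buf));
  closelog();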