This effectively reverts 0e63843195 and
173319e7e1.
asciidoc has been revived (for a while now) and doesn't require Python 2. We
still prefer asciidoctor and fall back to asciidoc/a2x if it's not available.
Comparing the asciidoc and asciidoctor man pages, everything looks OK.
Python tends to be more readily available in distribution build environments
than the Ruby stack. Also, the pregenerated man pages are gone as of
f132c94c65.
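The preference order described above could be probed from a shell roughly as follows. This is only a sketch: `pick_tool` is a hypothetical helper for illustration and is not taken from the btrbk build files.

```shell
# Print the first of the given tools that is found in PATH.
pick_tool() {
    local tool
    for tool in "$@"; do
        if command -v "${tool}" >/dev/null 2>&1; then
            printf '%s\n' "${tool}"
            return 0
        fi
    done
    return 1
}

# Prefer asciidoctor, fall back to a2x (asciidoc).
pick_tool asciidoctor a2x || printf 'no man page generator found\n' >&2
```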
Signed-off-by: Sam James <sam@gentoo.org>
If deletion is skipped, we don't have a schedule call on the target,
which is used for --print-schedule text. Add some (rather hacky) code
to be able to also use the schedule result of the backup process.
Note that "no target action" for archive is replaced by "<no_action>",
for consistency with action run:
[-] /path/to/target/snapshot_basename.*
is now displayed as:
<no_action>
A more sophisticated implementation would be to check this after
scheduling, only if the target really needs to be backed up.
We could as well automatically trigger a `btrfs snapshot -r` on target
in these cases, but this seems counter-intuitive.
Sanitize file (or subvolume path) arguments in safe_cmd, effectively
removing leading double slash.
Files originating from "volume /" can be assembled as "//some/subvol",
which is useful internally but undesired as command arguments, as
ancient systems might interpret leading double slash "//" in a special
way.
Posix states:
> A pathname that begins with two successive slashes may be
> interpreted in an implementation-defined manner, although more than
> two leading slashes shall be treated as a single slash.
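The effect of the sanitizing can be illustrated in shell. The actual `safe_cmd` lives in btrbk's Perl code; `sanitize_path` below is a hypothetical stand-in that collapses any run of leading slashes into one.

```shell
# Collapse any run of leading slashes into a single slash.
sanitize_path() {
    local path="${1}"
    while [ "${path#//}" != "${path}" ]; do
        path="/${path#//}"
    done
    printf '%s\n' "${path}"
}

sanitize_path '//some/subvol'    # prints /some/subvol
```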
• In principle the special `IFS`-variable could be set to some unexpected non-
standard value.
Unsetting it causes its default to be used.
• Locales and in particular their character sets are quite complex in POSIX and
may have many subtle implications.
For example, the pattern matching notation (used in `case`-compound-commands
or some forms of parameter expansion) is in principle only defined for
character strings. While some shells handle it gracefully, the behaviour is
undefined if, for example, the character set is UTF-8 and a variable contains
bytes that do not form valid characters in that.
Actually, there are quite some more implications.
Also, pathnames, in POSIX, are strings of bytes excluding 0x0.
For these reasons, the locale is set to the `C`/`POSIX`-locale.
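A minimal sketch of both measures. Setting the locale via `LC_ALL` is an assumption made here for illustration; the script may use a different set of locale variables.

```shell
# Unset IFS so that the default field splitting (space, tab, newline)
# applies, whatever value was inherited from the environment.
unset IFS

# Select the C/POSIX locale for all categories.
LC_ALL=C
export LC_ALL
```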
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
• Set shell options in one command.
• Homogeneously use local variables for function positional parameters in all
places.
• In redirections, omit `1` for standard output.
• Homogeneously use `if`-compound-commands instead of `[ … ] && …` in all
places.
• Homogeneously use curly brackets with parameter expansion.
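The conventions listed above, shown together in one small sketch. `describe_path` and the chosen shell options are hypothetical and only serve as illustration.

```shell
# All shell options set in one command.
set -eu

describe_path() {
    # Positional parameter copied into a local variable.
    local path="${1}"

    # `if` instead of `[ -n "${path}" ] && ...`; curly brackets around
    # every parameter expansion.
    if [ -n "${path}" ]; then
        printf 'path: %s\n' "${path}"
    else
        # `>&2`, not `1>&2`.
        printf 'no path given\n' >&2
    fi
}

describe_path '/mnt/btr_pool'
```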
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
OpenSSH’s environment variable `SSH_CLIENT` has been deprecated in upstream
commit f37e246f858cdd79be4f4e158b7b04778d1cb7e9 (2002-09-19) and replaced by
`SSH_CONNECTION`.
Both contain more than just the remote information, thus adapted the log message
to reflect that.
Since this might be used by 3rd-party programs (like fail2ban), added a specific
note to the changelog.
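For third-party programs that only need the remote address, a sketch of extracting it from `SSH_CONNECTION` (four space-separated fields: client address, client port, server address, server port). `ssh_remote_addr` is a hypothetical helper, and the fallback value uses documentation addresses.

```shell
# Print the first field (client address) of an SSH_CONNECTION value.
ssh_remote_addr() {
    local connection="${1}"
    printf '%s\n' "${connection%% *}"
}

ssh_remote_addr "${SSH_CONNECTION:-192.0.2.10 51234 198.51.100.5 22}"
```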
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
In spirit, POSIX considers `echo` rather obsolete (it was just kept because of
its widespread use).
It’s also not possible to use `echo` portably unless its `-n`-option (as the
first argument) and escape sequences are omitted.
While neither was the case here, it’s better style to just always use `printf`
in order to avoid any future confusion when both are used.
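A short sketch of the portable form. `print_line` is a hypothetical helper; with `printf '%s\n'` the argument is never interpreted as an option or as an escape sequence.

```shell
# Print the argument verbatim, followed by a newline.
print_line() {
    printf '%s\n' "${1}"
}

print_line '-n'               # prints -n
print_line 'C:\new\table'     # prints C:\new\table
```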
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
This commit finishes the work from the previous one and converts
ssh_filter_btrbk.sh to (mostly) pure POSIX Shell Command Language.
Instead of bash’s `=~`-operator for its `[[ … ]]`-compound-command it uses
`grep`.
At the time of writing, bash has at least the `nocasematch`-shell-option which
would have a negative security impact for this program. While it’s not enabled
by default, individual users could potentially change that, not realising the
consequences.
Thus, moving away from this may also provide some hardening.
Unlike bash’s `=~`-operator, which matches against the whole string at once,
`grep` matches the pattern against each line of input.
This would allow for attacks by including a newline in the SSH command like in:
SSH_ORIGINAL_COMMAND="readlink /dev/stdout
cat /etc/shadow"
but is prevented by the general exclusion of newlines in commit TODO.
`grep` may return an exit status of `0` when used with its `-q`-option, even
when an error occurred.
Since this program is intended specifically for security purposes this shall be
avoided, even if such case is unlikely, and therefore its standard output and
standard error are redirected to `/dev/null` instead.
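The per-line matching and the redirection can be seen in a small sketch. `matches_line` and the pattern are hypothetical and not the ones used in ssh_filter_btrbk.sh; the sketch deliberately omits the newline check to show why it is needed.

```shell
# Return 0 if at least one line of the string matches the ERE.
# Output is discarded via redirection rather than by using `-q`.
matches_line() {
    local string="${1}"
    local pattern="${2}"
    printf '%s\n' "${string}" | grep -E -e "${pattern}" >/dev/null 2>/dev/null
}

pattern='^readlink [a-z/]+$'
newline='
'

if matches_line 'readlink /dev/stdout' "${pattern}"; then
    printf 'single line: accepted\n'
fi
# The first line alone satisfies the pattern, so this is accepted, too.
if matches_line "readlink /dev/stdout${newline}cat /etc/shadow" "${pattern}"; then
    printf 'two lines: accepted as well\n'
fi
```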
Further, using just:
local formatted_restrict_path_list="$(printf '%s' "$restrict_path_list" | sed 's/|/", "/g')"
rather than:
local formatted_restrict_path_list=""; formatted_restrict_path_list="$(printf '%s' "$restrict_path_list" | sed 's/|/", "/g')"
prevents `set -e` from taking effect if the pipeline within the command substitution
fails, as the returned exit status of the whole command is the result of
`local`, not that of the assignment.
This is however no security problem here, as `formatted_restrict_path_list` is
only used for informative purposes.
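The masking effect can be reproduced in isolation. The two functions below are hypothetical and record the exit status explicitly, so that the sketch behaves the same with or without `set -e`.

```shell
# Declaration and assignment combined: the status is that of `local`.
masked() {
    local status=0
    local value="$(false)" || status="${?}"
    printf '%s\n' "${status}"
}

# Declaration first, assignment second: the status is that of the
# command substitution.
unmasked() {
    local status=0
    local value=''
    value="$(false)" || status="${?}"
    printf '%s\n' "${status}"
}

masked      # prints 0
unmasked    # prints 1
```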
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
ssh_filter_btrbk.sh is mainly intended for security purposes and should
therefore itself be written with that in mind.
It is written for bash, which greatly extends the POSIX Shell Command Language
and is incompatible with it in some niche cases.
For several reasons, it seems a good idea to convert the program to (mostly)
pure POSIX Shell Command Language:
• People may try to use the program with other shells (for example when bash is
not available) and make errors.
More pure `sh` implementations like dash …
• … have far less code and also less dependencies, which possibly also reduces
the chance for bugs or exploits,
• … are less dynamic in development (and have thus possibly a lower chance of
incompatible changes) …
• … and may run faster.
This commit replaces any unnecessary “bashisms” with purely POSIX compatible
code, with the exception of the `local`-built-in, which is however supported by
most POSIX compatible shells (including dash, klibc-utils’s `sh` and BusyBox’
`sh`) in some way.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Double quote any variable expansions that might ever contain field separators.
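A sketch of what goes wrong without the quotes, assuming the default `IFS`. `count_args` is a hypothetical helper.

```shell
# Print the number of arguments received.
count_args() {
    printf '%s\n' "${#}"
}

path='/mnt/my backups'
count_args ${path}      # unquoted: split into two fields, prints 2
count_args "${path}"    # quoted: one field, prints 1
```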
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
In strings that contain neither `'` nor any expansions, use single quotes to
avoid any future unintended expansions or escapes.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
This disallows newline (that is: LF characters) in the SSH command, which could
have been exploited for arbitrary code execution, since commit
77a39282de.
Example:
# export SSH_ORIGINAL_COMMAND=$'readlink /dev/stdout\ncat /etc/shadow'
# ssh_filter_btrbk.sh
Since `readlink` is a generally allowed command, this works with any of
ssh_filter_btrbk.sh’s options.
But most likely, other commands that are “added” via `allow_cmd()` can be used,
too.
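A check of this kind can be written in plain POSIX shell. The following is a sketch only; `contains_newline` is a hypothetical name and not necessarily how ssh_filter_btrbk.sh implements it.

```shell
# Return 0 if the string contains at least one LF character.
contains_newline() {
    local string="${1}"
    local newline='
'
    case "${string}" in
        *"${newline}"*) return 0 ;;
        *) return 1 ;;
    esac
}

if contains_newline "${SSH_ORIGINAL_COMMAND:-}"; then
    printf 'rejected: newline in SSH command\n' >&2
fi
```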
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
While most functionality works fine, raw backups fail to write correct
"FILE=" information in info sidecar.
Disallowing newlines in files is a good idea in general.
This adds support for bzip3 [1].
[1] https://github.com/kspalaiologos/bzip3
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cosmetics: swap order pbzip2 / bzip3
Signed-off-by: Axel Burri <axel@tty0.ch>
`mydomain.com` is actually a real domain and shouldn’t be used in examples.
RFC 2606 (respectively RFC 6761) reserves `example.org` (and others) for that
purpose.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Since `btrbk` executes only commands, it shouldn’t need any of what’s currently
disabled with the `restrict` flag in the `authorized_keys` file, that is:
Port-, agent- and X11-forwarding as well as PTY allocation and execution of
`~/.ssh/rc`.
Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
A redirection (e.g. `echo foo > bar.info`) can cause empty (zero-size)
files in some circumstances.
We still write INCOMPLETE=1 to the info file before send/receive, but
instead of re-creating it without the INCOMPLETE flag, we append
INCOMPLETE=0 (keeping up compatibility with old versions of btrbk).
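The append-only scheme can be sketched in shell. btrbk itself is written in Perl; `is_incomplete` is a hypothetical reader, and "the last `INCOMPLETE=` line wins" is an assumption made for this illustration.

```shell
# Return 0 if the info file is still marked incomplete.
# Assumption: the last INCOMPLETE= line in the file wins.
is_incomplete() {
    local info_file="${1}"
    local last=''
    last="$(sed -n 's/^INCOMPLETE=//p' "${info_file}" | tail -n 1)"
    [ "${last}" = '1' ]
}

info_file="$(mktemp)"
printf 'INCOMPLETE=1\n' >>"${info_file}"    # before send/receive
if is_incomplete "${info_file}"; then printf 'incomplete\n'; fi
printf 'INCOMPLETE=0\n' >>"${info_file}"    # appended on success
if ! is_incomplete "${info_file}"; then printf 'complete\n'; fi
rm -f "${info_file}"
```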
Ref: 4e5ae975d8 btrbk: ignore zero-size info files
When backing up from devices that are configured to use raw backups and
that might disconnect from the network (i.e. laptops), you end up once in
a while with a zero-size info file (and backup file).
btrbk doesn't know how to handle a zero-size file and stops backing up until
the zero-size file is removed.
With this change, zero-size info files are ignored, and hence the affected
backup is redone.
Signed-off-by: Matthieu Patou <mat@matws.net>
Warning for btrfs_commit_delete is always printed, regardless of the
(possibly valid) values.
Regression in btrbk-0.32.3:
687e0508b7 btrbk: tidy deprecation warnings
It is perfectly ok to run btrbk without ssh_identity (using ssh
defaults), printing a warning if the option is not set is wrong.
Instead, hackily check for ssh_identity on ssh errors, and give a hint
in the error message.
Deleting multiple subvolumes at once always caused the problem that we
need to parse stderr of "rm" and "btrfs subvolume delete" in order to
know which subvolume actually failed, which is problematic (version
dependent, language dependent). Also, we would need to restrict the
number of subvolumes based on the maximum allowed length for shell
commands, which is system-dependent (check `getconf ARG_MAX`).
Deleting subvolumes sequentially has slightly negative impact on
execution time (multiple rsh commands), with the benefit of being more
robust and reducing the code size.
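The sequential approach, sketched in shell with `rm` on regular files as a stand-in for `btrfs subvolume delete`. `delete_each` is hypothetical; the point is that each failure is attributed by exit status, without parsing stderr.

```shell
# Delete the given paths one at a time; print those that failed.
delete_each() {
    local path
    for path in "$@"; do
        if ! rm -- "${path}" 2>/dev/null; then
            printf '%s\n' "${path}"
        fi
    done
}

workdir="$(mktemp -d)"
: >"${workdir}/snapshot.1"
delete_each "${workdir}/snapshot.1" "${workdir}/snapshot.2"
rmdir "${workdir}"
```

Here only the path of the non-existing `snapshot.2` is printed.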
Currently, option arguments are only completed after =. For example:
$ btrbk --loglevel=<TAB>
debug error info trace warn
$ btrbk --loglevel <TAB>
archive diff extents ls prune run
stats clean dryrun list origin resume
snapshot usage
This commit makes it so that both option styles are recognized:
$ btrbk --loglevel=<TAB>
debug error info trace warn
$ btrbk --loglevel <TAB>
debug error info trace warn
This was the intention all along, but it was implemented incorrectly.
On btrbk archive, after creating a directory without dry-run, the
archive target is skipped with "Failed to fetch subvolume detail" due
to caching of realpath.
Regression in btrbk-0.32.0:
eb69bc883e btrbk: refactor mountinfo
Now that timestamp_format defaults to "long", it seems no longer
needed to even mention it here. Setting to "short" or "long-iso" is
only required in rather special use cases.
Set node as tree root if mount_id == parent_id. For some reason this
is never the case on my (mostly gentoo) systems.
Regression from: eb69bc88 btrbk: refactor mountinfo
This is a relic of early days of btrbk, and I have already hesitated
for too long to change the default from legacy "short" to sane "long"
format.
Tests show that the scheduling behaves in a sane/expected way if this
change is applied unattended. I suppose everybody who has
preserve_hour_of_day set is already using timestamp_format=long.
For the paranoid. For convenience, filename checking was removed in
[1], and quoting was (hopefully) implemented correctly in [2].
Allowing special characters as well as UTF-8 leaves behind a bad
feeling, as there are many special cases that need to be taken care
of (e.g. newlines in file names, right-to-left encoding, etc.). In
order to mitigate attacks exploiting these error classes, leave an
option for power users who only allow "sane" characters in their
filename hierarchy.
[1] 6a29b08f00 btrbk: remove filename restrictions
[2] acc7f9fc83 btrbk: quote unsafe characters in shell commands
New defaults give the btrbk_direct_leaf snapshots higher preference
than the global ones resolved by parent-uuid (which are best-guess).
This way the parent has a higher chance of being a backup created by
btrbk, which results in "btrfs receive" starting work on a snapshot of
this (and preferably not on the "best-guess" ones).
- Create tree from /proc/self/mountinfo, and use it to find mount
points.
- Populate realpath cache from mount points, possibly reducing calls
to `realpath`.
- Replace btrfs_mountpoint with vinfo_mountpoint(fs_type => 'btrfs')
- Tidy action "ls".
- Move code
Use Text::CharWidth::mbswidth() if installed, fallback to
length(Encode::decode_utf8()), fallback to length().
- Text::CharWidth handles wide chars (e.g. Asian, taking up two
columns on the terminal) correctly.
- length(Encode::decode_utf8()) handles single-width chars only, and
should be installed on most systems (perl >= v5.7.3).
- length() counts bytes, as we do not convert anything to UTF-8 in
btrbk (NOT using `perl -CIOEioA` or binmode(STDOUT, ":utf8"))
As filenames can contain meta characters like '$', we can't just put
ssh commands in double quotes. E.g. the following would trigger
variable expansion on local shell:
ssh example.com "ls -l 'evil$x'"
Instead, we quote the ssh using single quotes (adding the need to
escape single quotes), while also quoting unsafe filenames using
single quotes. The above becomes:
ssh example.com 'ls -l '\''evil$x'\'''
Or more complex, for a file named "file with'single quotes'":
ssh example.com 'ls -l '\''file with'\''\'\'''\''single quotes'\''\'\'''\'''\'''
On the remote shell, this will expand to:
ls -l 'file with'\''single quotes'\'''
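The two quoting layers can be reproduced with a small shell helper. btrbk does this in Perl; `shell_quote` is a hypothetical stand-in that wraps a string in single quotes and replaces every embedded `'` by `'\''`.

```shell
# Wrap a string in single quotes, escaping embedded single quotes.
shell_quote() {
    printf '%s\n' "${1}" |
        sed -e "s/'/'\\\\''/g" -e "1s/^/'/" -e "\$s/\$/'/"
}

# Inner layer: quote the file name for the remote shell.
remote_cmd="ls -l $(shell_quote 'evil$x')"

# Outer layer: quote the whole remote command for the local shell.
printf 'ssh example.com %s\n' "$(shell_quote "${remote_cmd}")"
# prints: ssh example.com 'ls -l '\''evil$x'\'''
```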
We don't want to promote weird naming conventions here.
The idea behind the change in the previous commit [1] was to get rid
of confused people asking me: "hey, I don't have a subvolume for my
rootfs, all I see is @ and @home".
Left an example for "subvolume @" which should be clear to Ubuntu
people out there.
[1] de6c7ab586 btrbk.conf.example: use @subvol notation
Making sure this is done after splitting, as encoded value could be a
comma.
After some testing it shows that the kernel [1] produces ambiguous
output in "super options" if a subvolume containing a comma is mounted
using "-o subvolid=" (tried hard to mount with "-o subvol=", seems not
possible via shell):
# btrfs sub create /tmp/btrbk_unittest/mnt_source/svol\,comma
# mount /dev/loop0 -o subvolid=282 '/tmp/btrbk_unittest/mount,comma'
# cat /proc/self/mountinfo
[...]
48 40 0:319 /svol,comma /tmp/btrbk_unittest/mount,comma rw,relatime - btrfs /dev/loop0 rw,ssd,noacl,space_cache,subvolid=282,subvol=/svol,comma
^^^^^^^^^^^^^^^^^^
[1] sys-kernel/gentoo-sources-5.10.45
Security vulnerability fixed in alternation regex. Specially crafted
commands may be executed without being properly checked.
Affects all versions >= btrbk-v0.23.0
Regression from:
ccb5ed5e71 ssh_filter_btrbk: allow "realpath" and "cat /proc/self/mounts" on targets
Reported by: @protree (responsible disclosure)
Until now the main README.md started with a pretty complex example, making
the learning curve unnecessarily steep for new users. Start instead with the
simpler example with the local snapshots of 'home'. It was even simplified
a bit more to serve as good introduction, and step-by-step instructions were
added.
Add configurable prefix for each line of command output. Seems wrong,
but outsmarts the mail clients.
The problem is that some (most?) mail clients outsmart the specs and
replace text/plain mails by quotations, emoticons, emphasis, ...
The only "correct" solution is to disable these features in the mail
client.
Acceptable workaround for #376.
Terminology for "backup" is specified in btrbk(1), use it:
Backup is a btrbk terminology for a "read-only subvolume created
with send/receive" (showing a received-uuid).
- If rsync is enabled, show number of created/deleted/transferred
files in mail subject.
- Add options to show summary and/or detail message in mail body.
- Add option to skip btrbk if no files were transferred via rsync.
- Add option to call sync(1) prior to running btrbk.
- Add option to skip btrbk execution if no files were transferred.
Add compat, compat_local, compat_remote configuration options.
Used for busybox: instead of running `readlink -e` (which is not
available on busybox), run `readlink -f` followed by `test -d`.
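The replacement can be sketched as a shell function. `resolve_dir` is a hypothetical name, and the sketch assumes that only directories need to be resolved.

```shell
# Resolve a path with `readlink -f`, then require that the result is
# an existing directory (`readlink -f` alone also succeeds if the last
# path component is missing).
resolve_dir() {
    local resolved=''
    resolved="$(readlink -f "${1}")" || return 1
    test -d "${resolved}" || return 1
    printf '%s\n' "${resolved}"
}

resolve_dir /
```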
For consistency with PATH and URL, $vol->{PRINT} was not sanitized
(trailing "//" if parent is "/"). This is now dropped, as this is more
confusing than useful.
btrbk requires "btrfs subvolume list|show" queries from the mount
point in order to build btrfs trees. This conflicts with tightly set
--restrict-path.
As we print @stderr in warnings if vinfo_init_root() function fails,
we need to make sure that @stderr contains sane values.
Clearing @stderr is required when calling caching functions!
Explicitly printing ssh errors (or even warnings) is
inconsistent. stderr is printed outside the run_cmd framework if
needed.
Reverts: a80e05984a btrbk: catch ssh errors
See ssh(1):
ssh exits with the exit status of the remote command or with 255
if an error occurred.
Note that this includes network errors as well as DNS failures.
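A sketch of how a caller might interpret the exit status. `classify_ssh_status` is hypothetical; note that a remote command which itself exits with 255 cannot be told apart from an ssh error this way.

```shell
# Map the exit status of an ssh invocation to a coarse category.
classify_ssh_status() {
    case "${1}" in
        0)   printf 'ok\n' ;;
        255) printf 'ssh error\n' ;;
        *)   printf 'remote command failed\n' ;;
    esac
}

classify_ssh_status 255    # prints: ssh error
```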
Don't check url_regex if $vinfo->{URL} is not a valid path (e.g. when
checking against a "volume" or "subvolume" having wildcards):
"Use of uninitialized value in join or string" at line 3030
Using `WantedBy=multi-user.target` makes boot wait for btrbk.service
before it's considered "finished". This can be checked by running
`systemd-analyze` or checking the system log using `journalctl`.
Timers should use the "timers.target" target, see systemd.special(7).
Print table output column headings single-line uppercase instead of
lowercase and underlined.
This is a common ASCII table format, is easily parseable and offers better
readability e.g. in pager.
The backup summary does not print "<no_action>" if a subvolume is
skipped by --exclude or noauto. Alas, skipped volumes result in
no_action still being printed (see #291). Adding an additional
IS_ABORTED($sroot, "skip_") check fixes this issue.
Allows patterns like "svol.2019*".
Needs @exclude_vf to be a global variable, which is ok as this only
holds command-line args (and global context for archive_exclude).
Note that adding the same in "create backups" (action run) and
macro_delete() is not a good idea, as this has weird implications on
the "forced: last" snapshot/backup pair.
Note that any action (e.g. btrfs send -p) on a subvolume with unsafe
characters will still fail with:
ERROR: Unsafe command [...] (offending string: "/tmp/btrfs_pool/svol with spaces")
Features:
- get mounted filesystems from /proc/self/mountinfo
- fetch subvolumes using "btrfs subvolume list" (fast, needs root)
- filter and print subvolumes below mount point
Add run_cmd option stream_buffer_sink, which handles stream_buffer,
rate_limit as well as --progress.
For rate limiting, run "mbuffer" (on the target host) in combination
with stream_buffer and --progress, instead of running "pv" (on the
source host).
Reasons:
- mbuffer limits the read rate: For remote targets, we want a stream
buffer in front of the rsh command pipe, before decompression.
- For local targets, this can be combined with --progress.
- Combined stream_buffer and rate_limit: fewer commands in pipe.
Further changes:
- always set mbuffer -v1 option (never show warnings)
- restrict raw_target_block_size to "kmgKMG": compatibility to
stream_compress and rate_limit options, simplicity.
- use mbuffer blocksize option where applicable
Use $resolve_droot instead of $droot for calls to get_best_correlated
(probably missed commit), same as $resolve_sroot.
Fixes possible regression of:
514e69243a btrbk: add "incremental_resolve" configuration option
For raw targets, get_best_parent() dies as VINFO_MOUNTPOINT is not
defined on raw vinfo.
Fixes regression of:
d64e237e94 btrbk: get_best_parent: consider all parent/child relations
Package maintainers like to build everything from scratch, removing
overhead. Again, we apologize for the inconvenience.
Note that reproducible builds are still guaranteed by setting
SOURCE_DATE_EPOCH in doc/Makefile.
Reverts: a6dbd60e5a documentation: add pre-generated man pages: from groff to asciidoc and back again
It's not uncommon to have a large intact parent-chain on targets
(e.g. target_preserve_min=all).
If this is the case, performance drops a bit on "btrbk archive".
Note that we could limit the search depth in get_best_parent() for
some performance improvements, as this only affects extra clones.
Perl hates recursions, and dies if recursion depth = 100:
Deep recursion on subroutine "main::_push_related_children"
Unfortunately this happens before the implemented abort condition
(distance=256).
Fixed by re-implementing get_related_readonly_nodes() non-recursive.
Refs: https://github.com/digint/btrbk/issues/279
This reduces build-time dependencies to zero, helping package
maintainers and providing reproducible builds.
NOTE: generated man pages will only be updated on releases. In order
to make sure the docs are correctly rebuilt, run "make clean man". We
apologize for the inconvenience.
From Groff to Asciidoc and Back Again [1]
=========================================
| Comparison | Links |
| -------------------------------------------------------------------------| -------------------------------------------------------------------------------------------------------------- |
| **Plain ROFF** | |
| +++ Best result for `man` (our main goal!) | |
| - Not supported by github | [btrbk(1) v0.25.1 in plain groff](https://github.com/digint/btrbk/blob/v0.25.1/doc/btrbk.1) |
| - No decent converters: e.g. `groff -Txhtml -mandoc` | [btrbk(1) v0.25.1 at digint.ch (official site)](https://digint.ch/btrbk/doc/archive/btrbk-0.25.1/btrbk.1.html) |
| | |
| **asciidoc** | |
| + Decent (scriptable!) html | [btrbk(1) v0.26.1 at digint.ch (official site)](https://digint.ch/btrbk/doc/archive/btrbk-0.26.1/btrbk.1.html) |
| + Supported by github (helps contributors writing decent documentation) | [btrbk.1.asciidoc v0.26.1 at github](https://github.com/digint/btrbk/blob/master/doc/btrbk.1.asciidoc) |
| | |
| **asciidoc (`xmlto`, `a2x`)**                                            | http://asciidoc.org/                                                                                           |
| + Good result for `man` | |
| - EOL (why care that much? it works fine!) | [asciidoc EOL notice at github](https://github.com/asciidoc/asciidoc/releases/tag/8.6.10) |
| | |
| **asciidoc (`asciidoctor`)** | https://asciidoctor.org |
| - Pulls in tons of ruby (build depends) | https://bugs.gentoo.org/681056 |
| -- Implies to commit a pre-converted `doc/btrbk.1` | |
| + Seems more evolved than `xmlto`, still actively maintained | |
| | |
| **rst (`xrst2man.py` from docutils)** | |
| - Worse result for `man` | |
| ++ Good html converters (after having a quick look at it) | |
| - Not so well supported by github | [btrbk.1.rst v0.26.1 at github](https://github.com/digint/btrbk/blob/rst2man/doc/btrbk.1.rst) |
[1] https://github.com/digint/btrbk/pull/219 (edited)
Remove support for man page generation using asciidoc "a2x": The
project is discontinued, and depends on Python 2.7.
As we will provide pre-generated man pages as of btrbk-0.28.0, this is
not needed any more.
With this, previous snapshots (far relations) are still listed when
restoring a snapshot.
Example (S = source subvolume, readwrite):
After 3 snapshots:
A->S, B->S, C->S
Restore B: `btrfs subvol delete S; btrfs subvol snapshot B S'`
A->S, B->S, C->S, S'->B
Previous implementation would show no snapshots for S', as no
snapshot has parent_uuid=S'.
New implementation shows A, B, C as snapshots for S', as orphaned
siblings (A, B, C pointing to deleted S) are also related.
Makes sure that a subvolume which, for whatever reason, has a correct
btrbk name scheme but does NOT share any extents with previous snapshots
is never used as parent.
Note that if a related parent is found, the unrelated closest
older/newer (by btrbk timestamp) subvolumes are still added as clone
sources.
Preferences for parent (and required clone sources):
1. closest older in snapdir (by btrbk timestamp), related
2. closest older related (by cgen)
3. closest newer related (by cgen)
4. closest older in snapdir (by btrbk timestamp)
5. closest newer in snapdir (by btrbk timestamp)
Note: preferring 1 over 2 helps keeping parent-chain within droot on
target (assuming that btrfs always uses correlated parent on
btrfs-receive).
This will e.g. add a clone source on "btrbk resume", if both older AND
newer snapshot/backup pairs exist.
Also makes sure that the closest older btrbk snapshot is always added
as clone source, even if another related subvolume has newer cgen.
Old implementation was missing last readonly parent in chain, as well
as orphaned siblings.
Also sort all by cgen, not by distance, then cgen.
Also skip self.
Allowed values for "incremental_resolve":
- "mountpoint" (default): Use parents in the filesystem tree below
mount points of source `<volume-directory>/<snapshot-dir>` and
target `<target-directory>`.
- "directory": Use parents strictly below source/target
directories. Useful when restricting access, e.g. when using
ssh_filter_btrbk.sh.
- "_all_accessible" (experimental): Use parents from all mount points.
Note that using "_all_accessible" causes btrfs-progs to fail:
- btrfs send -p: "ERROR: not on mount point: /path/to/mountpoint"
- btrfs receive: "ERROR: parent subvol is not reachable from inside the root subvol"
see also: https://github.com/kdave/btrfs-progs/issues/96
Build check hash within btr_tree node instead per URL. This makes it
aware of shared btr_tree (different hostname:port pointing to same
btrfs filesystem).
Common virtual machine setups have multiple volume sections with same
host, but distinct port numbers for each machine.
- make caches dependent on MACHINE_ID instead of HOST
- append port number to URL
- add MACHINE_ID to vinfo
- use MACHINE_ID where applicable
This even works if virtual machines share the same btrfs filesystems:
If an equal UUID is found on distinct machines, btr_tree() will return
the already present tree, in order to be consistent after node
injections.
Setting the ssh port directly in the "volume" / "target" config lines
adds the possibility to create a unique "hostname:port"
identifier (preparatory for MACHINE_ID to distinguish virtual machines
on same host with different ports.)
btrbk <= 0.27.2 does not print "target_rsh" and "target_type" when
called with --format=raw, see $table_formats{resolved}. This is fixed
in 0.28.0.
Hardcoding target_type=send-receive is not so bad, as for raw targets
btrbk-verify complains first with:
btrbk-verify: missing required variable "target_rsh" in btrbk --format=raw line
So we should not run rsync (which is not really a problem, rsync just
fails with "not a directory").
Compare files and attributes by checksum, using rsync(1) in dry-run
mode with all preserve options enabled.
Resolves snapshot/backup pairs by evaluating the output of
"btrbk list latest [filter...]".
Restrictions:
- ".d..t...... ./" lines are ignored by default:
Root folder timestamp always differs.
- "cd+++++++++ .*" lines are ignored by default:
Nested subvolumes appear as new empty directories.
- btrbk raw targets are skipped
- rsync needs root in most cases (see --ssh-* options)
When called from another script, we don't want the help message printed
on errors. E.g. when running something like:
btrbk list snapshots -q filter_which_does_not_match
While on traditional UNIX the documentation (especially the man pages)
is gzip'ed, modern distros have helpers to compress it.
This patch adds an option to disable compression:
make COMPRESS=no
When configuring "target" in a global (or "volume") context, and
overriding target_preserve_min in "subvolume" section, the scheduler
has undefined behavior (mixing up the "min" values).
Fixed by returning a copy of the preserve hash in
config_preserve_hash().
It is possible that the subvolume path is not accessible by the user
calling btrbk. When resolving mount points, "readlink" is used on the
path, which also needs to be wrapped with "sudo".
The FORCE_PRESERVE information is set on the node, and was lost for
"latest common target" as get_receive_targets() returned vinfo without
node information.
fixes regression: 6c502cb btrbk: search complete target tree for correlated subvolumes
btrbk now runs "btrfs subvolume list" from the mountpoint instead of
the volume path, which for some users is not below --restrict-path. As
the output of "btrfs subvolume list" is the same (complete btrfs tree
for the filesystem), it is ok to ignore the restrict-path here.
Since we consider all accessible subvolumes in get_related_subvolumes,
checking for equal BTRBK_BASENAME and empty SUBVOL_DIR does not work
when checking for same btrbk file name scheme.
fixes regression: b37ef84e36 (btrbk: always read mountpoints; include all snapshots from mountpoint as candidates for best common parent)
Improve error handling in btrfs_send_receive: on error, always try to
read the target subvolume and only delete it automatically if it is
garbled (read/write, no received_uuid).
This is especially important if the target subvolume was already
present before send/receive.
Reverts: 4c4afe77 btrbk: skip target metadata test if send/receive has errors
If btrfs_subvolume_show($vol, rootid => 5) fails, there are no
"received_uuid" and no "gen" keys in the root node.
Fixes: 0acbf74c57 (btrbk: add btrfs_subvolume_list_complete: fetch all subvolumes with all flags)
As we allow <url> to be specified as "<hostname>:<directory>", a URL
"ssh://my.host" (without trailing slash) was parsed as hostname="ssh",
directory="/my.host".
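The ordering issue can be illustrated in shell: the `ssh://` form has to be tested before the generic `<hostname>:<directory>` form. `parse_url` is a hypothetical sketch, not btrbk's Perl parser; it ignores ports, and defaulting the directory to `/` is an assumption.

```shell
# Split a URL into host and directory, checking "ssh://" first.
parse_url() {
    local url="${1}"
    local host=''
    local dir=''
    local rest=''
    case "${url}" in
        ssh://*)
            rest="${url#ssh://}"
            case "${rest}" in
                */*) host="${rest%%/*}"; dir="/${rest#*/}" ;;
                *)   host="${rest}";     dir='/' ;;
            esac
            ;;
        *:/*)
            host="${url%%:*}"
            dir="${url#*:}"
            ;;
        *)
            dir="${url}"
            ;;
    esac
    printf 'host=%s dir=%s\n' "${host}" "${dir}"
}

parse_url 'ssh://my.host'    # prints: host=my.host dir=/
```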
Wrapper, returns complete list of all subvolumes (including btrfs
root, id=5) with all flags. Requires three calls to btrfs-progs.
Adaptions and cleanup in btr_tree().
The btrfs root subvolume (id=5) has no UUID and cannot be backed
up. Abort if "subvolume ." is configured on btrfs root, e.g.:
volume /path/to/btrfs_root
subvolume .
Note that the UUID for btrfs root (id=5) is not always present:
- btrfs-progs < 4.12 does not support rootid lookup
- UUID can be missing if filesystem was created with btrfs-progs < 4.16
Still we need to always read it, as the whole tree is cached and we
don't know if it will be used.
Filesystems created with btrfs-progs >= 4.16 have valid UUID, while
others have not [1]. Validate output of "btrfs subvolume show", and
provide uuid for btrfs root (id=5) only if it is valid.
[1]: 0a0a03554a: btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
When using asciidoctor, backend "manpage" (-b manpage) is used, while
a2x converts asciidoc to docbook (xml), then manpage.
Asciidoctor creates ugly indentation for [literal] blocks in SYNOPSIS,
use [verse] instead.
Many people use whitespace even in mountpoints, silently ignore
(loglevel=info) non-parseable btrfs mountpoints.
Btrbk does not support file names with whitespace or special
characters by design, and specifying such mountpoints in the
configuration file fails anyway.
Covers commits:
1c83a65 btrbk: add filter capabilities to vinfo_subvol_list
a25487e btrbk: cosmetics (log messages)
ef5c369 btrbk: use _is_same_fs_tree() where applicable
0454f60 btrbk: bugfix: match btrbk_basename in get_latest_snapshot_child()
2c1c3b4 btrbk: cleanup: remove snapshot_dir, rename sroot->snaproot
c457540 btrbk: use separate vinfo for snapshot directory (allows snapshot_dir to be a mountpoint)
f5dc4e0 btrbk: add known mountpoints to btr_tree nodes as anchor for reverse lookup
e9374b3 btrbk: replace url_cache by spec_cache
0ea0430 btrbk: cleanup (cosmetics, documentation)
b37ef84 btrbk: always read mountpoints; include all snapshots from mountpoint as candidates for best common parent
b549e11 btrbk: raw targets: move tree readin to separate function; add caching
7a1bc25 btrbk: raw targets: create fake btr_tree instead of maintaining separate list
6c502cb btrbk: search complete target tree for correlated subvolumes
Instead of passing snapshot_dir all over the place, use a separate
vinfo for the snapshot directory, accessible by vinfo_snapshot_root().
As it is initialized separately by vinfo_init_root(), it can be on a
different mountpoint.
This also allows us to use different semantics for snapshot_dir in the
future, as it does not need to be relative to the volume directory.
Dropped readin of subvolid and realpath by btrfs_subvolume_show(), we
now always read /proc/self/mounts (and call readlink).
When picking the best common parent in get_best_parent(), we want to
list as many snapshots as possible. For now, we list all from the
mountpoint of snaproot ($sroot/<snapshot_dir>), due to a bug in
btrfs-progs [1]. Also added code (commented out) to list snapshots
from all known mountpoints.
[1] https://github.com/kdave/btrfs-progs/issues/96
- move matching for correlated subvolumes from get_receive_targets
into new function _receive_target_nodes
- add lookup tables in btr_tree (RECEIVED_UUID_HASH, UUID_HASH),
allowing for faster matching in _receive_target_nodes
- add vinfo_resolved() for mapping nodes to vinfo
- rename get_latest_common to get_best_parent (while moving some
functionality to new function get_related)
- cleanup
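To illustrate why the lookup tables speed up matching, here is a
simplified Python sketch. The dict-based node layout and both function
names are hypothetical stand-ins, and the correlation rule is reduced
to "target.received_uuid equals source.uuid or source.received_uuid";
the real code may consider more cases:

```python
from collections import defaultdict

def build_indexes(nodes):
    """Build uuid -> node and received_uuid -> [nodes] lookup tables."""
    uuid_hash = {}
    received_uuid_hash = defaultdict(list)
    for node in nodes:
        uuid_hash[node["uuid"]] = node
        if node.get("received_uuid"):
            received_uuid_hash[node["received_uuid"]].append(node)
    return uuid_hash, received_uuid_hash

def receive_target_nodes(source, received_uuid_hash):
    """Find correlated nodes with dict lookups instead of a full scan."""
    # Nodes that were received directly from the source.
    hits = list(received_uuid_hash.get(source["uuid"], []))
    # If the source was itself received, nodes sharing its
    # received_uuid stem from the same original subvolume.
    if source.get("received_uuid"):
        siblings = received_uuid_hash.get(source["received_uuid"], [])
        hits += [n for n in siblings if n is not source]
    return hits
```

Building the tables costs one pass over the tree; each later match is
then a constant-time lookup instead of a scan over all nodes.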
In the scheduler, a month (or year) does not start at the first day,
but at the first `preserve_day_of_week`. Make sure that all days
before `preserve_day_of_week` in a month get delta_months+1.
Example (corner case):
- `preserve_day_of_week sunday`
- `target_preserve *m`
- no backups in 2018-02
- backup with timestamp 2018-03-01 (which is a thursday)
- backup with timestamp 2018-03-04 (which is a sunday)
Without this patch, because there are no sunday backups in 2018-02,
the first backup is considered a weekly (+4d after sunday), and as
such "first weekly of month 2018-03", and the second one is discarded.
With this patch, the first item is considered "first weekly of month
2018-02", and the second gets "first weekly of month 2018-03".
NOTE: This change may result in (previously preserved) backups being
deleted!
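The month assignment from the example can be sketched in Python as
follows. The function name schedule_month and the tuple return value
are illustrative assumptions; btrbk itself works with delta_months
relative to the current date, which this sketch does not model:

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def schedule_month(day, preserve_day_of_week):
    """Return the (year, month) a date is scheduled under.

    A month starts at its first preserve_day_of_week, so earlier days
    still belong to the previous month.
    """
    target = WEEKDAYS.index(preserve_day_of_week)
    first_of_month = day.replace(day=1)
    # Day of month of the first occurrence of the preserve weekday.
    first_preserve_day = 1 + (target - first_of_month.weekday()) % 7
    if day.day >= first_preserve_day:
        return (day.year, day.month)
    if day.month == 1:
        return (day.year - 1, 12)
    return (day.year, day.month - 1)
```

With `preserve_day_of_week sunday`, 2018-03-01 maps to (2018, 2) and
2018-03-04 maps to (2018, 3), matching the corner case above.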
Snapshots and backups having no exact time information (created with
"timestamp_format=short") are set to 00:00, which would be regarded as
"previous day" if preserve_hour_of_day is greater than 0. Fix this by
ignoring preserve_hour_of_day in this case.
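A small Python sketch of the rule, under the assumption that the
caller knows whether a timestamp carries a real time of day; the
function name schedule_date and the has_time flag are hypothetical:

```python
from datetime import datetime, timedelta

def schedule_date(ts, preserve_hour_of_day, has_time):
    """Return the calendar day a snapshot counts towards.

    Timestamps before preserve_hour_of_day count as the previous day,
    unless the timestamp has no time information (short format), in
    which case its 00:00 is only a placeholder and is ignored.
    """
    if has_time and ts.hour < preserve_hour_of_day:
        return (ts - timedelta(days=1)).date()
    return ts.date()
```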
Introduces the new config option "preserve_hour_of_day" to specify
after what time backups should be considered as dailies.
Based on pull request #204, with changes:
- calculation of weekly backups
- change format of preserve_matrix
Suppress "Option redefined" warning for snapshot_name config option,
which has hardcoded (computed) default already set when checking.
fix regression: 0ebe2ea2e1
Similar to ABORTED=USER_SKIP (active commandline filter), archives
having ABORTED=ARCHIVE_EXCLUDE_SKIP (active archive_exclude
configuration) do not cause exit status 10 and are hidden from
transaction log.
While $vol->{URL} can contain "//" if volume="/" (intentionally, this
is an assembled path), the filter statements are sanitized using
check_url(). This means we need to match the filter statement against
check_url($vol->{URL}). Same applies to subvol.
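In Python terms, the idea is to normalize both sides before comparing.
The check_url below is a hypothetical stand-in that only handles local
paths by collapsing repeated slashes; it does not reproduce what
btrbk's check_url() validates:

```python
import re

def check_url(path):
    """Hypothetical stand-in: collapse repeated slashes in a local path."""
    return re.sub(r"/+", "/", path).rstrip("/") or "/"

def filter_matches(filter_arg, vol_url):
    """Compare a sanitized filter statement with an assembled volume URL."""
    # vol_url may legitimately be "//some/subvol" when volume is "/",
    # so it must pass through the same sanitizer as the filter.
    return check_url(filter_arg) == check_url(vol_url)
```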
Btrfs does not destroy qgroups when subvolumes are deleted (see
https://bugzilla.kernel.org/show_bug.cgi?id=91751). As a workaround
for this, btrbk can be configured to always destroy the corresponding
default qgroup "0/<subvol-id>" whenever a subvolume (snapshot, backup
or archive) is deleted.
Added configuration options:
- snapshot_qgroup_destroy
- target_qgroup_destroy
- archive_qgroup_destroy
When doing a batch delete (multiple deletes with one call to "btrfs
subvolume delete"), we want to know which subvolumes have failed. For
this, we need to parse the error output.
On any parsing failure, we assume that nothing has been deleted, and
warn accordingly (forward compatibility).
Example:
Manually create a key:
# KEYFILE=/some/secure/place/btrbk.key
# dd if=/dev/urandom bs=1 count=32 | od -x -A n | tr -d "[:space:]" > $KEYFILE
btrbk.conf:
volume /mnt/btr_pool
incremental no
raw_target_encrypt openssl_enc
openssl_ciphername aes-256-cbc
openssl_iv_size 16 # NOTE: set to "no" if no IV is needed by the selected cipher
openssl_keyfile /some/secure/place/btrbk.key
subvolume home
target raw ssh://cloud.example.com/backup
While taint mode [1] is a nice feature of perl, e.g. it disallows
using variables (such as filenames from the config file) which were
not validated in system() commands, it also treats $PATH as insecure
(which it inherently is, as perl cannot know who messed around with it).
[1] perlsec(1): http://perldoc.perl.org/perlsec.html
[2] perlrun(1): http://perldoc.perl.org/perlrun.html
Note that btrbk still does all taint checks, and can be run in taint
mode:
- by executing `perl -T /usr/sbin/btrbk`,
- or by changing the hashbang to: `#!/usr/bin/perl -T`.
Although FATAL warnings are discouraged in perl and may break forward
compatibility [1], we still use it as btrbk is usually run as root and
we really want perl to die on programmatic errors.
[1] "perldoc warnings"
The "duration" column in the transaction log has proven to be
confusing to some users, especially on errors (e.g. "send-receive
ERROR 27" in issue #177). As it's not really necessary (duration can
be computed from the corresponding "starting" log entry), it's now
being dropped.
As of btrfs-progs-v4.12, the "btrfs subvolume show" command does not
print the full (absolute, resolved) path anymore [1]. Instead, it prints
the relative path to btrfs root (or "/" if it is the root).
The impact for btrbk is that we cannot fill our realpath_cache in
btrfs_subvolume_show() anymore. This is not fatal, but has the
following consequences:
- The "check for duplicate snapshot locations" may now miss
subvolumes specified by symlinks.
- If multiple "volume" sections point to the same subvolume (e.g. if
specified using symlinks) an additional "btrfs subvolume list" is
called. Note that the subvolume will still be recognized as
identical, and the btr_tree will not be rebuilt.
[1] btrfs-progs commit: b7df24aa5cddc4802b9938f56372b73869775cd9
Under "Example: laptop with usb-disk for backups" the readme stated that "snapshot_preserve 14d" will "keep daily snapshots for 14 days [..]". I believe that this is misleading, as it seems to imply that only one snapshot per day (the latest) will be kept in that period, when in fact _all_ snapshots will be kept in that period.
This gets important when using an old backup disk as source.
In terms of btrfs send/receive, all subvolumes matching "uuid /
received_uuid" are valid backups.
Merged (amend) from pull request: #116
Verified by Axel Burri <axel@tty0.ch>
We set "--no-random-seed-file" because one of the btrbk
design principles is to not create any files unasked. Without
"--no-random-seed-file", gpg creates ~/.gnupg/random_seed, and as
such depends on $HOME being set correctly (think of running from
cron). From gpg2(1) man page:
--no-random-seed-file GnuPG uses a file to store its
internal random pool over invocations. This makes random
generation faster; however sometimes write operations are not
desired. This option can be used to achieve that with the cost
of slower random generation.
Always overwrite destination .gz files during make install.
Otherwise you need to manually answer y to several prompts.
```gzip: /usr/share/doc/btrbk/README.md.gz already exists; do you wish to overwrite (y or n)? y```
We use "dd" instead of shell redirections, as it is common to have
special filesystems (like NFS, SMB, FUSE) mounted on the raw target
path. By using "dd" we make sure to write in reasonably large blocks
(default=128K), which is not always the case when using redirections
(e.g. "gpg > outfile" writes in 8K blocks).
Another approach would be to always pipe through "cat", which uses
st_blksize from fstat(2) (with a minimum of 128K) to determine the
block size.
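What "writing in reasonably large blocks" means can be sketched in
Python. This illustrates the block-wise copy that dd performs and is
not code from btrbk; the function name copy_in_blocks is an assumption,
and 128K mirrors the default mentioned above:

```python
import io

BLOCK_SIZE = 128 * 1024  # 128K, as in the default mentioned above

def copy_in_blocks(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst, issuing one write per block_size bytes read.

    With a buffered binary stream, read(n) returns n bytes unless EOF
    is reached, so every write except the last is a full block.
    """
    total = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

On network or FUSE filesystems each write can turn into a separate
round trip, which is why fewer, larger writes tend to help there.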
- add sophisticated stream compression in run_cmd
- add special "compress" cmd_pipe item
- add special "redirect" cmd_pipe item:
use shell redirection instead of troublesome "dd of=".
- disable ssh_compression if stream_compression is set
bugfix for: 796b6bd9bf
Replace realpath with readlink in allowed commands. Commit 796b6bd substituted readlink for realpath in file "btrbk"; this commit propagates the change to ssh_filter_btrbk.sh.
When used without --inplace, rsync creates a new copy of the file and
moves it into place when it is complete, having the effect that btrfs
creates a new extent for the WHOLE file. With --inplace however, rsync
writes the updated data directly to the destination file, having the
effect that btrfs creates a new extent only for the differing part of
the file.
We already perform compression before gpg, such that compressing in gpg
is just a waste of time. Interestingly, it seems gpg is not trying to
recompress gzip[ed] input streams, as for the default gzip compression
this patch does not change performance. However, it is necessary for
the upcoming lz4 compression to show its real benefit.
Add configuration option transaction_syslog, which can be set to a short
name of a syslog facility, like user or local5. Most of the ones besides
localX do not really make sense, but whatever, let the user decide.
The only logging that is relevant for logging to syslog is the logging
generated inside sub action, so it's easy to hijack all messages in
there and also send them to syslog if needed.
All output is done via print_formatted, which expects a file handle.
So, abuse a file handle to a string to be able to change as little code
as necessary for this feature.
Since syslog already adds the timestamps for us, I added a syslog
formatting pattern, which is very similar to tlog, omitting the
timestamp.
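The "file handle to a string" trick looks like this in Python. The
field names, both format patterns, and the syslog_send callback are
hypothetical and only mirror the structure described above:

```python
import io

TLOG_FMT = "{time} {type} {status} {target}"
SYSLOG_FMT = "{type} {status} {target}"  # same as tlog, minus timestamp

def print_formatted(entries, fmt, out):
    """Write each entry to the given file handle, one line per entry."""
    for entry in entries:
        out.write(fmt.format(**entry) + "\n")

def action(entry, tlog_fh, syslog_send=None):
    """Log to the transaction log and, if configured, to syslog."""
    print_formatted([entry], TLOG_FMT, tlog_fh)
    if syslog_send is not None:
        # Reuse print_formatted unchanged by handing it an in-memory
        # handle, then forward the captured text.
        buf = io.StringIO()
        print_formatted([entry], SYSLOG_FMT, buf)
        syslog_send(buf.getvalue().rstrip("\n"))
```

On a Unix system, syslog_send could for instance be `syslog.syslog`
after calling `syslog.openlog(facility=syslog.LOG_LOCAL5)`.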
- remove %subvol_list_cache: may slow things down a bit, but makes it
possible to inject nodes correctly
- simplify subtree list (is now an array as it should have been from
the beginning); correctly fill tree_cache
- fix vinfo_set_detail; cleanup
- %btrfs_tree_cache (replaces %root_tree_cache)
- %subvol_list_cache (replaces %vinfo_cache):
- vinfo_init_root() (was: vinfo_root()) now looks up the cache before
calling btrfs_subvolume_detail()
- vinfo_subvol_list() now looks up the cache before calling
btrfs_subvolume_list()