Zrep documentation

This document gives a general overview of zrep, as well as specific examples for common use cases.

Overview of zrep

Zrep is designed to be a robust, yet simple, mechanism to keep a pair of zfs filesystems in sync, in a highly efficient manner. It can be used on as many filesystems on a server as you like. It relies on ssh trust between hosts.

There are two general areas where you might consider zrep:

  1. High availability style failover of filesystems between two servers
  2. A centralized backup server

Since the original design spec of zrep was for the failover case, most of the documentation is geared towards that. However, there is a section at the end geared towards the secure backup server case.

Please note that, while the "backup server" requires only one-way trust, the other usage examples below presume that you have two-way ssh trust, and OS+filesystem targets similar to the following:
host1 - zfs pool "pool1", with ZFS version equivalent to solaris 10 update 9+
	root ssh trust to/from host2

host2 - zfs pool "pool2", with ZFS version equivalent to solaris 10 update 9+
	root ssh trust to/from host1
host1 and host2 are able to "ping" and "ssh" to/from each other

Reminder: to allow root ssh trust, you may have to adjust your sshd configuration (for example, the PermitRootLogin setting) on each host.

If you have a ZFS version that supports delegation of privileges, you may be able to run zrep as non-root, if you use "zfs allow" to grant the following permissions:
  create,destroy,hold,mount,receive,rename,rollback,send,snapshot,userprop
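
As a hedged sketch (the non-root user "zrepuser" and the pool/filesystem names here are assumptions for illustration), the delegation might look like:

	host1# zfs allow zrepuser create,destroy,hold,mount,receive,rename,rollback,send,snapshot,userprop pool1/prodfs
	host2# zfs allow zrepuser create,destroy,hold,mount,receive,rename,rollback,send,snapshot,userprop destpool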

zrep snapshot naming conventions

zrep by default makes snapshots that look like
  your_fs@zrep_123abc
Note that the number part is a HEXADECIMAL serial number, not a random string of numbers and letters.

Also note that if you override the "tag" used (mentioned later in this document), the initial "zrep" part of the snapshot name will change to match whatever tag you set.
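
For example, with ZREPTAG=zrep-dc (see the Multiple Destinations section later in this document), the snapshots would instead look something like
  your_fs@zrep-dc_123abc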


Initialization of zrep replicated filesystem "prodfs"

	host1# zrep -i pool1/prodfs host2 destpool
This will create an initial snapshot on prodfs. It will then create a new filesystem, destpool/prodfs, on host2, and set it "readonly" there.

Special tips:

Pre-existing filesystem

If for some reason you have a pre-existing (not-zrep-initialized) snapshotted filesystem replication pair that you want to convert to using zrep, you can do so by first renaming the most recent snapshot to match zrep snapshot naming conventions. (See the Overview section of this document.)
If you haven't already done replication, you can save yourself a bit of work by creating an initial snapshot that already matches zrep snapshot naming conventions, before doing the zfs send and receive.
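
For example, a minimal sketch (the existing snapshot name here is hypothetical):

	host1# zfs rename pool1/prodfs@mylastsnap pool1/prodfs@zrep_000001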

Next, set the basic zfs properties zrep expects such as zrep:src-host, etc.
You can do this with

srchost# "zrep changeconfig -f srcfs desthost destfs"
desthost# "zrep changeconfig -f -d destfs  srchost srcfs"

You will then need to set the last-sent property on the master snapshot, which you can do easily via
	srchost# zrep sentsync fs@zrep_snapnamehere
You should then be ready to do a "zrep sync fs".

Initialization for nested ZFS filesystems (Recursive flag)

If you wish to set up replication for prodfs, and all ZFS filesystems under it, you can use the ZREP_R environment variable as follows:
    export ZREP_R=-R
You need to have this set for the zrep init, and also for all subsequent zrep syncs.
(Well, technically, you COULD set the -R flag for only, say, an hourly sync, and sync just the top-level filesystem without it more often. But... ick.)
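
For example, a minimal sketch using the filesystem names from the initialization example above:

	host1# export ZREP_R=-R
	host1# zrep -i pool1/prodfs host2 destpool
	host1# zrep -S pool1/prodfs     # ZREP_R=-R must also be set for this and all later syncs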

I strongly suggest you do not mix and match nesting. Take an all-or-nothing approach: pick one parent ZFS filesystem, and then rely on that single anchor to replicate everything under it, consistently. Don't try to have one zrep init for the top level, and then use zrep init again on a sub-filesystem.
zrep won't stop you from doing this, but I suspect you may run into problems down the line.

Why you might NOT want to do this:
This will cause your data transfers to be serialized; you would probably get better throughput if you could overlap them. Then again... I haven't gotten around to rewriting the zrep global locking to cleanly allow for multiple runs. Soon(tm).

Additionally, zrep at present uses "zfs get -r". If you have a hundred nested filesystems, zrep status type operations will then take a lot longer than otherwise.
If you have a THOUSAND of them... I would imagine it would take a WHOLE lot longer!

I imagine wall-clock time would still only be affected by less than 60 seconds extra though, so... use at your own risk.

Replication

Here is how you tell zrep to replicate updates from the master to the other side.
	host1# zrep -S pool1/prodfs
You can call this manually, or from a cron job as frequently as once a minute. It will know from initialization, where to replicate the filesystem to, and do so.
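
For example, a sketch of a cron entry on host1 (the install path is an assumption; adjust it to wherever zrep lives on your system):

	*/5 * * * * /usr/local/bin/zrep -S pool1/prodfs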

If you have more than one filesystem initialized as a zrep master, you may also use

	# zrep -S all
(Note that at the current time, this runs as a single non-threaded process, so it may be faster for you to explicitly run separate zrep processes)

You can safely set up a cronjob on both host1 and host2 to do "all", and it will "do the right thing", for the most part. However, to avoid seeing harmless error messages when syncs overlap or run overly long, you can set a "quiet limit" for syncs.

	
	# zrep sync -q NUMBER-OF-SECONDS all
Then, if it has been less than NUMBER-OF-SECONDS since the last successful sync for a filesystem, zrep will benignly continue to the next filesystem, with a small note on stdout, even if it can't get a lock on a particular zrep-registered filesystem to do a new sync.
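
For example, a sketch of matching cron entries on host1 and host2 (install path assumed), syncing every 10 minutes with a 15-minute quiet limit:

	*/10 * * * * /usr/local/bin/zrep sync -q 900 all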

If you have nested ZFS filesystems and are using zrep to sync them all in a single job, see the section about initialization of nested filesystems, to make sure that you set the required environment variable before doing your "zrep sync".

Failover

	host1# zrep failover pool1/prodfs
In planned failover situations, you will probably want to trigger failover from your production side, rather than the standby side. The command above is what you run to do that.

This will reconfigure each side to know that the flow of data should now be host2 -> host1, and flip the readonly bits appropriately. Running "zrep -S all" on host1 will then ignore pool1/prodfs. Running "zrep -S all" on host2 will then sync pool2/prodfs to pool1/prodfs.

If, in contrast, you have already completed an emergency "takeover" from the other side, you can officially acknowledge the remote side as master, with

	host1# zrep failover -L  pool1/prodfs

Takeover

	host2# zrep takeover pool2/prodfs
This is basically the same as the "planned failover" example, but with the syntax required for running it on the standby host.

For EMERGENCY failover purposes, where the primary host is down, you should instead force takeover by this host, with

	host2# zrep takeover -L pool2/prodfs

Status

You can find the status of zrep managed filesystems in multiple ways. To simply find a list of them, use
	hostX# zrep list
This will give you a list of all zrep related filesystems. If you want to see all the internal zrep related zfs properties, add the -v flag.

To see the status of all zrep "master" filesystems, use

	hostX# zrep status
This will give a list of zrep managed filesystems that the host is "master" for, and the date of the last successfully replicated snapshot.
To see the status of all zrep filesystems, not just masters, add the -a flag.

If you have a mix of master and slave filesystems, you may wish to use the -v flag, which will show both the source and destination, as well as the last synced time. Please note that the order of flags does matter. So,

	hostX# zrep status -v -a

Tuning

SSH option tuning

If you would like to provide custom options to the ssh command zrep invokes, or replace it with an entirely different program or wrapper, you may set the environment variable $SSH to whatever you wish.
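
For example, a sketch that swaps in a specific cipher and disables ssh compression (the particular options are just illustrative; any valid ssh options, or a wrapper script of your own, will do):

	export SSH="ssh -c aes128-gcm@openssh.com -o Compression=no"
	zrep -S pool1/prodfs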

Archive management

Higher number of archives on remote side

By default, the number of zrep-recognized snapshots kept will be the same on both sides.
This is controlled by the zrep:savecount property. You may set different values for each side, by setting that property (with zfs set) on the filesystem on the side you wish to change.
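
For example, a sketch that keeps more snapshots on the remote side (the value 20 and the filesystem names from the initialization example are assumptions):

	host2# zfs set zrep:savecount=20 destpool/prodfs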

Long-lived archives

You CANNOT manually create snapshots on the remote side, because then incremental replication will fail, due to the target "not being the most recent snapshot".

Because of this issue, the best way to keep long-lived archives may be one of the following methods:

a) Create non-zrep-recognized snapshots on the local side, then delete them locally, but not remotely. "zfs send -I" will copy over even non-recognized "new" snapshots.
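
A hedged sketch of method (a), with hypothetical snapshot names:

	host1# zfs snapshot pool1/prodfs@archive_2024_01
	host1# zrep -S pool1/prodfs                       # the incremental send carries the archive snapshot to host2
	host1# zfs destroy pool1/prodfs@archive_2024_01   # remove the local copy; the one on host2 remains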

b) Schedule 'clone' jobs on the remote side: pick the most recent zrep snapshot, and create a filesystem clone. This will not duplicate the files on disk, and will not interfere with ongoing zrep snapshot/replication activity. Again, care must be taken not to leave the clones around indefinitely, to the point where they fill available space.
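
A hedged sketch of method (b), run on the remote side (the clone naming scheme is hypothetical):

	host2# SNAP=$(zfs list -H -d 1 -t snapshot -o name -s creation destpool/prodfs | grep '@zrep_' | tail -1)
	host2# zfs clone "$SNAP" destpool/prodfs_archive_$(date +%Y%m%d)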

Multiple Destinations

The default usage for zrep is to replicate to a single filesystem. However, some people may wish for multiple destinations. To do this requires essentially running multiple zrep processes per filesystem.

CAUTION! At present, you must ensure, yourself, that you do not overlap multiple zrep processes, if you are using ZREPTAG functionality. Run them sequentially, not in parallel.
I am planning an update to fix this issue, but at the moment, there are problems with zrep global lock contention that may arise otherwise.

Set the environment variable ZREPTAG to a short, unique identifier for each destination (I strongly recommend you stick to the format "zrep-somethinghere"). Then init and sync as normal. For example:

	export ZREPTAG=zrep-dc
	zrep init pool/fs dc.your.com remotepool/fs
	zrep sync pool/fs
	  ...
	export ZREPTAG=zrep-ca
	zrep init pool/fs ca.your.com remotepool/fs
	zrep sync pool/fs

Alternatively, you may use "zrep -t tagnamehere normal.zrep.command.here"
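
For example, a sketch of the same two destinations driven with -t instead of the environment variable:

	zrep -t zrep-dc sync pool/fs
	zrep -t zrep-ca sync pool/fs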

Debugging

This section is for when Things Go Horribly Wrong... or if you just want to understand zrep better.

The first rule is: when debugging, set "DEBUG=1" in your environment, to make zrep a little more verbose.
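
For example, for a one-off verbose run (filesystem name taken from the earlier examples):

	host1# DEBUG=1 zrep -S pool1/prodfs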

Almost everything zrep does is controlled by ZFS properties.
ZFS allows you to set arbitrary user properties of your own choosing on filesystems.
So for example,

zfs set me:i_am=the_greatest pool/some/filesystem
is perfectly valid usage. (User property names must be lowercase and contain a colon.)

By default, zrep prefixes all its property names with "zrep:". You can see the values it sets on a filesystem (while skipping the many standard system-level properties) with "zrep list -v".
Sample output:

$ zrep list -v rpool/PLAY/source
rpool/PLAY/source:
zrep:master     yes
zrep:src-fs     rpool/PLAY/source
zrep:dest-fs    rpool/PLAY/d1
zrep:src-host   (myhostname)
zrep:savecount  5
zrep:dest-host  localhost

See also the Troubleshooting page.



Use from a backup server (the reverse from normal)

Some folks may be looking just for a way to simplify their backup mechanisms, rather than doing fancy failover.
In this scenario, you probably will prefer one centralized backup server that has trusted root ssh OUT, but does not allow root ssh IN. (This way, it is possible to have independent clients, in separate zones of trust or system administration from each other.)

For this goal, both initial setup and replication are reversed from the normal way. Normally, zrep expects the data master to have full ssh privilege into the data replication target. However, when a backup server is the target, sysadmins usually prefer the target system to have the privileges instead of the client.

To implement a backup server type layout with zrep, the simplest way is when you are starting from scratch, with a brand new filesystem.

However, it is also possible to retroactively convert an existing production ZFS filesystem, to be backed up by zrep to a backup server.

Backup server with a clean new filesystem

The steps would look like this, as run on the backup server:

  1. Set up the filesystem initially on the backup server
  2. Do "zrep init {localfs} {clienthost} {clientfs}"
  3. "zrep failover {localfs}" #to make the client the data master.
At this point you have now set up the initial state cleanly, and failed over so that the "client" side is the active master (ie: the read/write one) for the pair. The backup server side will be in read-only mode, ready to receive syncs.
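
As a concrete sketch, reusing the hostnames and filesystem names from the "existing ZFS filesystem" example further down (all of them assumptions for illustration):

	backupsrv# zfs create backup/client1/fs
	backupsrv# zrep init backup/client1/fs client1 data/fs
	backupsrv# zrep failover backup/client1/fs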
If you already have a client zfs filesystem with a bunch of data, but you don't want to give it full access on the backup server side, it is technically possible to do a manual initialization to the backup server. A smart sysadmin should be able to do a manual zfs replication, and then set the zfs properties themselves.
Alternatively, use the new "zrep changeconfig -f" syntax, and finally, convert an existing snapshot to a zrep-active one, as detailed in the Troubleshooting page.
To then trigger incremental backups from the backup server side, you have to use a special zrep command: "zrep refresh".

You must then set up a special zrep job on the backup server, instead of the usual procedure of pushing it from the master side. It will look just like a "zrep sync" job, but instead, you will want to call

	zrep refresh pool/fs
Instead of the master side "pushing" new data, you will now be pulling the latest bits to your read-only backup server copy of the filesystem.
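
For example, a sketch of a cron entry on the backup server (install path and schedule are assumptions):

	0 * * * * /usr/local/bin/zrep refresh pool/fs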

Backup server with an existing ZFS filesystem

This presumes you have ssh trust from the backup server to the client. It also presumes you have installed zrep on both sides, somewhere in the standard $PATH.

Hostnames are "client1" and "backupsrv".
Data filesystem is "data/fs".
Backup pool base is "backup/client1"

FYI, these instructions basically walk you through what a "zrep init" does. So if some time in the future the init process changes, these docs will need updating as well.

  • Create a snapshot of the client fs
    client1# zfs snap data/fs@zrep_000001
  • Set zrep base properties
    client1# zrep changeconfig -f data/fs backupsrv backup/client1/fs
  • Replicate to the backup server and set properties
    backupsrv# ssh client1 zfs send data/fs@zrep_000001 | zfs recv -F backup/client1/fs
    backupsrv# zrep changeconfig -f -d backup/client1/fs client1 data/fs
  • Let zrep know a valid sync has happened
    # this will also set the "master" flag on client1
    # -L flag only in github at present...
    client1# zrep sentsync -L data/fs@zrep_000001

It should now be possible to run, at your preferred frequency:

	backupsrv# zrep refresh backup/client1/fs

Disaster recovery of client from backup server mode

Note: This section is specific to running zrep as part of a "backup server" type situation. For more normal zrep usage, see higher up in this page.

If you get into the situation where you have lost a client filesystem (or worse yet, the entire remote server), the good news is that it should be relatively straightforward to reinitialize, using the following steps:

  1. Get the filesystem on the backup server to be exactly how you need it to be.
  2. If there is some reason you wish to preserve some point-in-time snapshots, make your own by hand that don't start with "zrep".
  3. Clear all zrep snapshots and zrep information, by using "zrep clear pool/fs". This should just remove zrep metadata, but leave the filesystem intact.
  4. Do a full resync to the remote client, with "zrep init pool/fs client remotepool/fs" on the backup server (a concrete sketch follows below).

You are effectively now back at step 2 of the initial backup server setup. Finish that setup procedure.
That should be all that is required to get you up and running again!
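
A concrete sketch of steps 3 and 4, reusing the hostnames and filesystem names from the backup server example above:

	backupsrv# zrep clear backup/client1/fs
	backupsrv# zrep init backup/client1/fs client1 data/fs
	backupsrv# zrep failover backup/client1/fs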

Author's note: I am pleased to know that people have been running my zrep script since 2012, in production, across the world! It is a great motivation to know that people are finding my efforts useful to them.

That being said, another great motivation is things like an amazon gift card, or something from my wishlist. :-D No gift too small, it's the thought that counts ;-)
My amazon wishlist