cdfsync

NAME
SYNOPSIS
DESCRIPTION
GENERAL
SETUP
USAGE
CONNECTING TO A CDFSYNC SERVER
CONNECTING TO A CDFSYNC SERVER OVER A REMOTE SHELL PROGRAM
RUNNING A CDFSYNC SERVER
RUNNING A CDFSYNC SERVER OVER A REMOTE SHELL PROGRAM
EXAMPLE
OPTIONS SUMMARY
OPTIONS
EXCLUDE PATTERNS
BATCH MODE
SYMBOLIC LINKS
DIAGNOSTICS
EXIT VALUES
ENVIRONMENT VARIABLES
FILES
SEE ALSO
DIAGNOSTICS
BUGS
CREDITS
THANKS
AUTHOR

NAME

cdfsync − netCDF synchronization tool based on rsync

SYNOPSIS

cdfsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST

cdfsync [OPTION]... [USER@]HOST:SRC DEST

cdfsync [OPTION]... SRC [SRC]... DEST

cdfsync [OPTION]... [USER@]HOST::SRC [DEST]

cdfsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST

cdfsync [OPTION]... cdfsync://[USER@]HOST[:PORT]/SRC [DEST]

cdfsync [OPTION]... SRC [SRC]... cdfsync://[USER@]HOST[:PORT]/DEST

DESCRIPTION

cdfsync allows users to synchronize their local netCDF files with those on a remote machine. It is based on rsync and includes the same options as that program, as well as a couple of new options for netCDF transfers. The protocol, however, differs from that of rsync and is optimized for transfer of netCDF files.

rsync is a program that behaves in much the same way that rcp does, but has many more options and uses the rsync remote-update protocol to greatly speed up file transfers when the destination file already exists.

The cdfsync remote-update protocol allows cdfsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report at http://samba.anu.edu.au/rsync .

Some of the additional features of cdfsync/rsync when compared to rcp are:

o   support for copying links, devices, owners, groups and permissions

o   exclude and exclude-from options similar to GNU tar

o   a CVS exclude mode for ignoring the same files that CVS would ignore

o   can use any transparent remote shell, including ssh or rsh

o   does not require root privileges

o   pipelining of file transfers to minimize latency costs

o   support for anonymous or authenticated cdfsync servers (ideal for mirroring)

GENERAL

There are eight different ways of using cdfsync. They are:

o   for copying local files. This is invoked when neither source nor destination path contains a : separator.

o   for copying from the local machine to a remote machine using a remote shell program as the transport (such as ssh or rsh). This is invoked when the destination path contains a single : separator.

o   for copying from a remote machine to the local machine using a remote shell program. This is invoked when the source path contains a single : separator.

o   for copying from a remote cdfsync server to the local machine. This is invoked when the source path contains a :: separator or a cdfsync:// URL.

o   for copying from the local machine to a remote cdfsync server. This is invoked when the destination path contains a :: separator or a cdfsync:// URL.

o   for copying from a remote machine using a remote shell program as the transport, using a cdfsync server on the remote machine. This is invoked when the source path contains a :: separator and the --rsh=COMMAND (aka "-e COMMAND") option is also provided.

o   for copying from the local machine to a remote machine using a remote shell program as the transport, using a cdfsync server on the remote machine. This is invoked when the destination path contains a :: separator and the --rsh=COMMAND option is also provided.

o   for listing files on a remote machine. This is done the same way as cdfsync transfers except that you leave off the local destination.

Note that in all cases (other than listing) at least one of the source and destination paths must be local.
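
The separator rules above can be sketched as a small shell function. This is an illustration only, not the argument parser cdfsync actually uses. The name classify_endpoint is hypothetical, and the sketch ignores --rsh, which turns a :: path into the server-over-remote-shell mode.

```shell
# Hypothetical helper (not part of cdfsync): report how a single SRC or DEST
# argument would be interpreted, based only on the separators described above.
classify_endpoint() {
    case "$1" in
        cdfsync://*) echo "server" ;;        # cdfsync:// URL
        *::*)        echo "server" ;;        # double colon
        *:*)         echo "remote-shell" ;;  # single colon
        *)           echo "local" ;;         # no colon at all
    esac
}

classify_endpoint "foo:src/"                    # remote-shell
classify_endpoint "somehost.mydomain.com::"     # server
classify_endpoint "cdfsync://host/module/file"  # server
classify_endpoint "/data/tmp"                   # local
```

The URL and double-colon patterns are tested first because both also contain a single colon.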

SETUP

See the file README for installation instructions.

Once installed, you can use cdfsync with any machine that you can access via a remote shell (as well as some that you can access using the cdfsync daemon-mode protocol). For remote transfers, a modern cdfsync uses ssh for its communications, but it may have been configured to use a different remote shell by default, such as rsh or remsh.

You can also specify any remote shell you like, either by using the -e command line option, or by setting the CDFSYNC_RSH environment variable.

ssh is the most common choice because it offers a high degree of security.

Note that cdfsync must be installed on both the source and destination machines.

USAGE

You use cdfsync in the same way you use rcp. You must specify a source and a destination, one of which may be remote.

Perhaps the best way to explain the syntax is with some examples:

cdfsync *.c foo:src/

This would transfer all files matching the pattern *.c from the current directory to the directory src on the machine foo. If any of the files already exist on the remote system then the cdfsync remote-update protocol is used to update the file by sending only the differences. See the tech report for details.

cdfsync -avz foo:src/bar /data/tmp

This would recursively transfer all files from the directory src/bar on the machine foo into the /data/tmp/bar directory on the local machine. The files are transferred in "archive" mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer.

cdfsync -avz foo:src/bar/ /data/tmp

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning "copy the contents of this directory" as opposed to "copy the directory by name", but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:

cdfsync -avz /src/foo /dest

cdfsync -avz /src/foo/ /dest/foo
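
The effect of the trailing slash can be modelled with a short shell function. This is a sketch of the path mapping only. The name dest_path is hypothetical, the file name a.nc is made up for the example, and nothing here invokes cdfsync.

```shell
# Hypothetical helper: given a source directory, a destination directory and
# the name of a file inside the source, print where that file would land.
dest_path() {
    src=$1 dest=${2%/} file=$3
    case "$src" in
        */) echo "$dest/$file" ;;                     # copy the contents
        *)  echo "$dest/$(basename "$src")/$file" ;;  # copy the directory by name
    esac
}

dest_path /src/foo  /dest     a.nc   # /dest/foo/a.nc
dest_path /src/foo/ /dest/foo a.nc   # /dest/foo/a.nc
dest_path src/bar/  /data/tmp a.nc   # /data/tmp/a.nc
```

The first two calls mirror the two equivalent commands above and produce the same destination path.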

You can also use cdfsync in local-only mode, where both the source and destination don’t have a ’:’ in the name. In this case it behaves like an improved copy command.

cdfsync somehost.mydomain.com::

This would list all the anonymous cdfsync modules available on the host somehost.mydomain.com. (See the following section for more details.)

CONNECTING TO A CDFSYNC SERVER

It is also possible to use cdfsync without a remote shell as the transport. In this case you will connect to a remote cdfsync server running on TCP port 874.

You may establish the connection via a web proxy by setting the environment variable CDFSYNC_PROXY to a hostname:port pair pointing to your web proxy. Note that your web proxy’s configuration must support proxy connections to port 874.

Using cdfsync in this way is the same as using it with a remote shell except that:

o   you either use a double colon :: instead of a single colon to separate the hostname from the path, or you use a cdfsync:// URL.

o   the remote server may print a message of the day when you connect.

o   if you specify no path name on the remote server then the list of accessible paths on the server will be shown.

o   if you specify no local destination then a listing of the specified files on the remote server is provided.

Some paths on the remote server may require authentication. If so then you will receive a password prompt when you connect. You can avoid the password prompt by setting the environment variable CDFSYNC_PASSWORD to the password you want to use or using the --password-file option. This may be useful when scripting cdfsync.

WARNING: On some systems environment variables are visible to all users. On those systems using --password-file is recommended.

CONNECTING TO A CDFSYNC SERVER OVER A REMOTE SHELL PROGRAM

It is sometimes useful to be able to set up file transfers using cdfsync server capabilities on the remote machine, while still using ssh or rsh for transport. This is especially useful when you want to connect to a remote machine via ssh (for encryption or to get through a firewall), but you still want to have access to the cdfsync server features (see RUNNING A CDFSYNC SERVER OVER A REMOTE SHELL PROGRAM, below).

From the user’s perspective, using cdfsync in this way is the same as using it to connect to a cdfsync server, except that you must explicitly set the remote shell program on the command line with --rsh=COMMAND. (Setting CDFSYNC_RSH in the environment will not turn on this functionality.)

In order to distinguish between the remote-shell user and the cdfsync server user, you can use ’-l user’ on your remote-shell command:

cdfsync -av --rsh="ssh -l ssh-user" cdfsync-user@host::module[/path] local-path

The "ssh-user" will be used at the ssh level; the "cdfsync-user" will be used to check against the cdfsyncd.conf on the remote host.

RUNNING A CDFSYNC SERVER

A cdfsync server is configured using a configuration file. Please see the cdfsyncd.conf(5) man page for more information. By default the configuration file is called /etc/cdfsyncd.conf, unless cdfsync is running over a remote shell program and is not running as root; in that case, the default name is cdfsyncd.conf in the current directory on the remote computer (typically $HOME).

RUNNING A CDFSYNC SERVER OVER A REMOTE SHELL PROGRAM

See the cdfsyncd.conf(5) man page for full information on the cdfsync server configuration file.

Several configuration options will not be available unless the remote user is root (e.g. chroot, setuid/setgid, etc.). There is no need to configure inetd or the services map to include the cdfsync server port if you run a cdfsync server only via a remote shell program.

To run a cdfsync server out of a single-use ssh key, see the corresponding section in the cdfsyncd.conf(5) man page.

EXAMPLE

To mirror a netCDF archive from a remote machine over ssh, compressing the data in transit and transferring only netCDF files:

cdfsync -arz --netcdf-only www.example.com:/home/data/stuff /home/data/mystuff

OPTIONS SUMMARY

Here is a short summary of the options available in cdfsync. Refer to the OPTIONS section below for the complete descriptions.

 -v, --verbose               increase verbosity
 -q, --quiet                 decrease verbosity
 -c, --checksum              always checksum
 -a, --archive               archive mode, equivalent to -rlptgoD
 -r, --recursive             recurse into directories
 -R, --relative              use relative path names
     --no-relative           turn off --relative
     --no-implied-dirs       don’t send implied dirs with -R
 -b, --backup                make backups (see --suffix & --backup-dir)
     --backup-dir            make backups into this directory
     --suffix=SUFFIX         backup suffix (default ~ w/o --backup-dir)
 -u, --update                update only (don’t overwrite newer files)
 -l, --links                 copy symlinks as symlinks
 -L, --copy-links            copy the referent of all symlinks
     --copy-unsafe-links     copy the referent of "unsafe" symlinks
     --safe-links            ignore "unsafe" symlinks
 -H, --hard-links            preserve hard links
 -p, --perms                 preserve permissions
 -o, --owner                 preserve owner (root only)
 -g, --group                 preserve group
 -D, --devices               preserve devices (root only)
 -t, --times                 preserve times
 -S, --sparse                handle sparse files efficiently
 -n, --dry-run               show what would have been transferred
 -W, --whole-file            copy whole files, no incremental checks
     --no-whole-file         turn off --whole-file
 -x, --one-file-system       don’t cross filesystem boundaries
 -B, --block-size=SIZE       checksum blocking size (default 700)
 -e, --rsh=COMMAND           specify the remote shell
     --cdfsync-path=PATH     specify path to cdfsync on the remote machine
     --existing              only update files that already exist
     --ignore-existing       ignore files that already exist on receiver
     --delete                delete files that don’t exist on sender
     --delete-excluded       also delete excluded files on receiver
     --delete-after          receiver deletes after transfer, not before
     --ignore-errors         delete even if there are I/O errors
     --max-delete=NUM        don’t delete more than NUM files
     --partial               keep partially transferred files
     --force                 force deletion of dirs even if not empty
     --numeric-ids           don’t map uid/gid values by user/group name
     --timeout=TIME          set I/O timeout in seconds
 -I, --ignore-times          turn off mod time & file size quick check
     --size-only             ignore mod time for quick check (use size)
     --modify-window=NUM     compare mod times with reduced accuracy
 -T, --temp-dir=DIR          create temporary files in directory DIR
     --compare-dest=DIR      also compare received files relative to DIR
     --link-dest=DIR         create hardlinks to DIR for unchanged files
 -P                          equivalent to --partial --progress
 -z, --compress              compress file data
 -C, --cvs-exclude           auto ignore files in the same way CVS does
     --exclude=PATTERN       exclude files matching PATTERN
     --exclude-from=FILE     exclude patterns listed in FILE
     --include=PATTERN       don’t exclude files matching PATTERN
     --include-from=FILE     don’t exclude patterns listed in FILE
     --files-from=FILE       read FILE for list of source-file names
 -0, --from0                 all file lists are delimited by nulls
     --version               print version number
     --daemon                run as a cdfsync daemon
     --no-detach             do not detach from the parent
     --address=ADDRESS       bind to the specified address
     --config=FILE           specify alternate cdfsyncd.conf file
     --port=PORT             specify alternate cdfsyncd port number
     --blocking-io           use blocking I/O for the remote shell
     --no-blocking-io        turn off --blocking-io
     --stats                 give some file transfer stats
     --progress              show progress during transfer
     --log-format=FORMAT     log file transfers using specified format
     --password-file=FILE    get password from FILE
     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second
     --write-batch=PREFIX    write batch fileset starting with PREFIX
     --read-batch=PREFIX     read batch fileset starting with PREFIX
     --netcdf-only           only transfer netcdf files
     --debug-protocol        write protocol output to tmp file for debug
     --in-place              update files in place (experimental)
 -h, --help                  show this help screen

OPTIONS

cdfsync uses the GNU long options package. Many of the command line options have two variants, one short and one long. These are shown below, separated by commas. Some options only have a long variant. The ’=’ for options that take a parameter is optional; whitespace can be used instead.

-h, --help

Print a short help page describing the options available in cdfsync.

--version

Print the cdfsync version number and exit.

-v, --verbose

This option increases the amount of information you are given during the transfer. By default, cdfsync works silently. A single -v will give you information about what files are being transferred and a brief summary at the end. Two -v flags will give you information on what files are being skipped and slightly more information at the end. More than two -v flags should only be used if you are debugging cdfsync.

-q, --quiet

This option decreases the amount of information you are given during the transfer, notably suppressing information messages from the remote server. This flag is useful when invoking cdfsync from cron.

-I, --ignore-times

Normally cdfsync will skip any files that are already the same size and have the same modification time-stamp. This option turns off this "quick check" behavior.

--size-only

Normally cdfsync will not transfer any files that are already the same size and have the same modification time-stamp. With the --size-only option, files will not be transferred if they have the same size, regardless of timestamp. This is useful when starting to use cdfsync after using another mirroring system which may not preserve timestamps exactly.
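
The interaction between the default quick check, --size-only and -I can be sketched as follows. This models only the decision described above, not the program's internals. The function name quick_check and its argument order are hypothetical.

```shell
# Hypothetical model of the "quick check". Arguments:
#   $1 = default | size-only | ignore-times
#   $2 $3 = source size (bytes) and modification time (seconds)
#   $4 $5 = destination size (bytes) and modification time (seconds)
# Prints "skip" if the file would be skipped, "update" otherwise.
quick_check() {
    mode=$1 ssize=$2 smtime=$3 dsize=$4 dmtime=$5
    if [ "$mode" = "ignore-times" ]; then
        echo "update"    # -I: quick check disabled
    elif [ "$ssize" -ne "$dsize" ]; then
        echo "update"    # sizes differ: never skipped
    elif [ "$mode" = "size-only" ] || [ "$smtime" -eq "$dmtime" ]; then
        echo "skip"      # same size, and the time matches or is ignored
    else
        echo "update"    # same size, different time
    fi
}

quick_check default      1024 1000 1024 1000   # skip
quick_check default      1024 1000 1024 1007   # update
quick_check size-only    1024 1000 1024 1007   # skip
quick_check ignore-times 1024 1000 1024 1000   # update
```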

--modify-window=NUM

When comparing two timestamps, cdfsync treats them as equal if they differ by no more than the modify-window value. This is normally zero, but you may find it useful to set it to a larger value in some situations. In particular, when transferring to Windows FAT filesystems, which cannot represent times with a 1 second resolution, --modify-window=1 is useful.
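
As a rough illustration, the comparison might look like the function below. It assumes that timestamps are whole seconds and that the window is inclusive. The name times_match is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if two timestamps, given in
# seconds, differ by no more than the window given as the third argument.
times_match() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))   # absolute value
    fi
    [ "$diff" -le "$3" ]
}

times_match 1000 1001 1 && echo "equal"       # equal
times_match 1000 1002 1 || echo "different"   # different
```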

-c, --checksum

This forces the sender to checksum all files using a 128-bit MD4 checksum before transfer. The checksum is then explicitly checked on the receiver and any files of the same name which already exist and have the same checksum and size on the receiver are not transferred. This option can be quite slow.

-a, --archive

This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything.

Note however that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.

-r, --recursive

This tells cdfsync to copy directories recursively. If you don’t specify this then cdfsync won’t copy directories at all.

-R, --relative

Use relative paths. This means that the full path names specified on the command line are sent to the server rather than just the last parts of the filenames. This is particularly useful when you want to send several different directories at the same time. For example, if you used the command

cdfsync foo/bar/foo.c remote:/tmp/

then this would create a file called foo.c in /tmp/ on the remote machine. If instead you used

cdfsync -R foo/bar/foo.c remote:/tmp/

then a file called /tmp/foo/bar/foo.c would be created on the remote machine -- the full path name is preserved.
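
The two cases can be sketched as a path mapping. The function name relative_dest is hypothetical, and dropping a leading slash from the source path under -R is an assumption of this sketch.

```shell
# Hypothetical helper: print the destination path for one source file.
#   $1 = "yes" to model -R, anything else for the default behavior
#   $2 = source path as typed on the command line
#   $3 = destination directory
relative_dest() {
    dest=${3%/}
    if [ "$1" = "yes" ]; then
        echo "$dest/${2#/}"             # keep the full path (assumed: leading / dropped)
    else
        echo "$dest/$(basename "$2")"   # keep only the last component
    fi
}

relative_dest no  foo/bar/foo.c /tmp/   # /tmp/foo.c
relative_dest yes foo/bar/foo.c /tmp/   # /tmp/foo/bar/foo.c
```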

--no-relative

Turn off the --relative option. This is only needed if you want to use --files-from without its implied --relative file processing.

--no-implied-dirs

When combined with the --relative option, the implied directories in each path are not explicitly duplicated as part of the transfer. This makes the transfer more efficient and also allows the two sides to have non-matching symlinks in the implied part of the path. For instance, if you transfer the file "/path/foo/file" with -R, the default is for cdfsync to ensure that "/path" and "/path/foo" on the destination exactly match the directories/symlinks of the source. Using the --no-implied-dirs option would omit both of these implied dirs, which means that if "/path" was a real directory on one machine and a symlink on the other machine, cdfsync would not try to change this.

-b, --backup

With this option, preexisting destination files are renamed as each file is transferred or deleted. You can control where the backup file goes and what (if any) suffix gets appended using the --backup-dir and --suffix options.

--backup-dir=DIR

In combination with the --backup option, this tells cdfsync to store all backups in the specified directory. This is very useful for incremental backups. You can additionally specify a backup suffix using the --suffix option (otherwise the files backed up in the specified directory will keep their original filenames). If DIR is a relative path, it is relative to the destination directory (which changes in a recursive transfer).

--suffix=SUFFIX

This option allows you to override the default backup suffix used with the --backup (-b) option. The default suffix is a ~ if no --backup-dir was specified, otherwise it is an empty string.
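
The way --backup, --backup-dir and --suffix combine to name a backup can be sketched like this. The function name backup_name and the literal "default" marker are hypothetical. The sketch handles a plain file name only and makes no claim about how subdirectories are laid out under the backup directory.

```shell
# Hypothetical helper: print the name a backup would be given.
#   $1 = name of the destination file being replaced
#   $2 = backup directory ("" if --backup-dir is not given)
#   $3 = suffix, or the word "default" if --suffix is not given
backup_name() {
    file=$1 dir=$2 suffix=$3
    if [ "$suffix" = "default" ]; then
        # Default suffix: "~" without --backup-dir, empty with it.
        if [ -z "$dir" ]; then suffix="~"; else suffix=""; fi
    fi
    if [ -n "$dir" ]; then
        echo "${dir%/}/$file$suffix"
    else
        echo "$file$suffix"
    fi
}

backup_name run1.nc ""      default   # run1.nc~
backup_name run1.nc backups default   # backups/run1.nc
backup_name run1.nc backups .old      # backups/run1.nc.old
```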

-u, --update

This forces cdfsync to skip any files for which the destination file already exists and has a date later than the source file.

In the current implementation, a difference of file format is always considered to be important enough for an update, no matter what date is on the objects. In other words, if the source has a directory or a symlink where the destination has a file, the transfer would occur regardless of the timestamps. This might change in the future (feel free to comment on this on the mailing list if you have an opinion).

-l, --links

When symlinks are encountered, recreate the symlink on the destination.

-L, --copy-links

When symlinks are encountered, the file that they point to (the referent) is copied, rather than the symlink.

--copy-unsafe-links

This tells cdfsync to copy the referent of symbolic links that point outside the copied tree. Absolute symlinks are also treated like ordinary files, and so are any symlinks in the source path itself when --relative is used.

--safe-links

This tells cdfsync to ignore any symbolic links which point outside the copied tree. All absolute symlinks are also ignored. Using this option in conjunction with --relative may give unexpected results.

-H, --hard-links

This tells cdfsync to recreate hard links on the remote system to be the same as the local system. Without this option hard links are treated like regular files.

Note that cdfsync can only detect hard links if both parts of the link are in the list of files being sent.

This option can be quite slow, so only use it if you need it.

-W, --whole-file

With this option the incremental cdfsync algorithm is not used and the whole file is sent as-is instead. The transfer may be faster if this option is used when the bandwidth between the source and target machines is higher than the bandwidth to disk (especially when the "disk" is actually a networked filesystem). This is the default when both the source and target are on the local machine.

--no-whole-file

Turn off --whole-file, for use when it is the default.

-p, --perms

This option causes cdfsync to set the destination permissions to be the same as the source permissions.

Without this option, each new file gets its permissions set based on the source file’s permissions and the umask at the receiving end, while all other files (including updated files) retain their existing permissions (which is the same behavior as other file-copy utilities, such as cp).
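
For a newly created file, the result without -p can be estimated with shell arithmetic. The sketch assumes the usual rule that the source mode is masked by the receiver's umask, as cp does. The exact rule cdfsync applies is not spelled out here, and the name new_file_mode is hypothetical.

```shell
# Hypothetical helper: permissions a newly created file would receive
# without -p, assuming "source mode AND NOT umask".
#   $1 = source mode in octal, $2 = receiver's umask in octal
new_file_mode() {
    # A leading 0 makes the shell read both values as octal constants.
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

new_file_mode 666 022   # 644
new_file_mode 775 027   # 750
```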

-o, --owner

This option causes cdfsync to set the owner of the destination file to be the same as the source file. On most systems, only the super-user can set file ownership. By default, the preservation is done by name, but may fall back to using the ID number in some circumstances. See the --numeric-ids option for a full discussion.

-g, --group

This option causes cdfsync to set the group of the destination file to be the same as the source file. If the receiving program is not running as the super-user, only groups that the receiver is a member of will be preserved. By default, the preservation is done by name, but may fall back to using the ID number in some circumstances. See the --numeric-ids option for a full discussion.

-D, --devices

This option causes cdfsync to transfer character and block device information to the remote system to recreate these devices. This option is only available to the super-user.

-t, --times

This tells cdfsync to transfer modification times along with the files and update them on the remote system. Note that if this option is not used, the optimization that excludes files that have not been modified cannot be effective; in other words, a missing -t or -a will cause the next transfer to behave as if it used -I, and all files will have their checksums compared and show up in log messages even if they haven’t changed.

-n, --dry-run

This tells cdfsync not to do any file transfers; instead, it just reports the actions it would have taken.

-S, --sparse

Try to handle sparse files efficiently so they take up less space on the destination.

NOTE: Don’t use this option when the destination is a Solaris "tmpfs" filesystem. It doesn’t seem to handle seeks over null regions correctly and ends up corrupting the files.

-x, --one-file-system

This tells cdfsync not to cross filesystem boundaries when recursing. This is useful for transferring the contents of only one filesystem.

--existing

This tells cdfsync not to create any new files - only update files that already exist on the destination.

--ignore-existing

This tells cdfsync not to update files that already exist on the destination.

--max-delete=NUM

This tells cdfsync not to delete more than NUM files or directories. This is useful when mirroring very large trees to prevent disasters.

--delete

This tells cdfsync to delete any files on the receiving side that aren’t on the sending side. Files that are excluded from transfer are excluded from being deleted unless you use --delete-excluded.

This option has no effect if directory recursion is not selected.

This option can be dangerous if used incorrectly! It is a very good idea to run first using the dry run option (-n) to see what files would be deleted to make sure important files aren’t listed.

If the sending side detects any I/O errors then the deletion of any files at the destination will be automatically disabled. This is to prevent temporary filesystem failures (such as NFS errors) on the sending side causing a massive deletion of files on the destination. You can override this with the --ignore-errors option.

--delete-excluded

In addition to deleting the files on the receiving side that are not on the sending side, this tells cdfsync to also delete any files on the receiving side that are excluded (see --exclude). Implies --delete.

--delete-after

By default cdfsync does file deletions on the receiving side before transferring files to try to ensure that there is sufficient space on the receiving filesystem. If you want to delete after transferring, use the --delete-after switch. Implies --delete.

--ignore-errors

Tells --delete to go ahead and delete files even when there are I/O errors.

--force

This option tells cdfsync to delete directories even if they are not empty when they are to be replaced by non-directories. This is only relevant without --delete because deletions are now done depth-first. Requires the --recursive option (which is implied by -a) to have any effect.

-B, --block-size=BLOCKSIZE

This controls the block size used in the cdfsync algorithm. See the technical report for details.

-e, --rsh=COMMAND

This option allows you to choose an alternative remote shell program to use for communication between the local and remote copies of cdfsync. Typically, cdfsync is configured to use ssh by default, but you may prefer to use rsh on a local network.

If this option is used with [user@]host::module/path, then the remote shell COMMAND will be used to run a cdfsync server on the remote host, and all data will be transmitted through that remote shell connection, rather than through a direct socket connection to a running cdfsync server on the remote host. See the section "CONNECTING TO A CDFSYNC SERVER OVER A REMOTE SHELL PROGRAM" above.

Command-line arguments are permitted in COMMAND provided that COMMAND is presented to cdfsync as a single argument. For example:

-e "ssh -p 2234"

(Note that ssh users can alternately customize site-specific connect options in their .ssh/config file.)

You can also choose the remote shell program using the CDFSYNC_RSH environment variable, which accepts the same range of values as -e.

See also the --blocking-io option which is affected by this option.

--cdfsync-path=PATH

Use this to specify the path to the copy of cdfsync on the remote machine. Useful when it’s not in your path. Note that this is the full path to the binary, not just the directory that the binary is in.

-C, --cvs-exclude

This is a useful shorthand for excluding a broad range of files that you often don’t want to transfer between systems. It uses the same algorithm that CVS uses to determine if a file should be ignored.

The exclude list is initialized to:

RCS SCCS CVS CVS.adm RCSLOG cvslog.* tags TAGS .make.state .nse_depinfo *~ #* .#* ,* _$* *$ *.old *.bak *.BAK *.orig *.rej .del-* *.a *.olb *.o *.obj *.so *.exe *.Z *.elc *.ln core .svn/

Then the files listed in $HOME/.cvsignore are added to the list, followed by any files listed in the CVSIGNORE environment variable (all cvsignore names are delimited by whitespace).

Finally, any file is ignored if it is in the same directory as a .cvsignore file and matches one of the patterns listed therein. See the cvs(1) manual for more information.

--exclude=PATTERN

This option allows you to selectively exclude certain files from the list of files to be transferred. This is most useful in combination with a recursive transfer.

You may use as many --exclude options on the command line as you like to build up the list of files to exclude.

See the EXCLUDE PATTERNS section for detailed information on this option.

--exclude-from=FILE

This option is similar to the --exclude option, but instead it adds all exclude patterns listed in the file FILE to the exclude list. Blank lines in FILE and lines starting with ’;’ or ’#’ are ignored. If FILE is - the list will be read from standard input.
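
To check which lines of an exclude file actually count as patterns, a filter along these lines can help. It is a sketch that treats only completely empty lines as blank. The name active_patterns and the sample patterns are made up for the example.

```shell
# Hypothetical helper: print the patterns that would be read from an exclude
# file, dropping empty lines and lines starting with ';' or '#'.
active_patterns() {
    grep -v -e '^$' -e '^[;#]' "$1"
}

# Build a sample exclude file and show the patterns that survive.
list=$(mktemp)
cat > "$list" <<'EOF'
# scratch output
*.tmp

; editor backups
*~
EOF
active_patterns "$list"   # prints *.tmp and *~ on separate lines
rm -f "$list"
```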

--include=PATTERN

This option tells cdfsync to not exclude the specified pattern of filenames. This is useful as it allows you to build up quite complex exclude/include rules.

See the EXCLUDE PATTERNS section for detailed information on this option.

--include-from=FILE

This specifies a list of include patterns from a file. If FILE is - the list will be read from standard input.

--files-from=FILE

Using this option allows you to specify the exact list of files to transfer (as read from the specified FILE or "-" for stdin). It also tweaks the default behavior of cdfsync to make transferring just the specified files and directories easier. For instance, the --relative option is enabled by default when this option is used (use --no-relative if you want to turn that off), all directories specified in the list are created on the destination (rather than being noisily skipped without -r), and the -a (--archive) option’s behavior does not imply -r (--recursive) -- specify it explicitly, if you want it.

The file names that are read from the FILE are all relative to the source dir -- any leading slashes are removed and no ".." references are allowed to go higher than the source dir. For example, take this command:

cdfsync -a --files-from=/tmp/foo /usr remote:/backup

If /tmp/foo contains the string "bin" (or even "/bin"), the /usr/bin directory will be created as /backup/bin on the remote host (but the contents of the /usr/bin dir would not be sent unless you specified -r or the names were explicitly listed in /tmp/foo). Also keep in mind that the effect of the (enabled by default) --relative option is to duplicate only the path info that is read from the file -- it does not force the duplication of the source-spec path (/usr in this case).

In addition, the --files-from file can be read from the remote host instead of the local host if you specify a "host:" in front of the file (the host must match one end of the transfer). As a short-cut, you can specify just a prefix of ":" to mean "use the remote end of the transfer". For example:

cdfsync -a --files-from=:/path/file-list src:/ /tmp/copy

This would copy all the files specified in the /path/file-list file that was located on the remote "src" host.

-0, --from0

This tells cdfsync that the filenames it reads from a file are terminated by a null (’\0’) character, not a NL, CR, or CR+LF. This affects --exclude-from, --include-from, and --files-from. It does not affect --cvs-exclude (since all names read from a .cvsignore file are split on whitespace).
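
A null-delimited list is typically produced with find. The helpers below are a sketch: the names nc_list0 and count0 are hypothetical, the .nc extension is assumed to identify netCDF files, and -print0 is common (GNU, BSD) but worth checking on your platform. The cdfsync invocation is shown only as a comment and is not executed.

```shell
# Hypothetical helper: list the .nc files under directory $1 as a
# null-delimited list of names relative to that directory.
nc_list0() {
    ( cd "$1" && find . -type f -name '*.nc' -print0 )
}

# Hypothetical helper: count the entries in a null-delimited list on stdin.
count0() {
    tr -cd '\0' | wc -c | tr -d ' '
}

nc_list0 . | count0   # number of .nc files under the current directory

# Intended use (assumes cdfsync is installed):
#   nc_list0 /home/data/stuff |
#       cdfsync -a --from0 --files-from=- /home/data/stuff remote:/backup
```

Because the names are null-delimited, file names containing spaces or newlines survive intact.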

-T, --temp-dir=DIR

This option instructs cdfsync to use DIR as a scratch directory when creating temporary copies of the files transferred on the receiving side. The default behavior is to create the temporary files in the receiving directory.

--compare-dest=DIR

This option instructs cdfsync to use DIR on the destination machine as an additional directory to compare destination files against when doing transfers if the files are missing in the destination directory. This is useful for doing transfers to a new destination while leaving existing files intact, and then doing a flash-cutover when all files have been successfully transferred (for example by moving directories around and removing the old directory, although this skips files that haven’t changed; see also --link-dest). This option increases the usefulness of --partial because partially transferred files will remain in the new temporary destination until they have a chance to be completed. If DIR is a relative path, it is relative to the destination directory (which changes in a recursive transfer).

--link-dest=DIR

This option behaves like --compare-dest but also will create hard links from DIR to the destination directory for unchanged files. Files with changed ownership or permissions will not be linked. Like --compare-dest, if DIR is a relative path, it is relative to the destination directory (which changes in a recursive transfer). An example:

    cdfsync -av --link-dest=$PWD/prior_dir host:src_dir/ new_dir/

-z, --compress

With this option, cdfsync compresses any data from the files that it sends to the destination machine. This option is useful on slow connections. The compression method used is the same method that gzip uses.

Note that this option typically achieves better compression ratios than can be achieved by using a compressing remote shell, or a compressing transport, as it takes advantage of the implicit information sent for matching data blocks.

--numeric-ids

With this option cdfsync will transfer numeric group and user IDs rather than using user and group names and mapping them at both ends.

By default cdfsync will use the username and groupname to determine what ownership to give files. The special uid 0 and the special group 0 are never mapped via user/group names even if the --numeric-ids option is not specified.

If a user or group has no name on the source system or it has no match on the destination system, then the numeric ID from the source system is used instead. See also the comments on the "use chroot" setting in the cdfsyncd.conf manpage for information on how the chroot setting affects cdfsync’s ability to look up the names of the users and groups and what you can do about it.

--timeout=TIMEOUT

This option allows you to set a maximum I/O timeout in seconds. If no data is transferred for the specified time then cdfsync will exit. The default is 0, which means no timeout.

--daemon

This tells cdfsync that it is to run as a daemon. The daemon may be accessed using the host::module or cdfsync://host/module/ syntax.

If standard input is a socket then cdfsync will assume that it is being run via inetd, otherwise it will detach from the current terminal and become a background daemon. The daemon will read the config file (cdfsyncd.conf) on each connect made by a client and respond to requests accordingly. See the cdfsyncd.conf(5) man page for more details.

--no-detach

When running as a daemon, this option instructs cdfsync to not detach itself and become a background process. This option is required when running as a service on Cygwin, and may also be useful when cdfsync is supervised by a program such as daemontools or AIX’s System Resource Controller. --no-detach is also recommended when cdfsync is run under a debugger. This option has no effect if cdfsync is run from inetd or sshd.

--address

By default cdfsync will bind to the wildcard address when run as a daemon with the --daemon option or when connecting to a cdfsync server. The --address option allows you to specify a specific IP address (or hostname) to bind to. This makes virtual hosting possible in conjunction with the --config option.

--config=FILE

This specifies an alternate config file to use instead of the default. This is only relevant when --daemon is specified. The default is /etc/cdfsyncd.conf unless the daemon is running over a remote shell program and the remote user is not root; in that case the default is cdfsyncd.conf in the current directory (typically $HOME).

--port=PORT

This specifies an alternate TCP port number to use rather than the default port 874.

--blocking-io

This tells cdfsync to use blocking I/O when launching a remote shell transport. If the remote shell is either rsh or remsh, cdfsync defaults to using blocking I/O, otherwise it defaults to using non-blocking I/O. (Note that ssh prefers non-blocking I/O.)

--no-blocking-io

Turn off --blocking-io, for use when it is the default.

--log-format=FORMAT

This allows you to specify exactly what the cdfsync client logs to stdout on a per-file basis. The log format is specified using the same format conventions as the log format option in cdfsyncd.conf.

--stats

This tells cdfsync to print a verbose set of statistics on the file transfer, allowing you to tell how effective the cdfsync algorithm is for your data.

--partial

By default, cdfsync will delete any partially transferred file if the transfer is interrupted. In some circumstances it is more desirable to keep partially transferred files. Using the --partial option tells cdfsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster.

--progress

This option tells cdfsync to print information showing the progress of the transfer. This gives a bored user something to watch. Implies --verbose without incrementing verbosity.

When the file is transferring, the data looks like this:

      782448  63%  110.64kB/s    0:00:04

This tells you the current file size, the percentage of the transfer that is complete, the current calculated file-completion rate (including both data over the wire and data being matched locally), and the estimated time remaining in this transfer.

After a file is complete, the data looks like this:

     1238099 100%  146.38kB/s    0:00:08  (5, 57.1% of 396)

This tells you the final file size, that it’s 100% complete, the final transfer rate for the file, the amount of elapsed time it took to transfer the file, and the addition of a total-transfer summary in parentheses. These additional numbers tell you how many files have been updated, and what percent of the total number of files has been scanned.
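If the progress output needs to be post-processed, its first four whitespace-separated fields can be labelled with awk. This is a sketch that assumes the line layout is exactly as in the two samples above; the helper name parse_progress is hypothetical.

```shell
# Hypothetical helper: label the first four fields of one --progress line
# read from standard input.
parse_progress() {
    awk '{
        printf "bytes=%s percent=%s rate=%s time=%s\n", $1, $2, $3, $4
    }'
}

echo "      782448  63%  110.64kB/s    0:00:04" | parse_progress
# prints: bytes=782448 percent=63% rate=110.64kB/s time=0:00:04
```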

-P

The -P option is equivalent to --partial --progress. I found myself typing that combination quite often so I created an option to make it easier.

--password-file

This option allows you to provide a password in a file for accessing a remote cdfsync server. Note that this option is only useful when accessing a cdfsync server using the built in transport, not when using a remote shell as the transport. The file must not be world readable. It should contain just the password as a single line.
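One way to create a suitable password file is sketched below. The helper name make_password_file and the sample password are illustrative only; the sketch simply ensures that the file holds a single line and is readable by its owner alone.

```shell
# Hypothetical helper: store PASSWORD in FILE so that only the owner
# can read it.
make_password_file() (
    file=$1
    password=$2
    umask 077                       # new files get no group/other access
    printf '%s\n' "$password" > "$file"
    chmod 600 "$file"               # also tighten a pre-existing file
)

pwfile=$(mktemp)
make_password_file "$pwfile" "example-secret"
# Possible use (the module name is made up):
#   cdfsync --password-file="$pwfile" host::module/ /tmp/copy
```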

--bwlimit=KBPS

This option allows you to specify a maximum transfer rate in kilobytes per second. This option is most effective when using cdfsync with large files (several megabytes and up). Due to the nature of cdfsync transfers, blocks of data are sent, then if cdfsync determines the transfer was too fast, it will wait before sending the next data block. The result is an average transfer rate equaling the specified limit. A value of zero specifies no limit.
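The send-then-wait behavior can be pictured with a little arithmetic. The following is a simplified model rather than cdfsync's actual logic: the helper name throttle_delay_ms is hypothetical, and it assumes that one kilobyte is 1024 bytes and that the pause is computed after each block.

```shell
# Hypothetical model: how long to pause (in milliseconds) after sending
# BYTES in ELAPSED_MS so that the average rate stays at LIMIT_KBPS.
# Assumes 1 kilobyte = 1024 bytes.
throttle_delay_ms() (
    bytes=$1; elapsed_ms=$2; limit_kbps=$3
    delay=0
    if [ "$limit_kbps" -gt 0 ]; then        # zero means no limit
        target_ms=$(( bytes * 1000 / (limit_kbps * 1024) ))
        if [ "$target_ms" -gt "$elapsed_ms" ]; then
            delay=$(( target_ms - elapsed_ms ))
        fi
    fi
    echo "$delay"
)

throttle_delay_ms 102400 500 100    # prints 500
```

In this example 100 kilobytes were sent in half a second against a limit of 100 kilobytes per second, so the model waits another half second before the next block.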

--write-batch=PREFIX

Generate a set of files that can be transferred as a batch update. Each filename in the set starts with PREFIX. See the "BATCH MODE" section for details.

--read-batch=PREFIX

Apply a previously generated change batch, using the fileset whose filenames start with PREFIX. See the "BATCH MODE" section for details.

--netcdf-only

Only update netCDF files. All other files will be ignored.

--in-place

Update files in place. If a file needs to be updated, cdfsync usually creates a temporary file, writes the old and new data to the temporary file, and then renames the temporary file to the name of the updated file. This can be quite slow for large files that only have to be updated with a few changes since the copying time will be much greater than the time required to transfer the new data.

The --in-place option causes cdfsync to make changes directly to the updated file. The user should be aware that there is a period of time during which the content of the file will be invalid; therefore, this option should only be applied to files that aren’t being used by other processes. It is also possible that an updated file will be corrupted if cdfsync is interrupted while the file is being updated in place.

EXCLUDE PATTERNS

The exclude and include patterns specified to cdfsync allow for flexible selection of which files to transfer and which files to skip.

Cdfsync builds an ordered list of include/exclude options as specified on the command line. Cdfsync checks each file and directory name against each exclude/include pattern in turn. The first matching pattern is acted on. If it is an exclude pattern, then that file is skipped. If it is an include pattern then that filename is not skipped. If no matching include/exclude pattern is found then the filename is not skipped.

The filenames matched against the exclude/include patterns are relative to the "root of the transfer". If you think of the transfer as a subtree of names that are being sent from sender to receiver, the root is where the tree starts to be duplicated in the destination directory. This root governs where patterns that start with a / match (see below).

Because the matching is relative to the transfer-root, changing the trailing slash on a source path or changing your use of the --relative option affects the path you need to use in your matching (in addition to changing how much of the file tree is duplicated on the destination system). The following examples demonstrate this.

Let’s say that we want to match two source files, one with an absolute path of "/home/me/foo/bar", and one with a path of "/home/you/bar/baz". Here is how the various command choices differ for a 2-source transfer:

   Example cmd: cdfsync -a /home/me /home/you /dest
   +/- pattern: /me/foo/bar
   +/- pattern: /you/bar/baz
   Target file: /dest/me/foo/bar
   Target file: /dest/you/bar/baz

   Example cmd: cdfsync -a /home/me/ /home/you/ /dest
   +/- pattern: /foo/bar               (note missing "me")
   +/- pattern: /bar/baz               (note missing "you")
   Target file: /dest/foo/bar
   Target file: /dest/bar/baz

   Example cmd: cdfsync -a --relative /home/me/ /home/you /dest
   +/- pattern: /home/me/foo/bar       (note full path)
   +/- pattern: /home/you/bar/baz      (ditto)
   Target file: /dest/home/me/foo/bar
   Target file: /dest/home/you/bar/baz

   Example cmd: cd /home; cdfsync -a --relative me/foo you/ /dest
   +/- pattern: /me/foo/bar            (starts at specified path)
   +/- pattern: /you/bar/baz           (ditto)
   Target file: /dest/me/foo/bar
   Target file: /dest/you/bar/baz

The easiest way to see what name you should include/exclude is to just look at the output when using --verbose and put a / in front of the name (use the --dry-run option if you’re not yet ready to copy any files).

Note that, when using the --recursive (-r) option (which is implied by -a), every subcomponent of every path is visited from the top down, so include/exclude patterns get applied recursively to each subcomponent. The exclude patterns actually short-circuit the directory traversal stage when cdfsync finds the files to send. If a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because cdfsync did not descend through that excluded section of the hierarchy.

Note also that the --include and --exclude options take one pattern each. To add multiple patterns use the --include-from and --exclude-from options or multiple --include and --exclude options.

The patterns can take several forms. The rules are:

o

if the pattern starts with a / then it is matched against the start of the filename, otherwise it is matched against the end of the filename. A leading / is the equivalent of a leading ^ in regular expressions. Thus "/foo" would match a file called "foo" at the transfer-root (see above for how this is different from the filesystem-root). On the other hand, "foo" would match any file called "foo" anywhere in the tree because the algorithm is applied recursively from top down; it behaves as if each path component gets a turn at being the end of the file name.

o

if the pattern ends with a / then it will only match a directory, not a file, link, or device.

o

if the pattern contains a wildcard character from the set *?[ then expression matching is applied using the shell filename matching rules. Otherwise a simple string match is used.

o

the double asterisk pattern "**" will match slashes while a single asterisk pattern "*" will stop at slashes.

o

if the pattern contains a / (not counting a trailing /) or a "**" then it is matched against the full filename, including any leading directory. If the pattern doesn’t contain a / or a "**", then it is matched only against the final component of the filename. Again, remember that the algorithm is applied recursively so "full filename" can actually be any portion of a path below the starting directory.

o

if the pattern starts with "+ " (a plus followed by a space) then it is always considered an include pattern, even if specified as part of an exclude option. The prefix is discarded before matching.

o

if the pattern starts with "- " (a minus followed by a space) then it is always considered an exclude pattern, even if specified as part of an include option. The prefix is discarded before matching.

o

if the pattern is a single exclamation mark ! then the current include/exclude list is reset, removing all previously defined patterns.

The +/- rules are most useful in a list that was read from a file, allowing you to have a single exclude list that contains both include and exclude options in the proper order.
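The "first matching pattern wins" evaluation of such a mixed list can be sketched in shell. This is a deliberately simplified model: the helper name decide is hypothetical, it compares a single file name (no slashes) using the shell's own pattern matching, and it does not model a leading /, a trailing /, "**" or the "!" reset.

```shell
# Hypothetical helper: report whether NAME is included or excluded by an
# ordered list of "+ PATTERN" / "- PATTERN" rules.
decide() (
    name=$1; shift
    for rule in "$@"; do
        action=${rule%% *}          # "+" or "-"
        pattern=${rule#??}          # drop the two-character prefix
        case $name in
            $pattern)
                if [ "$action" = "+" ]; then
                    echo include
                else
                    echo exclude
                fi
                exit 0              # the first matching rule decides
                ;;
        esac
    done
    echo include                    # no rule matched: not skipped
)

decide main.c '+ *.c' '- *'         # prints "include"
decide main.o '+ *.c' '- *'         # prints "exclude"
```

Swapping the two rules would exclude everything, because the "*" rule would then be reached first.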

Remember that the matching occurs at every step in the traversal of the directory hierarchy, so you must be sure that all the parent directories of the files you want to include are not excluded. This is particularly important when using a trailing ’*’ rule. For instance, this won’t work:

    + /some/path/this-file-will-not-be-found
    + /file-is-included
    - *

This fails because the parent directory "some" is excluded by the ’*’ rule, so cdfsync never visits any of the files in the "some" or "some/path" directories. One solution is to ask for all directories in the hierarchy to be included by using a single rule: --include=’*/’ (put it somewhere before the --exclude=’*’ rule). Another solution is to add specific include rules for all the parent dirs that need to be visited. For instance, this set of rules works fine:

    + /some/
    + /some/path/
    + /some/path/this-file-is-found
    + /file-also-included
    - *
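Writing out the parent-directory rules by hand is error-prone for deep paths, so they could be generated. The sketch below uses a hypothetical helper, include_with_parents, and assumes a path that is anchored at the transfer-root and contains no pattern characters.

```shell
# Hypothetical helper: print "+ " rules for TARGET and for each of its
# parent directories, in top-down order.
include_with_parents() (
    rel=${1#/}
    IFS=/
    set -f
    set -- $rel                     # split the path into its components
    count=$#
    index=0
    prefix=
    for part in "$@"; do
        index=$((index + 1))
        prefix=$prefix/$part
        if [ "$index" -lt "$count" ]; then
            printf '+ %s/\n' "$prefix"      # a parent directory
        else
            printf '+ %s\n' "$prefix"       # the file itself
        fi
    done
)

include_with_parents /some/path/this-file-is-found
# prints:
#   + /some/
#   + /some/path/
#   + /some/path/this-file-is-found
```

Appending a final "- *" line to this output gives a rule list equivalent to the working example above, minus the /file-also-included entry.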

Here are some examples of exclude/include matching:

o

--exclude "*.o" would exclude all filenames matching *.o

o

--exclude "/foo" would exclude a file called foo in the transfer-root directory

o

--exclude "foo/" would exclude any directory called foo

o

--exclude "/foo/*/bar" would exclude any file called bar two levels below a directory called foo in the transfer-root directory

o

--exclude "/foo/**/bar" would exclude any file called bar two or more levels below a directory called foo in the transfer-root directory

o

--include "*/" --include "*.c" --exclude "*" would include all directories and C source files

o

--include "foo/" --include "foo/bar.c" --exclude "*" would include only foo/bar.c (the foo/ directory must be explicitly included or it would be excluded by the "*")

BATCH MODE

Note: Batch mode should be considered experimental in this version of cdfsync. The interface or behavior may change before it stabilizes.

Batch mode can be used to apply the same set of updates to many identical systems. Suppose one has a tree which is replicated on a number of hosts. Now suppose some changes have been made to this source tree and those changes need to be propagated to the other hosts. In order to do this using batch mode, cdfsync is run with the write-batch option to apply the changes made to the source tree to one of the destination trees. The write-batch option causes the cdfsync client to store the information needed to repeat this operation against other destination trees in a batch update fileset (see below). The filename of each file in the fileset starts with a prefix specified by the user as an argument to the write-batch option. This fileset is then copied to each remote host, where cdfsync is run with the read-batch option, again specifying the same prefix, and the destination tree. Cdfsync updates the destination tree using the information stored in the batch update fileset.

The fileset consists of 4 files:

o

<prefix>.cdfsync_argvs command-line arguments

o

<prefix>.cdfsync_flist cdfsync internal file metadata

o

<prefix>.cdfsync_csums cdfsync checksums

o

<prefix>.cdfsync_delta data blocks for file update & change

The .cdfsync_argvs file contains a command-line suitable for updating a destination tree using that batch update fileset. It can be executed using a Bourne(-like) shell, optionally passing in an alternate destination tree pathname which is then used instead of the original path. This is useful when the destination tree path differs from the original destination tree path.

Generating the batch update fileset once saves having to perform the file status, checksum and data block generation more than once when updating multiple destination trees. Multicast transport protocols can be used to transfer the batch update files in parallel to many hosts at once, instead of sending the same data to every host individually.

Example:

   $ cdfsync --write-batch=pfx -a /source/dir/ /adest/dir/
   $ rcp pfx.cdfsync_* remote:
   $ ssh remote cdfsync --read-batch=pfx -a /bdest/dir/
   # or alternatively
   $ ssh remote ./pfx.cdfsync_argvs /bdest/dir/

In this example, cdfsync is used to update /adest/dir/ with /source/dir/ and the information to repeat this operation is stored in the files pfx.cdfsync_*. These files are then copied to the machine named "remote". Cdfsync is then invoked on "remote" to update /bdest/dir/ the same way as /adest/dir/. The last line shows the cdfsync_argvs file being used to invoke cdfsync.
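When several hosts have to be updated, the copy and read-batch steps repeat once per host. The following sketch only prints the commands it would run, so that they can be reviewed first; the helper name batch_plan and the host names are hypothetical.

```shell
# Hypothetical helper: print (not run) the commands that would replay the
# batch fileset PREFIX on each HOST, updating DEST there.
batch_plan() (
    prefix=$1; dest=$2; shift 2
    for host in "$@"; do
        printf 'rcp %s.cdfsync_* %s:\n' "$prefix" "$host"
        printf 'ssh %s cdfsync --read-batch=%s -a %s\n' "$host" "$prefix" "$dest"
    done
)

batch_plan pfx /bdest/dir/ hostb hostc
```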

Caveats:

The read-batch option expects the destination tree it is meant to update to be identical to the destination tree that was used to create the batch update fileset. When a difference between the destination trees is encountered the update will fail at that point, leaving the destination tree in a partially updated state. In that case, cdfsync can be used in its regular (non-batch) mode of operation to fix up the destination tree.

The cdfsync version used on all destinations should be identical to the one used on the original destination.

The -z/--compress option does not work in batch mode and yields a usage error. A separate compression tool can be used instead to reduce the size of the batch update files for transport to the destination.

The -n/--dry-run option does not work in batch mode and yields a runtime error.

See http://www.ils.unc.edu/i2dsi/unc_cdfsync+.html for papers and technical reports.

SYMBOLIC LINKS

Three basic behaviors are possible when cdfsync encounters a symbolic link in the source directory.

By default, symbolic links are not transferred at all. A message "skipping non-regular file" is emitted for any symlinks that exist.

If --links is specified, then symlinks are recreated with the same target on the destination. Note that --archive implies --links.

If --copy-links is specified, then symlinks are "collapsed" by copying their referent, rather than the symlink.

cdfsync also distinguishes "safe" and "unsafe" symbolic links. An example where this might be used is a web site mirror that wishes to ensure the cdfsync module they copy does not include symbolic links to /etc/passwd in the public section of the site. Using --copy-unsafe-links will cause any unsafe links to be copied as the file they point to on the destination. Using --safe-links will cause unsafe links to be omitted altogether.

Symbolic links are considered unsafe if they are absolute symlinks (start with /), empty, or if they contain enough ".." components to ascend from the directory being copied.
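The three conditions above can be expressed as a small check. This is a simplified model of the rule as stated here, not cdfsync's own test: the helper name link_is_unsafe is hypothetical, and the DEPTH argument (how many directories below the top of the copied tree the link lives) is an assumption made for the sketch.

```shell
# Hypothetical helper: succeed (exit 0) if a symlink TARGET would count as
# unsafe. DEPTH is how many directories below the top of the copied tree
# the link itself lives (0 = directly in the top directory).
link_is_unsafe() (
    depth=$1
    target=$2
    case $target in
        '')  exit 0 ;;              # empty target
        /*)  exit 0 ;;              # absolute target
    esac
    IFS=/
    set -f
    for part in $target; do
        case $part in
            ''|.) ;;
            ..)   depth=$((depth - 1)) ;;
            *)    depth=$((depth + 1)) ;;
        esac
        if [ "$depth" -lt 0 ]; then
            exit 0                  # climbed above the copied directory
        fi
    done
    exit 1                          # none of the conditions applied
)

link_is_unsafe 0 ../outside && echo "unsafe"    # prints "unsafe"
link_is_unsafe 1 ../sibling || echo "safe"      # prints "safe"
```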

DIAGNOSTICS

cdfsync occasionally produces error messages that may seem a little cryptic. The one that seems to cause the most confusion is "protocol version mismatch - is your shell clean?".

This message is usually caused by your startup scripts or remote shell facility producing unwanted garbage on the stream that cdfsync is using for its transport. The way to diagnose this problem is to run your remote shell like this:

   ssh remotehost /bin/true > out.dat

then look at out.dat. If everything is working correctly then out.dat should be a zero length file. If you are getting the above error from cdfsync then you will probably find that out.dat contains some text or data. Look at the contents and try to work out what is producing it. The most common cause is incorrectly configured shell startup scripts (such as .cshrc or .profile) that contain output statements for non-interactive logins.
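The same check can be wrapped in a small function so that it is easy to repeat after each change to the startup scripts. The helper name shell_is_clean is hypothetical; the sketch runs whatever command it is given and succeeds only if that command wrote nothing to standard output.

```shell
# Hypothetical helper: run a command and succeed only if it wrote nothing
# to standard output.
shell_is_clean() (
    out=$(mktemp) || exit 2
    "$@" > "$out"
    if [ -s "$out" ]; then
        status=1                    # some output was captured
    else
        status=0                    # zero-length output
    fi
    rm -f "$out"
    exit "$status"
)

# Intended use: shell_is_clean ssh remotehost /bin/true
shell_is_clean true && echo "clean"     # prints "clean"
```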

If you are having trouble debugging include and exclude patterns, then try specifying the -vv option. At this level of verbosity cdfsync will show why each individual file is included or excluded.

EXIT VALUES

0

Success

1

Syntax or usage error

2

Protocol incompatibility

3

Errors selecting input/output files, dirs

4

Requested action not supported: an attempt was made to manipulate 64-bit files on a platform that cannot support them; or an option was specified that is supported by the client and not by the server.

5

Error starting client-server protocol

10

Error in socket I/O

11

Error in file I/O

12

Error in cdfsync protocol data stream

13

Errors with program diagnostics

14

Error in IPC code

20

Received SIGUSR1 or SIGINT

21

Some error returned by waitpid()

22

Error allocating core memory buffers

23

Partial transfer due to error

24

Partial transfer due to vanished source files

30

Timeout in data send/receive
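In scripts it can be convenient to turn these numbers back into text. The following sketch copies the descriptions from the table above; the function name describe_exit is hypothetical.

```shell
# Hypothetical helper: translate a cdfsync exit value into the short
# description listed above.
describe_exit() {
    case $1 in
        0)  echo "Success" ;;
        1)  echo "Syntax or usage error" ;;
        2)  echo "Protocol incompatibility" ;;
        3)  echo "Errors selecting input/output files, dirs" ;;
        4)  echo "Requested action not supported" ;;
        5)  echo "Error starting client-server protocol" ;;
        10) echo "Error in socket I/O" ;;
        11) echo "Error in file I/O" ;;
        12) echo "Error in cdfsync protocol data stream" ;;
        13) echo "Errors with program diagnostics" ;;
        14) echo "Error in IPC code" ;;
        20) echo "Received SIGUSR1 or SIGINT" ;;
        21) echo "Some error returned by waitpid()" ;;
        22) echo "Error allocating core memory buffers" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        30) echo "Timeout in data send/receive" ;;
        *)  echo "Unknown exit value $1" ;;
    esac
}

# Possible use after a transfer:
#   cdfsync -a src/ dest/; describe_exit $?
describe_exit 23    # prints "Partial transfer due to error"
```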

ENVIRONMENT VARIABLES

CVSIGNORE

The CVSIGNORE environment variable supplements any ignore patterns in .cvsignore files. See the --cvs-exclude option for more details.

CDFSYNC_RSH

The CDFSYNC_RSH environment variable allows you to override the default shell used as the transport for cdfsync. Command line options are permitted after the command name, just as in the -e option.

CDFSYNC_PROXY

The CDFSYNC_PROXY environment variable allows you to redirect your cdfsync client to use a web proxy when connecting to a cdfsync daemon. You should set CDFSYNC_PROXY to a hostname:port pair.

CDFSYNC_PASSWORD

Setting CDFSYNC_PASSWORD to the required password allows you to run authenticated cdfsync connections to a cdfsync daemon without user intervention. Note that this does not supply a password to a shell transport such as ssh.

USER or LOGNAME

The USER or LOGNAME environment variables are used to determine the default username sent to a cdfsync server. If neither is set, the username defaults to "nobody".

HOME

The HOME environment variable is used to find the user’s default .cvsignore file.
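The USER/LOGNAME fallback described above can be written as a single shell expansion. This sketch assumes that USER is consulted before LOGNAME and that an empty value is treated like an unset one; neither detail is stated here, and the function name default_user is hypothetical.

```shell
# Hypothetical helper: pick the default username the way the USER or
# LOGNAME entry describes. Assumes USER takes precedence over LOGNAME.
default_user() {
    printf '%s\n' "${USER:-${LOGNAME:-nobody}}"
}

default_user
```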

FILES

/etc/cdfsyncd.conf or cdfsyncd.conf

SEE ALSO

cdfsyncd.conf(5)

BUGS

Times are transferred as unix time_t values.

When transferring to FAT filesystems cdfsync may re-sync unmodified files. See the comments on the --modify-window option.

File permissions, devices, etc. are transferred as native numerical values.

See also the comments on the --delete option.

CREDITS

cdfsync is distributed under the GNU General Public License. See the file COPYING for details.

The Web site for cdfsync is at http://www.epic.noaa.gov/epic/software/cdfsync/

Downloads of cdfsync are available at ftp://ftp.epic.noaa.gov/pub/cdfsync

A web site for rsync is available at http://rsync.samba.org/. The site includes an FAQ-O-Matic which may cover questions unanswered by this manual page.

The primary ftp site for rsync is ftp://rsync.samba.org/pub/rsync.

We would be delighted to hear from you if you like this program.

This program uses the excellent zlib compression library written by Jean-loup Gailly and Mark Adler.

THANKS

Thanks to Charles Sun at NODC for help on the cdfsync project.

Thanks to Richard Brent, Brendan Mackay, Bill Waite, Stephen Rothwell and David Bell for helpful suggestions, patches and testing of rsync. I’ve probably missed some people; my apologies if I have.

Especial thanks also to: David Dykstra, Jos Backus, Sebastian Krahmer, Martin Pool, Wayne Davison.

AUTHOR

cdfsync was written by Joe Sirott.

rsync was originally written by Andrew Tridgell and Paul Mackerras. Many people have later contributed to it.

Mailing lists for support and development are available at http://lists.samba.org