NERSC: Powering Scientific Discovery Since 1974

Grid Data Transfer


This section describes the tools and services available for moving your files across the grid, focusing on using GridFTP at the command line. You may find it easier to use Globus Online, which uses the same underlying GridFTP protocol but adds reliability, performance, and ease of use.

How to transfer data to and from NERSC using grid client tools

GridFTP provides a convenient, high-performance mechanism for moving data in and out of NERSC. GridFTP is available on the following systems:

System        GridFTP hosts                                Notes
PDSF          pdsfgrid.nersc.gov (or pdsfgrid4.nersc.gov)
              pdsfgrid1.nersc.gov
              pdsfgrid3.nersc.gov
              pdsfgrid5.nersc.gov
Datatran      dtn01.nersc.gov                              Recommended host for NGF access
Carver        carvergrid.nersc.gov
Franklin      franklingrid.nersc.gov                       For access to Franklin /scratch
Hopper        hoppergrid.nersc.gov                         For access to Hopper /scratch
Euclid        euclid.nersc.gov
Archive HPSS  garchive.nersc.gov

We suggest using one of the following clients to move your data:

1. globus-url-copy

Syntax: globus-url-copy [-help | -usage] [-version[s]] [-vb] [-dbg] [-b | -a]
[-q] [-r] [-rst] [-f <filename>]
[-s <subject>] [-ds <subject>] [-ss <subject>]
[-tcp-bs <size>] [-bs <size>] [-p <parallelism>]
[-notpt] [-nodcau] [-dcsafe | -dcpriv]
<sourceURL> <destURL>

In the examples below, we assume that you have installed the Globus client package on your workstation. All commands are run from the client machine, i.e. your workstation.

Initialize your proxy cert:

% grid-proxy-init
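
If you script your transfers, it can help to check for a valid proxy before each run instead of always re-initializing it. The following is a minimal sketch, not a NERSC-provided tool: the function name ensure_proxy and the one-hour threshold are illustrative assumptions, and it relies on the grid-proxy-info command that ships with the Globus client tools.

```shell
# Sketch: make sure a usable proxy exists before starting a transfer.
# ensure_proxy is a hypothetical helper name; the 1:00 (one hour) threshold
# is an arbitrary choice for illustration.
ensure_proxy() {
    # Bail out early if the Globus client tools are not installed.
    if ! command -v grid-proxy-info >/dev/null 2>&1; then
        echo "grid-proxy-info not found; install the Globus client package" >&2
        return 127
    fi
    # -exists -valid H:M succeeds only if a proxy with at least that
    # much lifetime remaining is present.
    if grid-proxy-info -exists -valid 1:00 >/dev/null 2>&1; then
        echo "existing proxy is valid for at least one more hour"
    else
        # No usable proxy: create a new one (prompts for your passphrase).
        grid-proxy-init
    fi
}
```

A script could then call ensure_proxy once before issuing any of the copy commands shown in this section.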

Copy a file from your workstation to datatran (dtn01):

% globus-url-copy file:///path/to/file \
gsiftp://dtn01.nersc.gov//path/file

Copy a file from HPSS archive to your workstation:

% globus-url-copy \
gsiftp://garchive.nersc.gov/path/file file:///path/to/file

Copy a file from PDSF to dtn01 (a "third party copy", without directly logging in to either system):

% globus-url-copy gsiftp://pdsfgrid.nersc.gov/path/to/file \
gsiftp://dtn01.nersc.gov/path/to/file
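
To move several files in one invocation, the -f <filename> option in the syntax summary above reads the transfers from a file. The sketch below generates such a list; it assumes the list holds one "sourceURL destURL" pair per line, and the helper name make_url_pairs and all paths are placeholders rather than anything defined by NERSC or Globus.

```shell
# Hypothetical helper: print one "sourceURL destURL" line per local file,
# in the pair-list layout that globus-url-copy -f is assumed to read.
make_url_pairs() {
    dest_base=$1   # e.g. gsiftp://dtn01.nersc.gov/path/to/dir
    shift
    for f in "$@"; do
        case $f in
            # Local paths must be absolute so that file://<path> is a valid URL.
            /*) printf 'file://%s %s/%s\n' "$f" "$dest_base" "${f##*/}" ;;
            *)  echo "skipping non-absolute path: $f" >&2 ;;
        esac
    done
}

# Preview the list for two placeholder files.
make_url_pairs gsiftp://dtn01.nersc.gov/path/to/dir /path/to/a.dat /path/to/b.dat
```

Redirecting the output to a file (for example pairs.txt) and running globus-url-copy -f pairs.txt would then start the transfers.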

For more information on globus-url-copy, refer to the Globus GridFTP documentation.

2. uberftp

UberFTP provides a rich interactive client for GridFTP. It mimics standard ftp clients in behavior, along with providing some additional features.

To initialize your proxy and connect to dtn01:

% grid-proxy-init
% uberftp dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp>

To list files in a directory:

uberftp> ls
drwxr-xr-x 2 shreyas shreyas 27 Apr 26 12:28 .
drwxr-xr-x 19 shreyas shreyas 4096 Jun 20 15:57 ..
-rw-r--r-- 1 shreyas shreyas 692224 Apr 26 12:28 zebu
-rw-r--r-- 1 shreyas shreyas 2097153 Apr 26 12:28 gnu

To get a file:

uberftp> get dtn01
dtn01: 107 bytes in 0.05 seconds. 2.30 KB/sec

To put a file:

uberftp> put localfile
localfile: 107 bytes in 0.05 seconds. 2.30 KB/sec

To do a third party copy between PDSF and dtn01, issue an lopen, which causes uberftp to treat the "lopen"ed host as the local filesystem:

% grid-proxy-init
% uberftp
uberftp> lopen pdsfgrid.nersc.gov
220 pdsfgrid4.nersc.gov GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> open dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> put pdsffile dtn01
pdsffile: 107 bytes in 0.05 seconds. 2.17 KB/sec
uberftp> get dtn01 pdsffile
dtn01: 107 bytes in 0.05 seconds. 2.30 KB/sec

For more details on how to use uberftp, refer to the UberFTP user documentation.

GridFTP Performance Optimization and Firewall Considerations

For optimal data transfer performance, you may need to tune certain parameters for your network. We have found that using 4 parallel streams with a TCP buffer size of 4MB works well for moving medium and large files across the WAN. However, the best settings for any given network may require further tuning of these parameters.

Here is an example that uses these parameters for globus-url-copy:

% globus-url-copy -p 4 -tcp-bs 4MB file:///path/to/file \
gsiftp://dtn01.nersc.gov//path/file
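
When tuning beyond these defaults, a common rule of thumb (not a NERSC recommendation) is to start from the bandwidth-delay product of the path: link bandwidth multiplied by round-trip time. The sketch below does that arithmetic; the function name bdp_bytes and the sample figures are assumptions for illustration, and measured throughput should always decide the final values.

```shell
# Sketch: estimate a starting TCP buffer size from the bandwidth-delay product.
#   $1 = bandwidth in Mbit/s, $2 = round-trip time in ms,
#   $3 = number of parallel streams to divide the total across (default 1).
# 1 Mbit/s for 1 ms is 125 bytes, hence the factor of 125.
bdp_bytes() {
    mbps=$1
    rtt_ms=$2
    streams=${3:-1}
    echo $(( mbps * 125 * rtt_ms / streams ))
}

bdp_bytes 1000 32      # 1 Gbit/s path, 32 ms RTT: prints 4000000
bdp_bytes 1000 32 4    # same path split across 4 streams: prints 1000000
```

The result is only a starting point; try a few values around it and keep whichever gives the best transfer rate.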

UberFTP supports similar options in the form of the tcpbuf and parallel commands:

% uberftp
uberftp> open dtn01
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> parallel 4
uberftp> tcpbuf 1048576
TCP buffer set to 1048576 bytes
uberftp> put file

Parameter                   globus-url-copy flag                       UberFTP command
TCP buffer size             -tcp-bs SIZE                               tcpbuf SIZE
                            where SIZE includes a value and a unit     where SIZE is the number of bytes
                            e.g. -tcp-bs 256KB                         e.g. tcpbuf 262144
Number of parallel streams  -p N                                       parallel N
                            where N is the number of parallel streams  where N is the number of parallel streams
                            e.g. -p 4                                  e.g. parallel 4
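
Because tcpbuf takes a plain byte count while -tcp-bs takes a value with a unit, a small conversion helper can keep the two in sync. This is a sketch under the assumption that KB and MB mean 1024 and 1024*1024 bytes (consistent with 256KB and 262144 above); the name to_bytes is hypothetical.

```shell
# Sketch: convert a size such as 256KB or 4MB into a byte count for tcpbuf.
# Assumes binary units: 1KB = 1024 bytes, 1MB = 1024*1024 bytes.
to_bytes() {
    size=$1
    case $size in
        *KB) num=${size%KB}; mult=1024 ;;
        *MB) num=${size%MB}; mult=$((1024 * 1024)) ;;
        *)   num=$size;      mult=1 ;;
    esac
    # Reject anything whose numeric part is empty or not all digits.
    case $num in
        ''|*[!0-9]*) echo "unrecognized size: $size" >&2; return 1 ;;
    esac
    echo $(( num * mult ))
}

to_bytes 256KB   # prints 262144
to_bytes 4MB     # prints 4194304
```

For example, the 4MB buffer suggested for globus-url-copy corresponds to tcpbuf 4194304 in UberFTP.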

Firewall Considerations

If you have problems using GridFTP across a firewall (e.g. your transfer hangs without moving any data), you may need to ask your network administrator to open a range of ports in your firewall. Once this is done, set this range in your environment so that GridFTP clients know which ports to use.

For example, to use the port range 60000 to 60064, set the following environment variable before starting your client:

% export GLOBUS_TCP_PORT_RANGE=60000,60064
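
If the range is set from a login script, a quick sanity check can catch typos before they show up as hanging transfers. The following is a sketch only: set_port_range is a hypothetical helper, and the checks reflect general TCP port limits rather than any NERSC policy.

```shell
# Sketch: validate a port range, then export it for GridFTP clients.
set_port_range() {
    min=$1
    max=$2
    # Both bounds must be non-empty and consist only of digits.
    for p in "$min" "$max"; do
        case $p in
            ''|*[!0-9]*) echo "port must be a number: '$p'" >&2; return 1 ;;
        esac
    done
    # The lower bound must not exceed the upper bound, and TCP ports end at 65535.
    if [ "$min" -gt "$max" ] || [ "$max" -gt 65535 ]; then
        echo "invalid port range: $min,$max" >&2
        return 1
    fi
    export GLOBUS_TCP_PORT_RANGE="$min,$max"
}

set_port_range 60000 60064
echo "$GLOBUS_TCP_PORT_RANGE"   # prints 60000,60064
```

The range you pass must match the ports your network administrator actually opened.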