Grid Data Transfer
This section describes the tools and services available to move your files across the grid. Specifically, it covers using GridFTP at the command line to move data. You may find it easier to use Globus Online, which uses the same underlying GridFTP service but adds reliability, performance, and ease of use.
How to transfer data to and from NERSC using grid client tools
GridFTP provides a convenient, high performance transfer mechanism to move data in and out of NERSC. GridFTP is available on the following systems:
System | GridFTP hosts | Notes |
---|---|---|
PDSF | pdsfgrid.nersc.gov (or pdsfgrid4.nersc.gov), pdsfgrid1.nersc.gov, pdsfgrid3.nersc.gov, pdsfgrid5.nersc.gov | |
Datatran | dtn01.nersc.gov | Recommended host for NGF access |
Carver | carvergrid.nersc.gov | |
Franklin | franklingrid.nersc.gov | For access to Franklin /scratch |
Hopper | hoppergrid.nersc.gov | For access to Hopper /scratch |
Euclid | euclid.nersc.gov | |
Archive HPSS | garchive.nersc.gov | |
We suggest using one of the following clients to move your data:
1. globus-url-copy
Syntax: globus-url-copy [-help | -usage] [-version[s]] [-vb] [-dbg] [-b | -a]
[-q] [-r] [-rst] [-f <filename>]
[-s <subject>] [-ds <subject>] [-ss <subject>]
[-tcp-bs <size>] [-bs <size>] [-p <parallelism>]
[-notpt] [-nodcau] [-dcsafe | -dcpriv]
<sourceURL> <destURL>
In the examples below, we assume that you have installed the Globus client package on your workstation. All commands are run from the client machine, i.e. your workstation.
Initialize your proxy cert:
% grid-proxy-init
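Before starting a long transfer it can help to check how much lifetime the proxy has left. The following is a minimal sketch, not an official NERSC tool: the helper name `proxy_has_time` and the one-hour default are hypothetical, and the usage line assumes that `grid-proxy-info -timeleft` from the Globus client package prints the remaining lifetime in seconds.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the seconds-left value in $1 is at
# least the required number of seconds in $2 (default: 3600).
proxy_has_time() {
    left=$1
    need=${2:-3600}
    case $left in
        # Empty, negative, or non-numeric input is treated as "no usable proxy".
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$left" -ge "$need" ]
}

# Usage (requires the Globus client tools; assumed behaviour of -timeleft):
#   proxy_has_time "$(grid-proxy-info -timeleft 2>/dev/null)" || grid-proxy-init
```

The check is kept separate from the call to `grid-proxy-info` so that the comparison logic can be tried out without a grid installation.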
Copy a file from your workstation to datatran (dtn01):
% globus-url-copy file:///path/to/file \
gsiftp://dtn01.nersc.gov//path/file
Copy a file from HPSS archive to your workstation:
% globus-url-copy \
gsiftp://garchive.nersc.gov/path/file file:///path/to/file
Copy a file from PDSF to dtn01 (a "third party copy", without directly logging in to either system):
% globus-url-copy gsiftp://pdsfgrid.nersc.gov/path/to/file \
gsiftp://dtn01.nersc.gov/path/to/file
For more information on globus-url-copy refer to the Globus GridFTP documentation.
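When several files need to go to the same remote directory, the commands above can be generated in a loop. This is a minimal sketch under stated assumptions: the function name `print_uploads` and the `/data/...` file names are hypothetical, local paths are assumed to be absolute and free of spaces or URL-special characters, and the script only prints the commands so they can be reviewed before anything is transferred.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Globus client package):
# print one globus-url-copy command per local file, without running it.
# Arguments: GridFTP host, remote directory, then one or more local files.
print_uploads() {
    host=$1
    remote_dir=$2
    shift 2
    for f in "$@"; do
        printf 'globus-url-copy file://%s gsiftp://%s%s/%s\n' \
            "$f" "$host" "$remote_dir" "$(basename "$f")"
    done
}

# Dry run: print the commands for two example files.
print_uploads dtn01.nersc.gov /path/to /data/run1.dat /data/run2.dat
```

Once the printed commands look right, they can be piped to `sh` to run them one after another.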
2. uberftp
UberFTP provides a rich interactive client for GridFTP. It mimics standard ftp clients in behavior, along with providing some additional features.
To initialize your proxy and connect to dtn01:
% grid-proxy-init
% uberftp dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp>
To list files in a directory:
uberftp> ls
drwxr-xr-x 2 shreyas shreyas 27 Apr 26 12:28 .
drwxr-xr-x 19 shreyas shreyas 4096 Jun 20 15:57 ..
-rw-r--r-- 1 shreyas shreyas 692224 Apr 26 12:28 zebu
-rw-r--r-- 1 shreyas shreyas 2097153 Apr 26 12:28 gnu
To get a file (here, a remote file named dtn01):
uberftp> get dtn01
dtn01: 107 bytes in 0.05 seconds. 2.30 KB/sec
To put a file:
uberftp> put localfile
localfile: 107 bytes in 0.05 seconds. 2.30 KB/sec
To do a third party copy between PDSF and dtn01, issue an lopen, which causes uberftp to treat the "lopen"ed host as the local filesystem:
% grid-proxy-init
% uberftp
uberftp> lopen pdsfgrid.nersc.gov
220 pdsfgrid4.nersc.gov GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> open dtn01.nersc.gov
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> put pdsffile dtn01
pdsffile: 107 bytes in 0.05 seconds. 2.17 KB/sec
uberftp> get dtn01 pdsffile
dtn01: 107 bytes in 0.05 seconds. 2.30 KB/sec
For more details on how to use uberftp, refer to the UberFTP user documentation.
GridFTP Performance Optimization and Firewall Considerations
For optimal data transfer performance, you may need to tune certain parameters for your network. We have found that using 4 parallel streams with a TCP block size of 4MB works well for moving medium and large files across the WAN. However, the best values for any given network may require further tuning of these parameters.
Here is an example that uses these parameters for globus-url-copy:
% globus-url-copy -p 4 -tcp-bs 4MB file:///path/to/file \
gsiftp://dtn01.nersc.gov//path/file
UberFTP also supports similar options in the form of the tcpbuf and parallel commands:
% uberftp
uberftp> open dtn01
220 dtn01.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> parallel 4
uberftp> tcpbuf 1048576
TCP buffer set to 1048576 bytes
uberftp> put file
Parameter | globus-url-copy flag | UberFTP command |
---|---|---|
TCP buffer size | -tcp-bs SIZE, where SIZE includes a value and a unit, e.g. -tcp-bs 256KB | tcpbuf SIZE, where SIZE is a number of bytes, e.g. tcpbuf 262144 |
Number of parallel streams | -p N, where N is the number of parallel streams, e.g. -p 4 | parallel N, where N is the number of parallel streams, e.g. parallel 4 |
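Because the table above gives sizes with a unit for globus-url-copy but in plain bytes for tcpbuf, a small conversion helper can avoid arithmetic slips. This is an illustrative sketch only: the function name `size_to_bytes` is hypothetical, and it assumes binary units (1KB = 1024 bytes), which is consistent with the 256KB and 262144 pairing in the table.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 256KB or 4MB into bytes.
# Assumes binary units (1KB = 1024 bytes, 1MB = 1024 * 1024 bytes).
size_to_bytes() {
    case $1 in
        *KB) echo $(( ${1%KB} * 1024 )) ;;
        *MB) echo $(( ${1%MB} * 1024 * 1024 )) ;;
        *)   echo "$1" ;;   # no recognised unit: pass the value through unchanged
    esac
}

size_to_bytes 256KB   # prints 262144
size_to_bytes 4MB     # prints 4194304
```

For example, the 4MB block size suggested earlier corresponds to `tcpbuf 4194304` in uberftp.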
Firewall Considerations
If you have problems using GridFTP across a firewall (e.g. your transfer hangs without moving any data), you may need to ask your network administrator to open a range of ports in your firewall. Once this is done, set this range in your environment so that GridFTP clients know which ports to use.
For example, to use the port range 60000 to 60064, set the following environment variable before starting your client:
% export GLOBUS_TCP_PORT_RANGE=60000,60064
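A mistyped range is easy to miss, so it may be worth checking the value before exporting it. The sketch below is illustrative only: the function name `port_range_size` is hypothetical, and it assumes the MIN,MAX format shown above. It prints how many ports the range covers, which you can compare against what your network administrator opened.

```shell
#!/bin/sh
# Hypothetical helper: print the number of ports covered by a MIN,MAX
# range, or return non-zero if the value is malformed.
port_range_size() {
    case $1 in
        *,*) ;;
        *) echo "invalid range: $1" >&2; return 1 ;;
    esac
    min=${1%%,*}   # text before the first comma
    max=${1#*,}    # text after the first comma
    for n in "$min" "$max"; do
        case $n in
            # Reject empty or non-numeric parts (including a second comma).
            ''|*[!0-9]*) echo "invalid range: $1" >&2; return 1 ;;
        esac
    done
    [ "$min" -le "$max" ] || { echo "invalid range: $1" >&2; return 1; }
    echo $(( max - min + 1 ))
}

port_range_size "60000,60064"   # prints 65
```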