National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Grid Computing at NERSC: Data Transfer


How to transfer data to and from NERSC using grid client tools

GridFTP provides a convenient, high performance transfer mechanism to move data in and out of NERSC. GridFTP is available on the following systems:

System        GridFTP hosts                                 Notes
PDSF          pdsfgrid.nersc.gov (or pdsfgrid4.nersc.gov)
              pdsfgrid1.nersc.gov
              pdsfgrid3.nersc.gov
              pdsfgrid5.nersc.gov
DaVinci       davinci.nersc.gov                             Recommended host for NGF access
Bassi         bassigrid.nersc.gov
Jacquard      jacquardgrid.nersc.gov
Franklin      franklingrid.nersc.gov                        For access to Franklin /scratch
Archive HPSS  garchive.nersc.gov                            Uses GSI enabled PFTP
                                                            No striped GridFTP support

We suggest using one of the following clients to move your data:

1. globus-url-copy

Syntax: globus-url-copy [-help | -usage] [-version[s]] [-vb] [-dbg] [-b | -a]
                        [-q] [-r] [-rst] [-f <filename>]
                        [-s <subject>] [-ds <subject>] [-ss <subject>]
                        [-tcp-bs <size>] [-bs <size>] [-p <parallelism>]
                        [-notpt] [-nodcau] [-dcsafe | -dcpriv]
                        <sourceURL> <destURL>
In the examples below, we assume that you have installed the Globus client package on your workstation. All commands are run from the client machine, i.e. your workstation.

Initialize your proxy cert:

% grid-proxy-init 

Copy a file from your workstation to davinci:

% globus-url-copy file:///path/to/file \
gsiftp://davinci.nersc.gov//path/file

Copy a file from HPSS archive to your workstation:

% globus-url-copy \
gsiftp://garchive.nersc.gov/path/file file:///path/to/file

Copy a file from PDSF to davinci (a "third party copy", made without logging in to either system directly):

% globus-url-copy gsiftp://pdsfgrid.nersc.gov/path/to/file \
gsiftp://davinci.nersc.gov/path/to/file 
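
The third party copy above can also be assembled by a small script. The following is a minimal sketch, not part of the Globus tools: the function names gsiftp_url and third_party_copy_cmd are hypothetical, and the script only prints the globus-url-copy command line rather than running it, so it can be checked before any data is moved.

```shell
# gsiftp_url HOST PATH
# Hypothetical helper: print a gsiftp:// URL for an absolute path on HOST.
gsiftp_url() {
    host=$1
    path=$2
    case $path in
        /*) ;;   # absolute path, accepted
        *)  echo "gsiftp_url: path must be absolute: $path" >&2
            return 1 ;;
    esac
    printf 'gsiftp://%s%s\n' "$host" "$path"
}

# third_party_copy_cmd SRC_HOST SRC_PATH DST_HOST DST_PATH
# Hypothetical helper: print (do not run) the globus-url-copy command
# for a server-to-server transfer.
third_party_copy_cmd() {
    src=$(gsiftp_url "$1" "$2") || return 1
    dst=$(gsiftp_url "$3" "$4") || return 1
    printf 'globus-url-copy %s %s\n' "$src" "$dst"
}

# Same hosts and placeholder paths as the example above.
third_party_copy_cmd pdsfgrid.nersc.gov /path/to/file \
                     davinci.nersc.gov /path/to/file
```

Once the printed command looks right, it can be pasted into a shell that already holds a valid proxy from grid-proxy-init.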
For more information on globus-url-copy refer to the Globus GridFTP documentation.

2. uberftp

UberFTP provides a rich interactive client for GridFTP. It mimics standard ftp clients in behavior, along with providing some additional features.

To initialize your proxy and connect to davinci:

% grid-proxy-init
% uberftp davinci.nersc.gov
220 davinci.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp>        
To list files in a directory:
uberftp> ls
drwxr-xr-x   2  shreyas  shreyas       27 Apr 26 12:28  .
drwxr-xr-x  19  shreyas  shreyas     4096 Jun 20 15:57  ..
-rw-r--r--   1  shreyas  shreyas   692224 Apr 26 12:28  zebu
-rw-r--r--   1  shreyas  shreyas  2097153 Apr 26 12:28  gnu
To get a file:
uberftp> get davincifile
davincifile:  107 bytes in 0.05 seconds. 2.30 KB/sec
To put a file:
uberftp> put localfile
localfile:  107 bytes in 0.05 seconds. 2.30 KB/sec
To do a third party copy between PDSF and davinci, issue an lopen, which makes uberftp treat the "lopen"ed host as the local filesystem:
% grid-proxy-init
% uberftp
uberftp> lopen pdsfgrid.nersc.gov
220 pdsfgrid4.nersc.gov GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> open davinci.nersc.gov
220 davinci.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> put pdsffile davincifile
pdsffile:  107 bytes in 0.05 seconds. 2.17 KB/sec
uberftp> get davincifile pdsffile
davincifile:  107 bytes in 0.05 seconds. 2.30 KB/sec
For more details on how to use uberftp, refer to the UberFTP user documentation.

3. pftp_gsi

pftp_gsi is the recommended client for connecting to HPSS using grid authentication. pftp_gsi is simply a standard HPSS pftp client that uses GSI authentication (instead of the encrypted DCE combo). It supports HPSS parallel streams for high performance transfers.

To set up your proxy and access the archive HPSS system from one of the NERSC compute platforms:

% grid-proxy-init
% pftp_gsi garchive.nersc.gov
This will log you into the Archive HPSS system with your grid certificate, and you will be able to use the standard PFTP commands to access your data.

GridFTP Performance Optimization and Firewall Considerations

For optimal data transfer performance, you may need to tune certain parameters for your network. We have found that 4 parallel streams with a TCP buffer size of 1MB work well for moving medium and large files across the WAN. However, other networks may need different values for these parameters.

Here is an example that uses these parameters for globus-url-copy:

% globus-url-copy -p 4 -tcp-bs 1MB file:///path/to/file \
gsiftp://davinci.nersc.gov//path/file 
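
When tuning for a different network, a common starting point for the TCP buffer size is the bandwidth-delay product (link bandwidth multiplied by round-trip time). The sketch below is an illustration only: the function name bdp_bytes is hypothetical, the 1000 Mbit/s and 50 ms figures are made-up inputs rather than NERSC measurements, and dividing the result evenly across parallel streams is a rule of thumb, not a guarantee of best performance.

```shell
# bdp_bytes MBIT_PER_SEC RTT_MS
# Hypothetical helper: print the bandwidth-delay product in bytes.
# Mbit/s * 1,000,000 / 8 gives bytes per second, and RTT in ms / 1000
# gives seconds, so the product simplifies to mbps * rtt_ms * 125.
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Assumed example path: 1000 Mbit/s with a 50 ms round-trip time.
total=$(bdp_bytes 1000 50)

# Rule of thumb (assumption): split the total evenly across the streams.
streams=4
per_stream=$(( total / streams ))

echo "bandwidth-delay product: $total bytes"
echo "share per stream with -p $streams: $per_stream bytes"
```

The per-stream figure is only a first guess for -tcp-bs or tcpbuf; measured transfer rates should decide the final values.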

UberFTP supports the same settings through its tcpbuf and parallel commands:

% uberftp
uberftp> open davinci
220 davinci.nersc.gov GridFTP Server 2.3 (gcc64dbg, 1144436882-63) ready.
230 User shreyas logged in.
uberftp> parallel 4
uberftp> tcpbuf 1048576
TCP buffer set to 1048576 bytes
uberftp> put file

Parameter                   globus-url-copy flag                        UberFTP command
TCP buffer size             -tcp-bs SIZE                                tcpbuf SIZE
                            where SIZE includes a value and a unit      where SIZE is the number of bytes
                            e.g. -tcp-bs 256KB                          e.g. tcpbuf 262144
Number of parallel streams  -p N                                        parallel N
                            where N is the number of parallel streams   where N is the number of parallel streams
                            e.g. -p 4                                   e.g. parallel 4
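
Because globus-url-copy takes the buffer size with a unit while UberFTP takes a plain byte count, it can help to convert one form to the other. The following is a minimal sketch with a hypothetical helper named to_bytes; it assumes only the KB and MB suffixes used on this page, with 1KB taken as 1024 bytes to match the 256KB = 262144 example.

```shell
# to_bytes SIZE
# Hypothetical helper: convert a size such as 256KB or 1MB to bytes.
# A bare number is treated as a byte count and printed unchanged.
to_bytes() {
    size=$1
    case $size in
        *KB) num=${size%KB}; mult=1024 ;;
        *MB) num=${size%MB}; mult=1048576 ;;
        *)   num=$size;      mult=1 ;;
    esac
    # Reject empty or non-numeric values before doing arithmetic.
    case $num in
        ''|*[!0-9]*)
            echo "to_bytes: unsupported size: $size" >&2
            return 1 ;;
    esac
    echo $(( num * mult ))
}

size=256KB
echo "globus-url-copy flag: -tcp-bs $size"
echo "UberFTP command:      tcpbuf $(to_bytes "$size")"
```

For the 1MB setting suggested earlier, to_bytes 1MB yields 1048576, the value passed to tcpbuf in the UberFTP session above.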

Firewall Considerations

If you have problems using GridFTP across a firewall (e.g. your transfer hangs without moving any data), you may need to ask your network administrator to open a range of ports in your firewall. Once this is done, you will need to set this range in your environment so that GridFTP clients use it.

For example, to use the port range 60000 to 60064, set the following environment variable before starting your client:

% export GLOBUS_TCP_PORT_RANGE=60000,60064  
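
A typo in the range (for example a swapped minimum and maximum) is easy to make, so a login script might validate the values before exporting them. This is a minimal sketch for sh-compatible shells; the function name set_port_range is hypothetical, and the checks (numeric values, minimum not above maximum, maximum not above 65535) are our own sanity checks rather than anything the Globus tools require.

```shell
# set_port_range MIN MAX
# Hypothetical helper: validate a port range, then export it in the
# MIN,MAX form shown above.
set_port_range() {
    min=$1
    max=$2
    for p in "$min" "$max"; do
        case $p in
            ''|*[!0-9]*)
                echo "set_port_range: not a port number: '$p'" >&2
                return 1 ;;
        esac
    done
    if [ "$min" -gt "$max" ] || [ "$max" -gt 65535 ]; then
        echo "set_port_range: invalid range $min,$max" >&2
        return 1
    fi
    GLOBUS_TCP_PORT_RANGE="$min,$max"
    export GLOBUS_TCP_PORT_RANGE
}

# Same range as the example above.
set_port_range 60000 60064 && echo "GLOBUS_TCP_PORT_RANGE=$GLOBUS_TCP_PORT_RANGE"
```

In csh or tcsh, where export is not available, the variable is set with setenv GLOBUS_TCP_PORT_RANGE 60000,60064 instead.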

Page last modified: Thu, 02 Oct 2008 20:09:22 GMT
Page URL: http://www.nersc.gov/nusers/services/Grid/data.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
