NERSC: Powering Scientific Discovery Since 1974

Transfer and Archive Data

Move your data!

Anyone who still needs data that is stored on the netapps file system should move it to /house or /projectb (for more information on these file systems, see File storage and I/O).

/house is 82% full, a great improvement from the 90% last week, but we can do better!  Please archive your data using HPSS and then delete it from /house.  When the disk is this full, performance suffers, so please delete any data that you don't need on /house.  If you can store most of your data on /projectb, that will give the best performance. 
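To decide what to archive or delete first, it can help to see which entries take the most space. The following is a minimal sketch, not a NERSC-provided tool: the function name largest_entries is hypothetical, and it relies only on the standard du, sort, and head utilities. Hidden files (names beginning with a dot) are not matched by the glob.

```shell
# List the N largest entries directly under a directory, biggest first.
# Sizes are in kilobytes as reported by "du -sk".
# largest_entries is a hypothetical helper name, not part of any NERSC tool.
largest_entries() {
    dir=$1
    count=${2:-10}
    # -s summarizes each entry, -k reports kilobytes;
    # sort -rn orders numerically, largest first.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example (replace the argument with your own directory on /house):
#   largest_entries /house/your_directory 20
```

Running du over a very large tree can itself take a long time on a full disk, so it may be worth pointing it at one subdirectory at a time.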

Users have a 40GB quota in their $HOME directories on Genepool/Phoebe.  If you are transferring large files, they should be moved into the $SCRATCH directory where users have a 20TB quota.  If you need more space than this, please submit a JIRA ticket.

Use the fast data transfer nodes to move data quickly

NERSC has set up two fast data transfer nodes to help JGI users move data between file systems and back up data to HPSS. Note that the netapps file system will not be available on Genepool, but the house and new projectb file systems will be. This means that users need to move data off netapps and onto house or projectb.

Log in to one of the data transfer nodes with either of the following commands:

ssh dtn03.nersc.gov

or

ssh dtn04.nersc.gov
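Once logged in to a data transfer node, a copy between file systems might look like the sketch below. This page does not say which copy tool to use, so plain cp and diff are used here as stand-ins. The function name copy_and_verify is hypothetical, and the source and destination paths are left for you to supply. The copy is compared against the original so that the original is only deleted after a successful check.

```shell
# Copy a directory tree under a new parent directory and confirm that the
# copy matches the original. copy_and_verify is a hypothetical helper name.
copy_and_verify() {
    src=$1          # directory to move, e.g. one on the old file system
    dest_parent=$2  # destination parent, e.g. a directory on /projectb
    name=$(basename "$src")
    mkdir -p "$dest_parent" || return 1
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$src" "$dest_parent/" || return 1
    # diff -r exits non-zero if any file differs or is missing.
    diff -r "$src" "$dest_parent/$name" >/dev/null
}

# Example (both arguments are placeholders):
#   copy_and_verify OLD_DIRECTORY NEW_PARENT_DIRECTORY && echo "copy verified"
```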

Archiving your data with HPSS

These are some basic examples of data transfer and access with HPSS.

Access Example

Using HSI from a NERSC Production System

All of the NERSC computational systems available to users have the hsi client already installed.  To access the Archive storage system you can type hsi with no arguments:

% hsi

That is, the utility is set up to connect to the Archive system by default.  This is equivalent to typing:

% hsi -h archive.nersc.gov

HSI Usage Example

You can run hsi commands in several different ways:

Interactively from the command line: % hsi
Single-line execution: % hsi "mkdir run123; cd run123; put bigdata.0311"
Read commands from a file: % hsi "in command_file"
Read commands from standard input: % hsi < command_file
Read commands from a pipe: % cat command_file | hsi
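As an illustration of the command-file form, the sketch below writes a small file containing the same commands used in the single-line example, one per line. The helper name write_hsi_commands is hypothetical, and the one-command-per-line layout is an assumption about what hsi expects. The script only creates the file; hsi is not invoked.

```shell
# Write a small HSI command file, one command per line.
# write_hsi_commands is a hypothetical helper; the file name is up to you.
write_hsi_commands() {
    cat > "$1" <<'EOF'
mkdir run123
cd run123
put bigdata.0311
ls
EOF
}

# Example: create the file, then (on a system with the hsi client) run it.
#   write_hsi_commands command_file
#   hsi "in command_file"
```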

Just typing hsi will enter an interactive command shell, placing you in your home directory on the Archive system.  From this shell, you can run the ls command to see your files, cd into storage system subdirectories, put files into the storage system and get files from it.

Specifying local and HPSS file names when storing or retrieving files

The HSI put command stores files from your local file system into HPSS and the get command retrieves them.  The command:

% put myfile

will store the file named "myfile" from your current local directory as a file of the same name in your current HPSS directory.  So, to store "myfile" in the "run123" subdirectory of your HPSS home directory, you can type:

% hsi
A:/home/j/joeuser-> cd run123
A:/home/j/joeuser-> put myfile

or

% hsi "cd run123; put myfile"

The hsi utility uses a special syntax to specify local and HPSS file names when using the put and get commands:

  1. The local file name is always on the left and the HPSS file name is always on the right.
  2. Use a ":" (colon character) to separate the names.

That is:

% put local_file : hpss_file
% get local_file : hpss_file

This format is convenient if you want to store a file named "foo" in the local directory as "foo_2010_09_21" in HPSS:

% hsi "put foo : foo_2010_09_21"
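The date suffix can also be generated rather than typed by hand. The sketch below builds a put command in the "local : HPSS" form described above, using today's date. The function name dated_put_command is hypothetical. The script only prints the command string, so it can be inspected before being passed to hsi.

```shell
# Build an hsi "put" command that stores a local file under a date-stamped
# name in HPSS. dated_put_command is a hypothetical helper name.
dated_put_command() {
    file=$1
    stamp=$(date +%Y_%m_%d)   # e.g. 2010_09_21
    # Local name on the left of the colon, HPSS name on the right.
    printf 'put %s : %s_%s\n' "$file" "$(basename "$file")" "$stamp"
}

# Print the command for a local file named "foo".
dated_put_command foo

# On a system with the hsi client installed, it could be run as:
#   hsi "$(dated_put_command foo)"
```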

You can also use this method to specify the full or relative pathnames of files in both the local and HPSS file systems:

% hsi "get /scratch2/scratchdirs/joeuser/analysis/data : bigcalc/hopper/run123/datafile.0211"

Archiving your data with HTAR

HTAR is a command line utility that creates and manipulates HPSS-resident tar-format archive files.  It is ideal for storing groups of files in HPSS.  Because the tar file is created directly in HPSS, HTAR is generally faster and uses less local space than creating a local tar file and then storing it in HPSS.  However, an individual file within the archive cannot be larger than 64GB (the archive itself can be much larger).  If you need to back up individual files larger than 64GB, use hsi for those files.
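Before running HTAR on a directory, you may want to check for files over the member size limit. The sketch below uses the standard find utility. The function name oversized_files is hypothetical, and 64GB is taken to mean 64 * 1024^3 bytes, which is an assumption about how the limit is counted. A file that is exactly at the limit is not listed.

```shell
# 64 GB expressed in bytes, assuming 1 GB = 1024^3 bytes.
HTAR_MEMBER_LIMIT_BYTES=68719476736

# List regular files under a directory that are larger than a byte limit.
# oversized_files is a hypothetical helper name.
oversized_files() {
    dir=$1
    limit=${2:-$HTAR_MEMBER_LIMIT_BYTES}
    # "-size +Nc" matches files strictly larger than N bytes.
    find "$dir" -type f -size +"${limit}c"
}

# Example: any file printed here should be stored with hsi instead of htar.
#   oversized_files nova
```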

Examples of when to use HTAR

HTAR is useful for storing groups of related files that you will probably want to access as a group in the future.  Examples include:

  • archiving a source code directory tree
  • archiving output files from a code simulation run
  • archiving files generated by the run of an experiment

If stored individually, the files will likely be distributed across a collection of tapes, requiring possibly long delays (due to multiple tape mounts) when fetching them from HPSS.  On the other hand, an HTAR archive file will likely be stored on a single tape, requiring only a single tape mount when it comes time to retrieve the data.

HTAR Usage Example

The basic syntax of HTAR is similar to the standard tar utility:

 htar -{c|K|t|x|X} -f tarfile [directories] [files]

As with the standard Unix tar utility, the "-c", "-x", and "-t" options create, extract, and list tar archive files, respectively.  The "-K" option verifies an existing tarfile in HPSS, and the "-X" option re-creates the index file for an existing archive.
Please note that you cannot add or append files to an existing archive.

Note: when HTAR creates an archive, it places an additional file (with a strange name) at the end of the archive.  You can ignore this file; it is for HTAR's internal use and will not be retrieved when you extract the files from the archive.

# Create an archive with directory "nova" and file "simulator"
% htar -cvf nova.tar nova simulator
HTAR: a   nova/                                                                   
HTAR: a   nova/sn1987a
HTAR: a   nova/sn1993j
HTAR: a   nova/sn2005e
HTAR: a   simulator
HTAR: a   /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR Create complete for nova.tar. 28,396,544 bytes written for 4 member files, max threads: 4 Transfer time: 0.420 seconds (67.534 MB/s)
HTAR: HTAR SUCCESSFUL      

# Now list the contents
% htar -tf nova.tar
HTAR: drwx------  joeuser/joeuser          0 2010-09-24 14:24  nova/
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn2005e
HTAR: -rwx------  joeuser/joeuser     398552 2010-09-24 17:35  simulator
HTAR: -rw-------  joeuser/joeuser        256 2010-09-24 17:36  /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR: HTAR SUCCESSFUL

# Now, as an example, use hsi to remove the nova.tar.idx index file from HPSS
# (Note: you generally do not want to do this)
% hsi "rm nova.tar.idx"
...
rm: /home/j/joeuser/nova.tar.idx (2010/09/24 17:36:53 3360 bytes)

# Now try to list the archive contents without the index file:
% htar -tf nova.tar
ERROR: No such file: nova.tar.idx           
ERROR: Fatal error opening index file: nova.tar.idx
HTAR: HTAR FAILED

# Here is how we can rebuild the index file if it is accidentally deleted
% htar -Xvf nova.tar
HTAR: i nova                         
HTAR: i nova/sn1987a
HTAR: i nova/sn1993j
HTAR: i nova/sn2005e
HTAR: i simulator
HTAR: i /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR: Build Index complete for nova.tar, 5 files 6 total objects, size=28,396,544 bytes
HTAR: HTAR SUCCESSFUL

# With the index rebuilt, listing the archive works again
% htar -tf nova.tar
HTAR: drwx------  joeuser/joeuser          0 2010-09-24 14:24  nova/
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1987a
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn1993j
HTAR: -rwx------  joeuser/joeuser    9331200 2010-09-24 14:24  nova/sn2005e
HTAR: -rwx------  joeuser/joeuser     398552 2010-09-24 17:35  simulator
HTAR: -rw-------  joeuser/joeuser        256 2010-09-24 17:36  /scratch/scratchdirs/joeuser/HTAR_CF_CHK_61406_1285375012
HTAR: HTAR SUCCESSFUL
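
The create and verify steps can be combined in a small wrapper, sketched below under some assumptions. The function name archive_and_verify and the HTAR_CMD variable are hypothetical. The "-K -f" invocation follows the syntax line above but has not been checked against a live HPSS system. Setting HTAR_CMD=echo gives a dry run that prints the arguments instead of contacting HPSS.

```shell
# HTAR_CMD defaults to the real client; set HTAR_CMD=echo for a dry run.
# Both HTAR_CMD and archive_and_verify are hypothetical names.
HTAR_CMD=${HTAR_CMD:-htar}

archive_and_verify() {
    archive=$1
    shift
    # Create the archive in HPSS from the remaining arguments.
    $HTAR_CMD -cvf "$archive" "$@" || return 1
    # Verify the archive that was just written.
    $HTAR_CMD -K -f "$archive"
}

# Example, using the names from the listing above:
#   archive_and_verify nova.tar nova simulator
```

Stopping at the first failure means the verify step never runs against an archive that was not created successfully.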

 

For more examples, please go to the HPSS page.