National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Accessing HPSS - Usage Advice and Examples

This section offers advice on, and demonstrates, some useful techniques for using HSI and ftp/pftp, including their use in batch scripts.

  1. Some Advice on Efficient Use of HPSS
  2. A Complete Batch Script Using PFTP
  3. A Complete Batch Script Using HSI

Some Advice on Efficient Use of HPSS

Accessing HPSS in Batch Jobs
Each HPSS read request may require a storage library to mount a tape, which can take an unpredictable amount of time, depending on how many requests that library is currently servicing. Doing HPSS reads inside a batch job can therefore stall every processor dedicated to the job. A better strategy is to read any files a batch job needs before the job runs; NERSC provides a special batch job class, named "xfer", for HPSS file transfers. Files in user $SCRATCH space on NERSC supercomputers will likely persist there for several weeks, so pre-reading can be done well in advance of submitting the batch job that will use them. Writing files into HPSS generally takes less time than reading them, since they are written to HPSS disk first and transferred to tape later.

Accessing HPSS in a Single Session
Each invocation of HSI or ftp/pftp constitutes a separate "session" on the NERSC servers, and each session involves startup and shutdown overhead. It is more efficient to perform multiple operations in a single session than to use multiple sessions that each perform a single operation. This means it is inefficient to call either of these utilities in a scripted loop; it is better to generate a list of files in a loop, and use that list to build a set of commands for a single session. Command files can be used with HSI via the in command, as documented in the HSI User Guide.
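One way to apply this advice is sketched below, in POSIX sh rather than csh. A loop writes one get command per file into a command file, and HSI would then be started once to run them all. The file names, my_HPSS_directory, and the helper build_hsi_cmds are illustrative assumptions; the final HSI call is left commented out because its exact form should be checked against the HSI User Guide.

```shell
#!/bin/sh
# Sketch: build one HSI command file in a loop instead of
# starting a separate HSI session for every file.

# build_hsi_cmds DIR FILE...   (hypothetical helper)
# Prints a "cd" line followed by one "get" line per file.
build_hsi_cmds() {
    hpss_dir=$1
    shift
    echo "cd $hpss_dir"
    for f in "$@"; do
        echo "get $f"
    done
}

# The directory and file names here are placeholders.
build_hsi_cmds my_HPSS_directory data1 data2 data3 > hsi_cmds.txt

# A single session would then run every command in the file
# (assumed invocation, not verified here):
# hsi archive "in hsi_cmds.txt"
```

The loop only produces text, so the cost of a session start is paid once no matter how many files are listed.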

Ordering Multiple-File Reads
Files and directories that are logically close together within HPSS may reside on different tapes, so multiple-file read commands can incur multiple tape-mount delays. A useful technique for reading many files in a single session is to first use the HSI command "ls -P" to list the required directories and/or files, directing the command's output into a file. Sort that file on the last two fields of each output line, which are the tape position and the tape identifier, respectively, so that the lines are grouped by tape ID and appear in ascending positional order within each tape. Then edit the file to remove extraneous lines and fields, leaving a get operation for each desired file in sorted order. The resulting command file can be used with the HSI in command in a single session; it will minimize tape delays and decrease overall access time.
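The sorting step can be sketched as follows, in POSIX sh with awk and sort. The listing layout is an assumption made for illustration: each line is taken to hold the HPSS path in its second field, the tape position second to last, and the tape identifier last. Real "ls -P" output should be inspected before relying on these field numbers. The helper name sort_by_tape and the sample listing are hypothetical.

```shell
#!/bin/sh
# Sketch: turn a saved "ls -P" listing into tape-ordered "get" commands.
# ASSUMED layout per line: <type> <path> ... <tape position> <tape id>

# sort_by_tape LISTING   (hypothetical helper)
sort_by_tape() {
    # Keep only tape id, position, and path; group by tape id and
    # order by ascending position within each tape; emit "get" lines.
    awk 'NF >= 4 { print $NF, $(NF-1), $2 }' "$1" |
        sort -k1,1 -k2,2n |
        awk '{ print "get " $3 }'
}

# Made-up listing, for illustration only.
cat > listing.txt <<'EOF'
FILE /home/u/run2/b.dat 2048 7 TAPE02
FILE /home/u/run1/a.dat 1024 12 TAPE01
FILE /home/u/run1/c.dat 4096 3 TAPE01
EOF

sort_by_tape listing.txt > sorted_gets.txt
# sorted_gets.txt now lists c.dat and a.dat (TAPE01), then b.dat (TAPE02)
```

This sketch assumes paths contain no spaces; a listing with such paths would need a different field-splitting approach.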

Aggregating File Collections
Files can be aggregated into collections with the HTAR utility, allowing more efficient access to members of the collection. HTAR writes tar-like archive files directly into HPSS, with a companion index file for each archive. This allows subsequent reads of any subset of an htar archive's contents with only a single tape mount. File sets that were written unaggregated can be rewritten with htar after being read. The cost of this rewriting is the extra storage used, since the original files are not removed.
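A minimal sketch of this workflow follows, in POSIX sh. The archive name run1.tar, the directory run1_output, and the member result.dat are placeholders, and the option letters (-c create, -t list, -x extract, -v verbose, -f archive name) should be confirmed against the HTAR documentation for your system. By default the script only prints the commands it would run; setting the HTAR variable to htar is the assumed way to execute them.

```shell
#!/bin/sh
# Sketch: aggregate a directory of files into one HTAR archive.
# HTAR defaults to "echo htar", so this is a dry run that only
# prints each command; run with HTAR=htar to execute for real.
HTAR=${HTAR:-echo htar}

# Create the archive (and its companion index) from a local directory.
$HTAR -cvf run1.tar run1_output

# List the archive's members without retrieving them.
$HTAR -tvf run1.tar

# Retrieve a single member of the archive.
$HTAR -xvf run1.tar run1_output/result.dat
```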

Example 1.   Complete Batch Script Using PFTP

This example shows a batch script that contains pftp actions. It uses both single- and multiple-file transfer commands, as well as directory change commands. The "+" character is used to delimit each "here document." This example also assumes that you have a ".netrc" file in your home directory with the appropriate encrypted password combination.

     #!/bin/csh   
      
     # First, copy the data files and program source from HPSS
     pftp -i -v archive <<+
     cd my_HPSS_directory
     mget data*
     get source.f 
     quit
     +
      
     ./myprog data outfile
      
     # Save the output file in HPSS.
     pftp -i -v archive <<+
     cd my_HPSS_directory
     put outfile
     mput restart*
     quit
     +
     exit
     

Example 2.   Complete Batch Script Using HSI

This example shows a script containing HSI actions. It uses HSI commands to accomplish the same actions that pftp performs in Example 1, above. Note that in this case a single-line command is used, so no "here document" is needed. This simplifies the script, and demonstrates some of HSI's advantages over pftp or ftp. This script assumes that you have previously logged into HSI interactively at least once to encrypt your username/password.

     #!/bin/csh
      
     # First, copy the data and program source from HPSS
     # into the submitting directory
      
     hsi archive "cd my_HPSS_directory; get data* source.f"
      
     ./myprog data outfile
      
     # Save the output file in HPSS.
      
     hsi archive "cd my_HPSS_directory; put outfile; \
        put restart*"
      
     exit
     

Note that in the above, the individual hsi commands are separated by semicolons (;), and the set of commands is enclosed in double quotes ("). The semicolons are necessary, and are currently the only allowed command separator. The quotes are required to prevent shell interpretation of wild-card characters, and are recommended for general safety in one-liners. Note that the suppression of shell interpretation prevents the effective use of wild-card file and directory specifications in one-liners.

Unlike an interactive HSI session, a one-liner needs no termination command (e.g., exit or quit).

In addition to one-line commands, HSI can also take sets of input commands from files. For more information on this, see the HSI Documentation.


Page last modified: Tue, 14 Jun 2005 20:42:13 GMT
Page URL: http://www.nersc.gov/nusers/systems/hpss/usage_examples.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
