National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory

HPSS Mass Storage

The High Performance Storage System (HPSS) is a modern, flexible, performance-oriented mass storage system. It has been used at NERSC for archival storage since 1998.

At NERSC, the data in storage doubles almost every year. As of January 27, 2009, we have over 3.9 petabytes of data stored in over 66 million files in our HPSS User system (archive.nersc.gov) and 2.8 petabytes of data stored in over 12 million files in our HPSS Backup system (hpss.nersc.gov). HPSS sustains an average transfer rate of more than 100 MB/s, 24 hours per day, with peaks to 450 MB/s.

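NERSC has two HPSS systems:

  • The HPSS User system (archive.nersc.gov), which all new users and most current users utilize for their archive storage needs.
  • The HPSS Backup system (hpss.nersc.gov, sometimes referred to as "regent"), which is used for system backups, although it does contain some older user data.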

Users can access NERSC's HPSS machines through a variety of clients such as hsi, htar, ftp, pftp, and grid clients.

HPSS Accounts

All NERSC users have an HPSS account for each active username on the computational systems. If you have problems accessing your HPSS account, contact the NERSC support office at 1-800-66-NERSC, menu option 2, or 510-486-8612.

Each HPSS account has a storage allocation. You are charged Storage Resource Units (SRUs) for HPSS usage. SRU charges are determined by a formula that takes into account (1) file space used, (2) the number of individual files, and (3) the amount of data transferred to and from HPSS. See HPSS Charging for more information. SRU account balances are available in the NIM web interface.

On May 19, 2003, HPSS quota restrictions went into effect. This means that if a user is out of Storage Resource Units in all their HPSS repositories, that user will be restricted. They will no longer be able to write data to HPSS (although they will continue to be able to read data).

Users can check their HPSS SRU balances by logging into the NERSC Information Management System and looking at the resource "HPSS" in their account usage summary. See also What happens if a repo or user SRU balance is negative?

NERSC HPSS Charging

NERSC uses Storage Resource Units (SRUs) to help manage HPSS storage. The goal is to provide a balanced computing environment with appropriate amounts of storage and adequate bandwidth to keep the compute engines fed with data. Performance and usage tracking allows NERSC to anticipate demand and maintain a responsive storage environment. Storage management also recognizes storage as a distinct resource, in support of an increasing amount of data intensive computing. Finally, storage management and the quota system are intended to encourage efficient usage by the user community.

SRUs are reported and managed via the NERSC Information Management (NIM) system. If a user is out of Storage Resource Units in all their HPSS repositories, that user will be restricted so that they can no longer write data to HPSS (although they will continue to be able to read data). See: What happens if a repo or user SRU balance is negative?

Users can check their HPSS SRU balances by logging into the NERSC Information Management System and looking at the resource "HPSS" in their account usage summary.

An SRU Calculator is available for estimating SRU usage.


Calculating a User's Storage Resource Units

Three measures of use are included in computing SRUs:

  1. Number of files stored (files)
  2. GB of space used in the archive (space)
  3. GB of I/O transferred (I/O).

The formula used to compute the number of SRUs incurred by a user each day is:

   daily user SRUs  =  0.0000393 x files
                    +  0.0131147 x space (GB)
                    +  4.0 x I/O (GB)

where 1 GB is 1024^3 bytes. Yearly usage is the sum of daily usage; the yearly formula is:

   yearly user SRUs  =  0.01436 x Avg files  
                     +  4.787 x Avg space (GB)  
                     +  4.0 x I/O (GB)

For an explanation of how the formula was derived, see SRU Formula Coefficients.
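As a rough illustration, the two formulas above can be evaluated with awk from a POSIX shell. The function names daily_srus and yearly_srus and the sample inputs are made up for this sketch; they are not NERSC utilities, and the SRU Calculator remains the reference for real estimates.

```shell
#!/bin/sh
# Hypothetical helpers (not NERSC utilities) that evaluate the SRU formulas.

# daily_srus FILES SPACE_GB IO_GB
daily_srus() {
    awk -v files="$1" -v space="$2" -v io="$3" 'BEGIN {
        printf "%.2f\n", 0.0000393 * files + 0.0131147 * space + 4.0 * io
    }'
}

# yearly_srus AVG_FILES AVG_SPACE_GB TOTAL_IO_GB
yearly_srus() {
    awk -v files="$1" -v space="$2" -v io="$3" 'BEGIN {
        printf "%.2f\n", 0.01436 * files + 4.787 * space + 4.0 * io
    }'
}

# 100,000 files and 1,000 GB stored, 5 GB transferred in one day
daily_srus 100000 1000 5      # prints 37.04
# the same averages over a year, with 100 GB of total I/O
yearly_srus 100000 1000 100   # prints 6623.00
```

In the daily example the I/O term (20 SRUs) outweighs the storage terms (about 17 SRUs), which shows how heavily transfers are weighted relative to data at rest.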

Apportioning User SRUs to Repositories: Project Percents

DOE's Office of Science awards Storage Resource Units to each NERSC project every year. The SRUs are deposited into the project's HPSS group account; this group account is called the HPSS repository (or repo). Users charge their HPSS SRU usage to the HPSS repos of which they are members.

If a login name belongs to only one HPSS repo all of its usage is charged to that repo. If a login name belongs to multiple repos its daily charge is apportioned among the repos using the project percents for that login name. Default project percents are assigned based on the size of each repo's storage allocation. The user (only the user, not the project managers) can change her or his project percents by selecting Change SRU Proj Pct from the Actions pull-down list in NIM's Main Menu. Users should try to set project percents to reflect their actual use of HPSS for each of the projects of which they are a member.

Note that this is quite different from the way that computational resources are charged.

If a user changes her or his project percents this change will apply to all days in the month the change is made, but not to days prior to the month in which the change is made.

If a login name is added to a new repo or removed from an existing repo

If a login name is added to a new repo or removed from an existing repo, the project percents for that user are adjusted based on the size of the SRU allocations of the repos the login name currently belongs to. However, if the user has previously changed the default project percents, the relative ratios of these previously set project percents are respected.

For example: say that user u1 belongs to repos r1 and r2 and has changed the default project percents from 50% for each repo to 40% for r1 and to 60% for r2:

   Login   Repo  Repo Allocation   Proj%
    u1      r1     50,000 SRUs      40
    u1      r2     50,000 SRUs      60

Now assume that u1 becomes a new member of repo r3 which has a storage allocation of 100,000 SRUs. The project percents will be adjusted as follows (to preserve the old ratio of 40:60 between r1 and r2 while adding r3 which has the same SRU allocation as r1+r2):

   Login   Repo  Repo Allocation   Proj%
    u1      r1     50,000 SRUs      20
    u1      r2     50,000 SRUs      30
    u1      r3    100,000 SRUs      50
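The adjustment in this example can be reproduced with a short calculation. The sketch below assumes, as the example suggests, that the new repo's share equals its fraction of the combined allocation and that the previously set percents are scaled down to fill the remainder. NIM's actual rounding rules are not described here, and rescale_percents is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: rescale existing project percents when a repo is added.
# Usage: rescale_percents NEW_REPO_ALLOC OLD_REPOS_TOTAL_ALLOC PCT1 [PCT2 ...]
# Prints the rescaled old percents followed by the new repo's percent.
rescale_percents() {
    new_alloc=$1
    old_total=$2
    shift 2
    awk -v new="$new_alloc" -v old="$old_total" 'BEGIN {
        share_new = new / (new + old)          # assumed share of the new repo
        for (i = 1; i < ARGC; i++)             # keep the old ratios intact
            printf "%.0f ", ARGV[i] * (1 - share_new)
        printf "%.0f\n", 100 * share_new
    }' "$@"
}

# u1 holds 40% and 60% in r1 and r2 (100,000 SRUs combined); r3 adds 100,000 SRUs
rescale_percents 100000 100000 40 60   # prints 20 30 50
```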

If SRUs are added to or taken from an HPSS repo

If SRUs are added to or taken from an HPSS repo, the project percents for the users in that repo are adjusted as needed to reflect the new sizes of each repo's storage allocation. The exception is a user who has changed the project percents from their default values; in that case the project percents are not changed.

For example: say that user u2 belongs to repos r1 and r2 and has not changed the default project percents. Repo r2 gets a new infusion of SRUs:

   Login   Repo  Old Repo Alloc   Old Proj%   New Repo Alloc   New Proj%
    u2      r1    50,000 SRUs        50        50,000 SRUs        25
    u2      r2    50,000 SRUs        50       150,000 SRUs        75
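The default percents in this table follow from dividing each repo's allocation by the sum of the allocations. A minimal sketch of that arithmetic is shown below; default_percents is a hypothetical name, and rounding to whole percents is an assumption made for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: default project percents proportional to repo allocations.
# Usage: default_percents ALLOC1 [ALLOC2 ...]
default_percents() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) total += ARGV[i]
        for (i = 1; i < ARGC; i++)
            printf "%.0f%s", 100 * ARGV[i] / total, (i < ARGC - 1 ? " " : "\n")
    }' "$@"
}

default_percents 50000 50000    # before the infusion: prints 50 50
default_percents 50000 150000   # after r2 grows to 150,000 SRUs: prints 25 75
```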

User Quotas or Allowed Percents

Principal Investigators, PI Proxies and Project Managers can assign Allowed Percents (or user quotas) to each user in their repo. These allowed percents have been operational for MPP and PVP repos for a long time, but have only recently been available for HPSS (with the integration of SRUs in NIM).

The default Allowed Percent is 100% for each user; Project Managers can change these as appropriate.

A user's HPSS allowed and used percentages as well as SRU balances are shown in NIM's Account Usage display:

% Used:
the percentage of the repo's HPSS SRU allocation that the login name has used
% Allowed:
the percentage of the repo's HPSS allocation that the login name is authorized to use (also known as the "user quota")
Balance:
the user's SRU balance for this repo. The login name's balance is computed by subtracting the login's usage within that repo from its "Allowed Percentage" of that repo. If the balance of the repository as a whole is less than the login's computed balance, then the lesser number (the repo's balance) is used instead. This user balance is shared with the other repo members.
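The balance rule described above amounts to taking the smaller of two numbers. The sketch below works through it with made-up figures; user_balance is a hypothetical name, not a NIM command.

```shell
#!/bin/sh
# Hypothetical helper illustrating the user balance rule.
# Usage: user_balance REPO_ALLOC ALLOWED_PCT USER_USAGE REPO_BALANCE
user_balance() {
    awk -v alloc="$1" -v pct="$2" -v used="$3" -v repo_bal="$4" 'BEGIN {
        bal = alloc * pct / 100 - used       # allowed share minus own usage
        if (repo_bal < bal) bal = repo_bal   # capped by the repo balance
        printf "%.0f\n", bal
    }'
}

# 40% of 50,000 SRUs is 20,000; with 5,000 used the balance is 15,000
user_balance 50000 40 5000 30000    # prints 15000
# at 100% allowed the computed balance (45,000) exceeds the repo balance
user_balance 50000 100 5000 30000   # prints 30000
```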

User Statuses for HPSS

Within NIM the term User Status is used to display two sorts of statuses:

  1. Repository user statuses: In the Account Usage Summary area NIM displays the User Status for each (login, repo) pair. The Repository User Status (for the login name in that repo) is one of:
    Active
    The user is a member of the repo and has a positive user balance in that repo.
    Restricted
    The user is a member of the repo but has a negative user balance in that repo.
    Limited
    The user is no longer a member of the repo but still has limited access to its resources.
    Deleted
    The user has been removed from this repo.
    Admin Member
    The user is an administrative member (PI, PI Proxy or Project Manager) of the project who doesn't use this resource.

    HPSS Repo User Statuses are also displayed in NIM's project / repo display area under the HPSS Usage, User %s tab.

  2. Machine user statuses (for computational resources or HPSS): In the login info area under the Logins by Host tab NIM displays the status for each login name on each machine the user has access to. These Machine User Statuses have the following meaning for HPSS:
    Active
    The login name is in a normal active state with no restrictions. The login name can read and write data.
    Restricted
    The login name is restricted because it has no repo to charge to. The login name can read data from HPSS but cannot write to HPSS.
    Disabled
    The login name has been temporarily disabled.
    Limited
    The login name is restricted because the user is no longer a member of any active repository. The login name can read data from HPSS but cannot write to HPSS. On HPSS a login name remains limited for about one year prior to being deactivated.
    Deactivated
    The login name has been disabled and its HPSS files can be archived to the "crypt". The login name no longer has access to HPSS.
    Crypt
    The login name's files have been moved to the crypt and may be deleted at any time. This has not yet been implemented.
    Deleted
    The login name has been removed from HPSS.

What happens if a repo or user SRU balance is negative?

Accounting information is sent from HPSS to NIM once daily (in the early morning, Pacific Time). At this time actions are taken if a repo or user SRU balance is negative.

If a repo runs out of SRUs all login names associated with it are marked as restricted for that repository (see repository user statuses).

Login names are "HPSS restricted" if all of the repos associated with this login name are restricted (see machine user statuses). HPSS restricted login names are able to read data from HPSS but cannot write any data to HPSS.

Likewise, when a login name goes over its individual "allowed percent" in a given repo, that (login, repo) pair is marked as restricted. The login name is HPSS restricted only if the (login, repo) repository user status is restricted for each repo associated with this login name.
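The relationship between the per-repo statuses and the login-wide status can be summarized in a few lines. This is a simplified sketch that considers only whether at least one repository user status is Active; it ignores the Limited, Deleted and Admin Member statuses, and hpss_login_status is a hypothetical name rather than a NIM function.

```shell
#!/bin/sh
# Hypothetical helper: derive the HPSS machine status from repo user statuses.
# Usage: hpss_login_status STATUS_FOR_REPO1 [STATUS_FOR_REPO2 ...]
hpss_login_status() {
    for s in "$@"; do
        if [ "$s" = "Active" ]; then
            echo "Active"        # one chargeable repo is enough to keep writing
            return 0
        fi
    done
    echo "Restricted"            # no repo left to charge to: read-only access
}

hpss_login_status Active Restricted       # prints Active
hpss_login_status Restricted Restricted   # prints Restricted
```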

HPSS repos that are negative continue to incur SRU charges every day for each member that has HPSS files or I/O activity. This is because there is a daily charge for files stored within HPSS and for I/O activity. Note that restricted users can still incur I/O charges by reading files. Also, project percents are not adjusted when a repo goes negative. See Calculating a User's SRUs and Apportioning User SRUs to Repos.

Likewise, a user who has gone over her or his allowed percent in a given repo will continue to incur charges in that repo. Project percents are not automatically adjusted when a login name exceeds its allowed percent, although a Project Manager can ask the user to adjust them.

SRU Usage Reports

The following SRU Usage Reports are available in NIM:

Search Daily HPSS User Usage:

Use this query to see the actual number of files and gigabytes a user (login name) has stored in HPSS on a daily basis, I/O transactions to and from HPSS, as well as the amounts these three usage factors (files, space and I/O) contribute to the user's SRU charge.

From the Search & Reports pull-down menu in NIM's Main Menu frame select Use: Daily User HPSS. You will see the following query:

Note that:

  • HPSS usage data is stored by "Begin Date" and "Last Date"; in every usage record the number of files and Gbytes stored remains constant from Begin Date to End Date. You can think of End Date as an approximation of the Job Date used for MPP usage queries.
  • By default the user's organization is not displayed in the report; deselect the Hide? box to display it.

Search Monthly HPSS User Usage:

Use this query to see the average number of files and gigabytes a user (login name) has stored in HPSS each month, total I/O transactions to and from HPSS by month, as well as the amounts these three usage factors (files, space and I/O) contribute to the user's SRU charge.

Search Yearly User/Repo Usage:

Use this query to search yearly or year-to-date (for the current fiscal year) usage information. For FY 2003 and later HPSS and MPP usage is available; for FY 2002 HPSS, MPP, and PVP usage is available; for FY 2001 MPP and PVP usage is available. Project Managers can use this query to find users who are within a certain percentage of their user quota (or allowed percent).

From the Search & Reports pull-down menu in NIM's Main Menu frame select Use: Yearly Usr/Repo. You will see the following query (for an HPSS report set the Resource Type to HPSS):

Search Year to Date Repository Usage:

Use this query to get a summary usage report of all NERSC repositories. From the Search & Reports pull-down menu in NIM's Main Menu frame select Use: Yearly Repo. For an HPSS report set the Resource Type to HPSS. For more information see Year to Date Repository Usage Query and Year to Date Repository Usage Report.

HPSS Passwords

The HPSS systems use NIM and the NERSC LDAP server for user authentication of HPSS sessions. HPSS uses information in NIM for your HPSS account (such as your moniker, UID, GID, and home directory within HPSS). However, HPSS does not use your NIM password. You will use NIM to set a separate password, called an "HPSS token", for use with HPSS.

The HPSS token does not currently expire and users may generate new tokens as often as they wish. Old tokens will still be honored. If a user wishes to disable all previously generated tokens for security reasons, they should contact NERSC User Services and request that their account be security disabled. This will invalidate all of the user's passwords within NIM, including HPSS tokens.

Because HPSS passwords do not expire, it is only necessary to generate a password once for continued use of HPSS. This password may be placed in a .netrc file for use by HSI, HTAR, pftp, and most FTP clients to prevent the username/password challenge.

Accessing HPSS from a system on the NERSC network

If using a NERSC-provided client (HSI, HTAR, PFTP; or FTP on the NERSC compute platforms), the first time you connect to either HPSS system, you will be prompted for your NIM password and an authentication token will be generated and stored in $HOME/.netrc. After completing this step, you will be able to connect to HPSS without typing a password.

Please note that if you have an existing $HOME/.netrc file or you are having problems connecting to either HPSS system, you should remove any entries referring to 'machine archive' or 'machine archive.nersc.gov' for the HPSS User System -or- 'machine hpss' or 'machine hpss.nersc.gov' for the HPSS Backup System. After you remove the entries, try connecting to the HPSS system again with your NIM password and a new entry/token will be placed in your $HOME/.netrc. If the problem persists, contact NERSC account support.

Accessing HPSS from a system outside the NERSC network

Log into NIM and select "Generate an HPSS token" from the "Actions" menu. For example, see the screenshot below:

This will provide you with a token (an encrypted string) in the blue highlighted box. The token may be used on any machine in the NERSC network with any supported HPSS client (e.g. FTP, pftp, HSI, or HTAR). See below for a screenshot showing the generated token.

Below the blue highlighted box you are also provided with a sample .netrc file with your updated password. Creating a .netrc as shown and placing it in your home directory will enable pftp, HSI, HTAR, and some FTP clients to read it upon starting a new session to HPSS and avoid username/password challenge.

To generate a string for access to NERSC HPSS from outside the NERSC network, log into NIM and select "Generate an HPSS token" from the "Actions" menu. Ignore the password provided and select "Please use this link to specify a different IP address". Then enter the IP address of the system from which you wish to connect to HPSS. Note that the form prefills the box with the IP address of the machine the browser is running on, which may not be the system you intend to access HPSS from. Enter the correct IP address and select "Generate Token". See the screenshot below showing the screen to enter the IP address:

This will provide you with a password, an encrypted string, in a blue highlighted box that may be used by the user on any machine within the same class C network as the IP address provided. You may place the encrypted string in a .netrc file for HSI or HTAR to read. This will avoid username/password challenge. A sample .netrc file with your correct password is provided below the blue highlighted box.
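To judge in advance whether a token generated for one address is likely to work from another machine, the two addresses can be compared. The sketch below assumes that "same class C network" means the first three octets of the dotted-quad IPv4 addresses match; same_class_c is a hypothetical helper and the addresses are documentation examples, not NERSC hosts.

```shell
#!/bin/sh
# Hypothetical helper: do two IPv4 addresses share their first three octets?
# Usage: same_class_c ADDR1 ADDR2   (exit status 0 means yes)
same_class_c() {
    # ${1%.*} strips the last dot and everything after it (the final octet)
    [ "${1%.*}" = "${2%.*}" ]
}

if same_class_c 192.0.2.10 192.0.2.200; then
    echo "same class C network: the token should be usable"
else
    echo "different network: generate a token for this IP address"
fi
```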

Accessing HPSS

Once you have successfully generated an HPSS token, you can access either HPSS system using the HSI and HTAR utilities, NERSC's PFTP utility, or clients that use the FTP protocol. The HPSS User system (archive.nersc.gov) is also capable of performing GSI authentication to its parallel FTP daemon, allowing grid-enabled clients to access it.

HPSS cannot be accessed via SSH.

Access from NERSC platforms

HSI, HTAR, and PFTP are available on NERSC platforms. The HPSS User system is "archive" and must be specified on the command line when using the HSI or HTAR utilities at NERSC, for example:

hsi -h archive.nersc.gov command

The HPSS Backup system (hpss.nersc.gov) should be specified as "hpss.nersc.gov" when using the HSI utility at NERSC.

HPSS can be accessed interactively and used in batch scripts.

HSI, HTAR, pftp and some FTP clients will look for a .netrc file in your HOME directory. The file should have stanzas for each system that provide your login username and password. This will enable automated authentication or access to HPSS. You will not be prompted for a username/password pair. A sample file showing entries for the HPSS Backup system (hpss.nersc.gov) is provided below.

# comment
 
machine hpss
  login franky
  password 02S&feVYA!UMR_aGljaw....Bx22w%%wp((ubVDfIn7FG2W50jSg== 
 
machine hpss.nersc.gov
  login franky
  password 02S&feVYA!UMR_aGljaw....Bx22w%%wp((ubVDfIn7FG2W50jSg== 

Here the password argument is your HPSS-generated authentication token. Ensure there are no group or other permissions on your .netrc file (mode 600 is appropriate).
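One way to apply and confirm the recommended mode is sketched below. The function name secure_netrc is made up for this example; it uses only the standard chmod, ls and cut commands and defaults to $HOME/.netrc when no path is given.

```shell
#!/bin/sh
# Hypothetical helper: restrict a .netrc file to its owner and show the result.
# Usage: secure_netrc [PATH]   (defaults to $HOME/.netrc)
secure_netrc() {
    netrc_file=${1:-$HOME/.netrc}
    if [ ! -f "$netrc_file" ]; then
        echo "no such file: $netrc_file" >&2
        return 1
    fi
    chmod 600 "$netrc_file"               # owner read/write only
    ls -l "$netrc_file" | cut -c1-10      # expect -rw-------
}

# Typical use on your own account:
#   secure_netrc
```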

Access from outside NERSC

The primary user HPSS system is known as "archive.nersc.gov". All new users and most current users utilize the archive system for their archive storage needs.

The backup NERSC HPSS system is "hpss.nersc.gov" and is sometimes referred to as the "regent" system. The backup system is used for system backups, although it does contain some older user data.

archive.nersc.gov can be reached using the hsi and htar utilities and ftp clients. The HSI and HTAR utilities are available for download and use by NERSC users. If the machine on which you run HSI/HTAR is not behind a firewall, you may see a significant performance improvement by disabling firewall mode, which is on by default. To disable firewall mode, see the README document that comes with your downloaded HSI bundle.

Accessing HPSS - HSI

HSI is a flexible and powerful interface utility to HPSS. The HSI commands are similar to those in ftp and pftp (e.g., put and mput) and UNIX (e.g., mv, mkdir, rm, cp, cd). HSI also has commands similar to those in CFS. HSI can be used either interactively or in batch scripts.

A related utility, HTAR, is useful for archiving multiple files to HPSS without using the intermediate local file storage that would be needed if one first used the tar utility followed by HSI.

There are man pages for HSI on production NERSC computers.

For documentation on HSI (version 3.4+) see the HSI 3.4 documentation.

Authentication

Authentication on the HPSS systems is accomplished by using your NIM username and an HPSS authentication token, or by placing the NIM username and HPSS authentication token in a .netrc file in your HOME directory. See the HPSS Passwords page for details.


Connecting with HSI

To connect to the main user system (archive):

% hsi -h archive.nersc.gov

NERSC's other HPSS system - the original HPSS system at NERSC - is named "hpss". (It seemed like a good idea at the time, but now causes confusion. It is known internally as "regent"). This system is now used for backups, and does not offer the same capacity and performance as "archive." However, it does contain some older user data. To connect to it, use the command:

% hsi -h hpss.nersc.gov

Starting and Using HSI

HSI can accept input in several different ways; some examples:

From a command line: hsi
Single-line execution: hsi "mkdir foo; cd foo; put data_file"
From a command file: hsi "in command_file"

HSI can also read from standard input and write to standard output using pipes.

For "get" and "put" operations, HSI uses a special syntax to identify and separate the local and HPSS file names:

  1. The local file name is always on the left, and the HPSS file name is always on the right.
  2. A ":" (colon character) always separates the local pathname from the HPSS pathname, and the colon character must be surrounded by whitespace.

Examples:

% put local_file : hpss_file  
% get local_file : hpss_file  

Recursive operations are allowed for the following commands:

    cget, chgrp, chmod, chown, cput
    delete, get, ls, mdelete, mget, mput
    put, rm, stage, touch

Wildcards are supported.


Frequently Used Commands

HSI's command set is rich, and will look familiar to users of UNIX, FTP, and other storage utilities. A small set of commands will satisfy most user storage needs.

Short List of HSI Commands by Function

HPSS File and Directory Commands

   Command        Function
   cd             Change current directory
   get, mget      Copy one or more HPSS-resident files to local files
   cp             Copy a file within HPSS
   rm, mdelete    Remove one or more files from HPSS
   ls             List a directory
   put, mput      Copy one or more local files to HPSS
   pwd            Print current directory
   mv             Rename an HPSS file
   mkdir          Create an HPSS directory
   rmdir          Delete an HPSS directory

Local File and Directory Commands

   Command        Function
   lcd            Change local directory
   lls            List local directory
   lmkdir         Make a local directory
   lpwd           Print current local directory
   command        Issue shell command

File Administrative Information

   Command        Function
   chmod          Change permissions of file or directory

Miscellaneous HSI commands

   Command            Function
   help               Display help information
   quit, exit, end    Terminate HSI
   in                 Read commands from a local file
   out                Write HSI output to a local file
   log                Write all HSI commands and responses to a local log file
   prompt             Toggles HSI prompting for mget, mput, and mdelete

Accessing HPSS - HTAR

While HSI is a very fast and flexible tool for dealing with large data transfers, it can be slow when transferring a large number of small files or data from a stream buffer. For these cases the HTAR utility performs well.

Connecting with HTAR

Connections with HTAR operate in much the same way as in HSI. See the HSI page for information about connecting to NERSC's HPSS systems. HTAR is available on NERSC computers, and NERSC users can also download it for use on their local machines.

The target HPSS system is specified with the -Hserver option, e.g.:

% htar -Hserver=archive.nersc.gov -tvf blah.tar

Using HTAR

For details, see the HTAR man page or the HTAR home page.

HTAR operates much the same as the unix tar command but with the tarfile archive residing in HPSS storage. Archive creation "-c" puts data into HPSS and archive extraction brings data to your local machine.

Basic Syntax:

The core syntax for HTAR is analogous to unix tar:

   htar -{c|K|t|x|X} -f tarfile [directories] [files]

As in the unix tar command the "-c" "-x" and "-t" options respectively function to create, extract, and list tar files. The "-K" option verifies an existing tarfile in HPSS.

One useful feature of HTAR is the creation of index (".idx") files. These files provide a means to find the location of files prior to retrieval of a tarfile. Using the "-X" option will cause htar to build an index file for a specified standard tar file, so it can subsequently be used with htar.

Frequently Used HTAR Invocations

A small set of cases will satisfy most user storage needs.

   Command                                    Function
   htar -cvf dirs.tar directory1 directory2   Create dirs.tar in HPSS containing directory1 and directory2;
                                              provide verbose listing of actions while processing.
   htar -cf files.tar file1 file2             Create files.tar in HPSS containing file1 and file2.
   htar -tvf files.tar                        List contents of files.tar in HPSS.
   htar -xvf files.tar                        Extract contents of files.tar in HPSS.
   htar -Xf files.tar                         Build index file "files.tar.idx" for tar file files.tar.

The HTAR man page has more detailed information on this utility.

Accessing HPSS - ftp/pftp

Files can be transferred to and from HPSS via the standard internet protocol ftp and the HPSS pftp utility. There is no sftp (secure ftp) or scp access.

As standard ftp clients only support authentication via the transmission of unencrypted passwords, which NERSC does not permit, special procedures must be used with ftp and pftp. The procedures are described below.

PFTP

PFTP is a variant of ftp which is available on NERSC systems. It is better than ftp for large file transfers (> 100 MB) because it is multi-threaded and has some tuning parameters available for transfers. PFTP has the advantage of being compatible with NERSC "sleepers," which will gracefully suspend connections when HPSS is down or unavailable.


ftp/pftp Authentication

In order to access the NERSC HPSS systems, users will need to generate and use a special encrypted password called an HPSS authentication token.

For details on generating the token, see HPSS password information.

Using HPSS from Batch Jobs

Once you are set up for automatic authentication (see the sections on HSI, HTAR, and ftp/pftp) you can access HPSS from within batch scripts.

HSI will accept one-line commands on the HSI command line, e.g.:

hsi put filename

HSI, ftp, and pftp read from Standard Input (STDIN) and a list of commands can be placed in a text file (script) and redirected into the given utility, e.g.:

ftp < file_with_ftp_commands 

"Here" Documents

Another method uses what are called "Here Documents," in which the commands are embedded in the batch script rather than in a separate file external to the main script. The start of a "here-doc" block in a script is signalled by the presence of double angle brackets: << followed by an identifying tag. Lines up to the line containing the tag are treated as if they had been typed at the command prompt.

Here is a simple script which performs a pftp file transfer:

pftp -v -i archive <<_EOS
cd my_HPSS_directory
mget data*
quit
_EOS

This example will execute the FTP commands between the "_EOS" strings.

Accessing HPSS - Sleepers

Sleepers are only available using the HSI or PFTP clients from NERSC production machines.

When scheduled maintenance or unexpected events necessitate taking HPSS down, "sleepers" are enabled. This causes all jobs attempting to use HPSS to wait. Usually this causes no problems for these jobs, which resume safely when sleepers are removed. However, users may wish to test for HPSS system availability, and take alternate actions based on this, so a way to detect sleepers is available.

Testing for sleepers can be accomplished with the "hpss_avail" utility, which is available on all NERSC supercomputers. This utility takes a single argument, which may be "archive", "hpss", or "help"; case is not significant. Any other argument, or none, results in usage text being returned; the "help" argument produces more detailed help text. The utility returns its result in the predefined shell variable "status" ("$?" in some shells), which may be tested, used in a subsequent shell command, or output. The value persists only until the next shell command is executed, which overwrites it with its own result. Here are two examples of querying a system and printing a message based on the returned status value. The first uses the C shell and the second the Korn shell.

     #!/bin/csh
     
     hpss_avail archive; set READY=$status
     if ($READY == 0) then
        echo "ARCHIVE up and available"
     else
        echo "ARCHIVE is unavailable"
     endif
     

     #!/usr/bin/ksh
     
     hpss_avail archive
     READY=$?
     if [ $READY -eq 0 ]; then
             echo "ARCHIVE up and available"
     else
             echo "ARCHIVE is unavailable"
     fi
     

Possible alternative actions to take when sleepers are enabled might include (1) moving files to alternate file systems, such as $HOME; or (2) changing file names to prevent overwriting or name collisions by subsequent file creations.
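For the second alternative, a script needs a file name that does not collide with an existing one. The sketch below shows one simple scheme, appending a numeric suffix; unique_name is a hypothetical helper, and the commented usage line assumes that hpss_avail returns a nonzero status when sleepers are enabled, as in the scripts above.

```shell
#!/bin/sh
# Hypothetical helper: print NAME if it is unused, else NAME.1, NAME.2, ...
# Usage: unique_name NAME
unique_name() {
    name=$1
    candidate=$name
    n=1
    while [ -e "$candidate" ]; do
        candidate="$name.$n"
        n=$((n + 1))
    done
    echo "$candidate"
}

# In a batch script on a NERSC system one might write:
#   hpss_avail archive || mv outfile "$(unique_name "$HOME/outfile")"
unique_name outfile
```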

Accessing HPSS - Usage Advice and Examples

This section advises and demonstrates some useful techniques for using HSI and ftp/pftp, including their use in batch scripts.

  1. Some Advice on Efficient Use of HPSS
  2. A Complete Batch Script Using PFTP
  3. A Complete Batch Script Using HSI

Some Advice on Efficient Use of HPSS

Accessing HPSS in Batch Jobs
Each HPSS read request can involve a storage library mounting a tape, which may take an arbitrary amount of time, depending on how many requests that library is currently servicing. Doing HPSS reads in a batch job can stall the entire ensemble of processors dedicated to the job. A better strategy is to read any files needed by batch jobs in advance of the job's execution; NERSC provides a special batch job class, named "xfer" for HPSS file transfers. Files in user $SCRATCH space on NERSC supercomputers will likely persist there for several weeks, so pre-reading can be done in advance of submitting the batch job that will use them. Writing files into HPSS generally takes less time than reading them, since they are written into HPSS disks, and transferred to tape later.

Accessing HPSS in a Single Session
Each invocation of HSI or ftp/pftp constitutes a separate "session" on the NERSC servers, and each session involves startup and shutdown overhead. It is more efficient to perform multiple operations in a single session than to use multiple sessions that each perform a single operation. This means it is inefficient to call either of these utilities in a scripted loop; it is better to generate a list of files in a loop and use that list for a set of commands in a single session. Command files can be used with HSI via the in command, as documented in the HSI User Guide.
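A minimal sketch of this pattern follows: the loop only writes put commands, using the "local : HPSS" syntax, and a single HSI session then reads them. The function name make_put_commands, the HPSS directory "results" and the file names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: emit one HSI put command per local file.
# Usage: make_put_commands HPSS_DIR FILE1 [FILE2 ...]
make_put_commands() {
    hpss_dir=$1
    shift
    for f in "$@"; do
        printf 'put %s : %s/%s\n' "$f" "$hpss_dir" "$(basename "$f")"
    done
}

# Show the commands that would be generated for two files
make_put_commands results data1.dat data2.dat

# On a system with HSI, write them to a file and run one session:
#   make_put_commands results data1.dat data2.dat > hsi_cmds.txt
#   hsi -h archive.nersc.gov "in hsi_cmds.txt"
```

The loop itself never starts HSI, so the session startup and shutdown cost is paid once no matter how many files are listed.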

Ordering Multiple-File Reads
Files and directories located logically close together within HPSS may reside on different tapes, so multiple-file read commands can incur multiple tape-mount delays. A useful technique for reading many files in a single session is to first use the HSI command "ls -P" to produce a list of the required directories and/or files, and direct the command's output into a file. Sort that output file on the last two fields in the output lines, i.e., tape position, and tape identifier, respectively. Perform the sorts to group the file lines by tape ID, and in ascending positional order for each tape. Edit the file to remove extraneous lines and fields, and perform the get operations on the desired files in their sorted ordering. The resulting command input file can be used with the HSI in command in a single session, and will minimize tape delays and decrease overall access time.
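The sorting step might look like the sketch below. It assumes only what is stated above, namely that the last two whitespace-separated fields of each saved "ls -P" line are the tape position and the tape identifier, and it further assumes the position is a plain integer. The sample lines are invented and do not reproduce real "ls -P" output; order_by_tape is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: order saved listing lines by tape ID, then position.
# Reads the named files, or standard input if none are given.
order_by_tape() {
    # Prefix each line with its sort keys, sort, then strip the keys again.
    awk 'NF >= 2 { print $NF, $(NF - 1), $0 }' "$@" |
        LC_ALL=C sort -k1,1 -k2,2n |
        cut -d' ' -f3-
}

# Invented sample: file name, tape position, tape identifier
order_by_tape <<'EOF'
run3.dat 250 T002
run1.dat 40 T001
run2.dat 7 T002
run4.dat 12 T001
EOF
```

With this sample the two files on T001 come first, in positional order, followed by the two on T002, so each tape would need to be mounted only once.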

Aggregating File Collections
Files can be aggregated into collections with the HTAR utility, allowing more efficient access to members of the collection. HTAR writes tar-like archive files directly into HPSS, with a companion index file for each archive. This allows subsequent reads of any subset of an htar archive's contents with only a single tape mount. File sets that were written unaggregated can be re-written with htar after being read. The cost of this rewriting is the extra storage resources used, since the original files are not removed.
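
A typical create, list, and extract sequence might look like the sketch below. It is a dry run by default: the commands are printed rather than executed unless DRY_RUN=0 is set. The archive name run_2009.tar, the directory run_2009, and the member name are hypothetical; the tar-style -c, -t, -x, -v, and -f options are the commonly documented htar options, but check the local htar documentation before relying on them.

```shell
#!/bin/sh
# Dry-run sketch: prints the htar commands instead of running them unless
# DRY_RUN=0 is set. Archive, directory, and member names are hypothetical.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an archive (and its index) in HPSS from a local directory.
run htar -cvf run_2009.tar run_2009

# List the archive's contents.
run htar -tvf run_2009.tar

# Extract a single member from the archive.
run htar -xvf run_2009.tar run_2009/restart.001
```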

Example 1.   Complete Batch Script Using PFTP

This example shows a batch script with pftp actions in it. It uses both single- and multiple-file movement commands, as well as directory change commands, and it uses the "+" character to bracket a "here document." The example assumes that you have a ".netrc" file in your home directory with the appropriate encrypted password combination.

     #!/bin/csh   
      
     # First, copy the data and source files from HPSS
     pftp -i -v archive <<+
     cd my_HPSS_directory
     mget data*
     get source.f 
     quit
     +
      
     ./myprog data outfile
      
     # Save the output file in HPSS.
     pftp -i -v archive <<+
     cd my_HPSS_directory
     put outfile
     mput restart*
     quit
     +
     exit
     

Example 2.   Complete Batch Script Using HSI

This example shows a script containing HSI actions that accomplish the same things pftp does in Example 1, above. Note that in this case a single-line command is used, so no "here document" is needed. This simplifies the script and demonstrates some of HSI's advantages over pftp or ftp. This script assumes that you have previously logged into HSI interactively at least once to encrypt your username/password.

     #!/bin/csh
      
     # First, copy the data and program source from HPSS into the
     # current working directory
      
     hsi -h archive.nersc.gov "cd my_HPSS_directory; get data* source.f"
      
     ./myprog data outfile
      
     # Save the output file in HPSS.
      
     hsi -h archive.nersc.gov "cd my_HPSS_directory; put outfile; \
        put restart*"
      
     exit
     

Note that in the above, the individual hsi commands are separated by semicolons (;) and the set of commands is enclosed in quotes ("). The semicolons are necessary, and are currently the only allowed command separator. The quotes are required to prevent shell interpretation of wild card characters, and are recommended for general safety in one-liners. Note that the suppression of shell interpretation prevents the effective use of wild-card file and directory specifications in one-liners.

Unlike an interactive HSI session, no termination command (e.g. exit, quit, etc.) is needed in a one-liner.

In addition to one-line commands, HSI can also take input command sets from files. For more information on this see the HSI Documentation.

For New HPSS Users

If you are a new user of NERSC's HPSS system for storing your data, there are a few things you need to know. The questions and answers below will guide you through the process of starting to use HPSS.

How do I get an HPSS account?

Your HPSS user name will be your NIM user name, but not all users have home directories. If you desire access to either HPSS system, call the NERSC Account Support staff at 1-800-66-NERSC, menu option 2, to create your home directory.

How do I set or change my HPSS password?

Log in to NIM and select "Actions" and then "Generate HPSS Token". This generates a new encrypted password for access to HPSS, but does not change the stored key in NIM/LDAP. If you need to change the stored key for security reasons, contact NERSC Account Support to have your account security-disabled; we will then manually reset your stored key, which invalidates all previously generated passwords.

How do I access HPSS?

You can access NERSC's HPSS systems from NERSC production platforms and from any machine that supports HSI, pftp, or ftp. The HPSS User system (archive.nersc.gov) can also be accessed by gridFTP clients.

Why doesn't my username and password work with ftp or pftp?

For security reasons, NERSC no longer supports clear-text user names and passwords for pftp or ftp. See HPSS passwords for details on how to generate an HPSS authentication token.


Page last modified: Thu, 23 Jun 2005 22:33:18 GMT
Page URL: http://www.nersc.gov/nusers/systems/hpss/print.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
