
Mistakes to Avoid

There are a number of workflows that should be avoided as much as possible:

Small Files

Large tape storage systems do not work well with small files:

  • Storing large numbers of small files may spread them across dozens or hundreds of tapes
  • Tape is a sequential medium, so for each file the tape must be mounted in a drive and positioned to the start of the file before it can be read.  This operation takes time.
  • Mounting dozens of tapes and then seeking to particular locations on each one can take a long time and impair usability for other users
  • Store small files as aggregates with HTAR (see the sketch after this list)
    • Large HTAR aggregates end up on fewer tapes
    • The HTAR index speeds up retrieval of individual member files
  • Requests for large numbers of small files can be ordered by tape and position to mitigate the performance impact
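
As a sketch (the directory and archive paths below are illustrative), a directory of many small files can be stored as a single HTAR aggregate, its contents listed from the index, and individual members retrieved without pulling back the whole archive:

# Store the directory as one HTAR aggregate in HPSS
bash-4.0$ htar -cvf /home/j/joeuser/small_files.tar ./small_files

# List member files using the HTAR index
bash-4.0$ htar -tvf /home/j/joeuser/small_files.tar

# Retrieve a single member file from the aggregate
bash-4.0$ htar -xvf /home/j/joeuser/small_files.tar ./small_files/input_0042.dat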

Recursive/Unordered Requests

Using HSI for recursive storage and retrieval is almost always non-optimal:

  • Recursively storing a directory tree is likely to scatter many small files across a large number of tapes
  • Recursive file retrieval is likely to cause excessive tape mounting and positioning activity.  This is not only slow but also ties up the system for other users.

It is better to use HTAR than recursive HSI.

For retrieving a large number of files, ordering your read requests by tape and by position on the tape is the most efficient method.  An example script showing how this can be done is available in the Usage Examples as the HSI Tape Ordering Script.
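
A simplified, hypothetical sketch of the idea is shown below.  It assumes HSI's "ls -P" physical listing; the sort and awk field numbers are illustrative and may need adjusting for your HSI version, so prefer the full HSI Tape Ordering Script for real use:

# 1. List the files with physical (tape volume and position) information
bash-4.0$ hsi -q 'ls -P /home/j/joeuser/project/*.dat' 2>&1 | grep '^FILE' > listing.txt

# 2. Sort by tape volume, then by position on the tape (field numbers are illustrative)
bash-4.0$ sort -k5,5 -k4,4n listing.txt > sorted.txt

# 3. Build an HSI command file that issues "get" requests in tape order
bash-4.0$ awk '{print "get", $2}' sorted.txt > getlist.hsi

# 4. Retrieve everything in a single ordered HSI session
bash-4.0$ hsi "in getlist.hsi"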

Streaming Data via Unix Pipelines

Unix pipelines are often used to avoid the need for a spool area when writing a large archive file.  This approach has some weaknesses:

  • Pipelines can break during transient network issues
  • Pipelines do not tell HPSS the size of the incoming data
    • Data may be stored on non-optimal resources, and/or transfers may fail
    • Retrieval can be difficult

Instead:

  • Use global scratch to spool large archive files (see the sketch after the PFTP example below)
  • Use HTAR if spool space is an issue
  • If streaming via a pipe is unavoidable, use PFTP with the ALLO64 <bytes> hint:
bash-4.0$ pftp archive <<EOF
> bin
> quote allo64 7706750976
> put "|tar cf - ./joeuser" /home/j/joeuser/joeuser.tar
> quit
> EOF
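
Where possible, the simpler pattern is to spool the archive on global scratch and then store the finished file.  A minimal sketch, assuming $SCRATCH points to your global scratch directory and using illustrative paths:

# Build the archive on global scratch first, then store the single large file in HPSS
bash-4.0$ tar cf $SCRATCH/joeuser.tar ./joeuser
bash-4.0$ hsi put $SCRATCH/joeuser.tar : /home/j/joeuser/joeuser.tar
bash-4.0$ rm $SCRATCH/joeuser.tar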

Massive Pre-Staging

HSI allows pre-fetching data from tape into the HPSS disk cache, so that in principle the data is available quickly for processing.  There are several problems with this approach:

  • The disk cache is shared on a first-come, first-served basis
  • If the cache is under heavy use by other users, data may be purged before it is used
  • If the data read into the cache is larger than the cache, it will be purged before use
  • Both situations result in a performance penalty, because the data must be read from tape twice

Instead, it is recommended to use global scratch, rather than the HPSS disk cache, to pre-stage large data volumes.
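
For example, a minimal sketch (paths illustrative) that retrieves data from HPSS directly onto global scratch, where it is not subject to HPSS cache purging:

# Retrieve from HPSS straight to global scratch instead of pre-staging in the cache
bash-4.0$ cd $SCRATCH
bash-4.0$ mkdir -p prestage && cd prestage
bash-4.0$ hsi get run42.tar : /home/j/joeuser/runs/run42.tar
bash-4.0$ tar xf run42.tar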

Large Directories in HPSS

Each HPSS system is backed by a single database instance, so:

  • Every user interaction causes some database activity
  • One of the most database-intensive commands is HSI long file listing, i.e., "ls -l"
  • Directories containing more than a few thousand files may become difficult to work with interactively

Below is an example of an "hsi ls -l" listing of a directory containing 80k files:

bash-4.0$ time hsi -q 'ls -l /home/n/nickb/tmp/testing/80k-files/' > /dev/null 2>&1

real 20m59.374s
user 0m7.156s
sys 0m7.548s

[Graph: "hsi ls -l" performance as a function of the number of files in the directory]
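
One way to avoid this, sketched below with illustrative paths, is to keep the many small files inside an HTAR aggregate, so that the HPSS directory holds a single entry and member listings come from the HTAR index rather than the database:

# One archive (plus its index) instead of 80,000 individual HPSS entries
bash-4.0$ htar -cvf /home/n/nickb/tmp/testing/80k-files.tar ./80k-files

# Listing members reads the HTAR index rather than 80,000 database records
bash-4.0$ htar -tf /home/n/nickb/tmp/testing/80k-files.tar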

Long-Running Transfers

  • Long-running transfers can be failure-prone for a variety of reasons, including transient network issues and planned or unplanned maintenance
  • HSI and PFTP cannot resume interrupted transfers
  • Data is not fully migrated from the HPSS disk cache to tape until the transfer has completed
  • It is recommended to keep transfers to 24 hours or less in duration if possible, for example by splitting the work into several smaller jobs as sketched below
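
A minimal sketch (directory and archive names illustrative) of splitting one very large archiving job into several shorter HTAR jobs, each of which can finish well within 24 hours:

# Archive each top-level subdirectory separately rather than in one huge transfer
bash-4.0$ cd $SCRATCH/bigproject
bash-4.0$ for d in run_a run_b run_c; do
>   htar -cvf /home/j/joeuser/bigproject_${d}.tar ${d}
> done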

Session Limits

  • Users are limited to 15 concurrent sessions
  • This number can be temporarily reduced if a user is impacting system usability for others