NASA Center for Computational Sciences

dirac (DMF)

DMF (data migration facility) is a hierarchical mass storage system that manages data movement between various levels of storage. The first layer of storage is disk, and the second layer is archive or tape storage. At the NCCS, DMF runs on the SGI Origin 3000 (dirac) system.

+ What is DMF?
+ System overview
+ Changing your dirac password
+ Direct access to dirac
   - ssh to dirac
   - Shell commands useful for deleting files
+ Accessing dirac for interactive transfers
   - SFTP example
   - SCP example
+ Accessing dirac via anonymous FTP
+ Setting file permissions using chmod
+ Batch transfers
   - Prerequisites for batch transfers to and from NCCS high-performance computing systems
   - Creating a new authorized_keys file
   - Adding to an existing authorized_keys file
   - Performing batch transfers on NCCS high-performance computing systems
      - Batch transfers to and from halem

+ Limiting the number of files in a directory
+ Compressing files to save space
+ Using tar files
+ Preserving file attribute information

+ Tape-to-disk restoration

+ Planning around scheduled downtimes


What is DMF?

DMF is a hierarchical storage manager (HSM) that supports unlimited file size in a UNIX-like file system environment. Access is via the sftp (secure file transfer protocol) or scp (secure copy) interface, two protocols that incorporate additional encryption to ensure that files are transferred securely.

DMF manages its file system by automatically migrating files from its disk cache to tape when free space is low or files have not been accessed in a certain length of time. If you attempt to retrieve a file that has been migrated to tape, it will automatically be copied to DMF's disk cache before it is copied to your local system. This functionality is very similar to that of the previous SAM-QFS software.

The NCCS is currently running DMF on an SGI Origin 3000, and the host name for the public-facing system is dirac.gsfc.nasa.gov.


System Overview

DMF runs on an SGI Origin 3000:

  • DMF filesystems are actually served by an Origin that does not permit user access.

  • These filesystems are served via CXFS to dirac and palm/explore.

  • Filesystems are exported via NFS to halem.

Disk storage

  • The DMF filesystems each have 2 or 4 TiB disk space. These are on a DataDirect Networks S2A 8500 2-gigabit Fibre Channel disk.

Tape drives

  • Directly attached to the Origin 3000 are nine 9840A (20 GB/cartridge) and eight 9840C (30 GB/cartridge) tape drives for smaller files and ten 9940B (200 GB/cartridge) tape drives for larger files. Another six 9940B tape drives in Building 32 store a second copy of certain files. The tape drives retrieve newly written DMF files that are not on disk and write archive copies of disk files to tape.

Network connectivity

  • Two gigabit Ethernet interfaces connect dirac to the NCCS high-performance computers and to Goddard networks.



Changing your dirac password

To change your password on dirac, you need your 10-digit passcode (your 4-digit PIN and the 6-digit number displayed on your AKT, entered together as one number without spaces) and your existing password. Log in and run passwd as follows:

ssh mylogin@dirac.gsfc.nasa.gov
ENTER PASSCODE: 10-digit passcode
Password: existing-dirac-password
Last login: Mon Jun 7 18:03:57 2004 from myhost.gsfc.nasa.gov
# passwd
Changing password for mylogin
Enter existing login password: existing-dirac-password
New Password: new-dirac-password
Re-enter new Password: new-dirac-password
passwd: password successfully changed for mylogin



Direct access to dirac

ssh to dirac

To ssh directly to dirac, execute the command

ssh mylogin@dirac.gsfc.nasa.gov

You will be prompted for your passcode and password as in other types of access to dirac.

Shell commands useful for deleting files

The shell commands cd, ls, find, and rm have been enabled to help you delete files; access via sftp and scp should still be used to evaluate file content.

The find command (find starting_directory matching_criteria_and_actions) can be particularly useful for deleting files based on last access, file type, size, owner, or modification time, among other criteria. For example:

To find all files, starting from the current directory, that were last accessed more than a year ago:

% find . -atime +365 -print

To find all files that have a file name ending in .c:

% find . -name "*.c" -print

To find all files bigger than 50K and delete them:

% find . -size +100 -exec rm {} \;

(Note that the -size directive counts 512-byte blocks, so +100 selects files larger than 50K.)

To find all files owned by user guestuser and return a long directory listing:

% find . -user guestuser -ls

To find all files modified within the last two days:

% find . -mtime -2 -print

For further options, consult the man page for find or a search engine for a string such as "UNIX find command examples."
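
Because find with -exec rm deletes without asking for confirmation, it can help to preview the matches with -print first and only then rerun the same criteria with the delete action. The sketch below rehearses that two-step pattern in a throwaway directory on any UNIX system. The directory and file names are hypothetical, and mktemp and dd appear only to build the practice data (they are not among the commands enabled on dirac).

```shell
# Hypothetical scratch directory; on dirac you would start from your own directory.
demo_dir=$(mktemp -d)

# Build one 60 KiB file and one 1 KiB file to stand in for real data.
dd if=/dev/zero of="$demo_dir/big.dat" bs=1024 count=60 2>/dev/null
dd if=/dev/zero of="$demo_dir/small.dat" bs=1024 count=1 2>/dev/null

# Step 1: preview the regular files larger than 100 512-byte blocks (50K).
find "$demo_dir" -type f -size +100 -print

# Step 2: once the list looks right, rerun the same criteria with the delete action.
find "$demo_dir" -type f -size +100 -exec rm {} \;

# Show what is left in the directory.
remaining=$(ls "$demo_dir")
echo "$remaining"
```

Adding -type f restricts the match to regular files, so directories are never handed to rm.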



Accessing dirac for interactive transfers

You can access dirac using sftp or scp. There is no need to specify a port if you are accessing dirac from a network outside NCCS systems. If you are accessing dirac from an NCCS high-performance compute server (e.g. palm and discover) for batch jobs, you should specify port 2222.

You should connect to dirac using one of the following simple network paths:

Conditions for Using Command                  Command Format
When accessing dirac from NCCS systems        sftp (or) scp dirac
When accessing dirac from non-NCCS systems    sftp (or) scp dirac.gsfc.nasa.gov

You will need your 10-digit passcode (4-digit PIN and the 6-digit number displayed on your SecurID, entered together as one number without spaces) and your dirac password to log in.

SFTP example:

sftp mylogin@dirac.gsfc.nasa.gov
Connecting to dirac.gsfc.nasa.gov...
The authenticity of host 'dirac.gsfc.nasa.gov (169.154.162.1xx)' can't be established.
RSA key fingerprint is 35:fc:e4:d1:b9:07:ac:19:b6:8d:db:9e:98:9e:c2:0d.
Are you sure you want to continue connecting (yes/no)?
yes
Warning: Permanently added 'dirac.gsfc.nasa.gov' (RSA) to the list of known hosts.
*************** WARNING - U.S. GOVERNMENT COMPUTER *************

This U.S. Government resource is for authorized use only.

If not authorized to access this resource, disconnect now.
Unauthorized use of, or access to this resource may subject you to disciplinary action or criminal prosecution.

By accessing and using this resource, you are consenting to monitoring, keystroke recording or auditing.
**************************************************************

Enter PASSCODE: 10-digit passcode
Password: dirac-password
sftp>


SCP example:

scp mylogin@dirac.gsfc.nasa.gov:full_path_name local_path_name

or

scp local_path_name mylogin@dirac.gsfc.nasa.gov:full_path_name
*************** WARNING - U.S. GOVERNMENT COMPUTER ************
This U.S. Government resource is for authorized use only.

If not authorized to access this resource, disconnect now.
Unauthorized use of, or access to this resource may subject you to disciplinary action or criminal prosecution.

By accessing and using this resource, you are consenting to monitoring, keystroke recording or auditing.
**************************************************************
Enter PASSCODE: 10-digit passcode
Password: dirac-password
path_name 100% |*****************************| size 00:00

 



Accessing dirac via anonymous FTP

Read-only anonymous ftp is available on dirac for certain designated directories. Note that only "active" (not passive) sessions are supported, so you may have to modify your ftp client accordingly.


Setting file permissions using chmod

Enter the following commands to upload a file and change its permissions via sftp:

sftp dirac
put dirac_file
chmod ### dirac_file

where ### (e.g., 644) is the mode you wish to set for the file. With this approach, you can change the mode of only one file at a time.
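
Since each chmod acts on a single file, one way to handle several files is to generate a list of chmod commands and hand it to sftp as a batch file. The following is a minimal sketch under stated assumptions: the file names and mode are hypothetical, your sftp client is assumed to support the -b (batch file) option as OpenSSH's does, and the final sftp line is left commented out because it needs the non-interactive, key-based access described under Batch transfers.

```shell
# Hypothetical mode and file names; replace with your own.
mode=644
batch_file=$(mktemp)

for f in run1.dat run2.dat run3.dat; do
    # One chmod line per file, because each chmod changes a single file.
    echo "chmod $mode $f" >> "$batch_file"
done

# Review the generated commands before using them.
cat "$batch_file"

# Assumption: sftp supports -b and key-based access on port 2222 is set up.
# Uncomment to run the commands on dirac:
# sftp -oport=2222 -b "$batch_file" dirac
```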



Batch transfers

Prerequisites for batch transfers to and from NCCS high-performance computing systems

There are several prerequisites for batch sftp and scp transfers to and from NCCS high-performance computing systems (palm and discover):

  • Your dirac home and .ssh directories must not be group- or world-writeable (see Setting file permissions using chmod).

  • You must log in to a specific NCCS high-performance computing system using your 10-digit passcode (4-digit PIN and the 6-digit number displayed on your SecurID, entered together as one number without spaces) plus your login password on that system.

  • Your batch sftp or scp to dirac from NCCS high-performance computing systems requires the following:

    • The option -oport=2222 on the sftp or scp command. Note that to use port 2222 you must also originate your sftp or scp from an NCCS high-performance computing system.

    • On dirac, for each NCCS high-performance computing system from which you will be accessing dirac, an entry in your home_directory/.ssh/authorized_keys file (instructions for creating the authorized_keys entry follow). The authorized_keys file is needed because

      • Batch/unattended transfers cannot access or use the changing 6-digit number on your SecurID and

      • NCCS passwords must not be stored in files.
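
The first prerequisite (directories that are not group- or world-writeable) can be checked with find before any batch job depends on it. The sketch below rehearses the check and the fix on a temporary directory that stands in for a home directory; the directory is hypothetical and the check function is an illustration, not an NCCS tool. On dirac itself, the mode would be changed with the sftp chmod command described earlier.

```shell
# Hypothetical stand-in for a home directory so the sketch can run anywhere.
home_dir=$(mktemp -d)
mkdir "$home_dir/.ssh"
chmod 775 "$home_dir/.ssh"    # deliberately group-writeable for the demo

# Print either directory if it is group-writeable (020) or world-writeable (002).
# -prune keeps find from descending below the two directories named.
check() {
    find "$home_dir" "$home_dir/.ssh" -prune \( -perm -020 -o -perm -002 \) -print
}

before=$(check)
echo "too open: $before"

# Remove group and world write permission, then check again.
chmod go-w "$home_dir" "$home_dir/.ssh"
after=$(check)
echo "still too open: $after"
```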


Creating a new authorized_keys file on dirac

  1. If it does not yet exist, create the .ssh directory under your home directory on dirac:
    mkdir .ssh

  2. Create your public identity file, id_dsa.pub, on one of the NCCS HEC systems:
    ssh-keygen -t dsa

  3. Copy the file id_dsa.pub into authorized_keys on the system for which you just generated it. If the file authorized_keys already exists on the system, append the contents of id_dsa.pub.

  4. Copy the authorized_keys file to dirac:
    scp -oport=2222 my_home_dir/.ssh/authorized_keys mylogin@dirac.gsfc.nasa.gov:my_home_dir/.ssh/authorized_keys

    Once the authorized_keys file is in place on both dirac and the NCCS HPC system, you can start using passwordless scp and sftp.
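
Step 3 above (copy or append id_dsa.pub into authorized_keys) can be written so that it is safe to repeat. The sketch below appends the key only when that exact line is not already present and then restricts the permissions. It runs against a temporary directory standing in for .ssh, the key text is a placeholder rather than a real key, and add_key is a hypothetical helper name.

```shell
# Hypothetical stand-in for ~/.ssh so the sketch is safe to run anywhere.
ssh_dir=$(mktemp -d)

# Placeholder text, not a real key; ssh-keygen would normally write this file.
echo "ssh-dss PLACEHOLDER-KEY-DATA mylogin@hpc-system" > "$ssh_dir/id_dsa.pub"

add_key() {
    touch "$ssh_dir/authorized_keys"
    # Append the public key only if that exact line is not already present.
    if ! grep -qxF -f "$ssh_dir/id_dsa.pub" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_dsa.pub" >> "$ssh_dir/authorized_keys"
    fi
}

add_key
add_key    # the second call changes nothing

# Keep the directory and the key file writable by the owner only.
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"

key_count=$(wc -l < "$ssh_dir/authorized_keys" | tr -d ' ')
echo "$key_count"
```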

Adding to an existing authorized_keys file

For each NCCS high-performance computing system for which you do not yet have an entry in your dirac authorized_keys file:

  1. Log into that NCCS high-performance computing system and create your public identity file, id_dsa.pub:
    ssh-keygen -t dsa

    The id_dsa.pub file will be in the .ssh directory under your home directory on the NCCS high-performance computing system.

  2. Copy your existing authorized_keys file on dirac into a temporary file:
    scp -oport=2222 mylogin@dirac.gsfc.nasa.gov:.ssh/authorized_keys odirac_auth

  3. Concatenate the temporary holder and your id_dsa.pub file into a new file:
    cat odirac_auth id_dsa.pub > dirac_auth

  4. Replace the authorized_keys file on dirac with the contents of the new file:
    scp -oport=2222 dirac_auth mylogin@dirac.gsfc.nasa.gov:.ssh/authorized_keys
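
Because the last step overwrites the authorized_keys file on dirac, it may be worth verifying the merged file before uploading it. The sketch below is one possible check, not an NCCS procedure: it merges the two files while dropping duplicate lines, then confirms that every existing entry survived. The key lines are placeholders, and the files live in a temporary directory so nothing real is touched.

```shell
work=$(mktemp -d)

# Placeholder lines standing in for the copy fetched from dirac and the new key.
echo "ssh-dss PLACEHOLDER-A mylogin@system-a" > "$work/odirac_auth"
echo "ssh-dss PLACEHOLDER-B mylogin@system-b" > "$work/id_dsa.pub"

# Concatenate the two files, dropping any line already seen (order is preserved).
cat "$work/odirac_auth" "$work/id_dsa.pub" | awk '!seen[$0]++' > "$work/dirac_auth"

# Count old entries that are missing from the merged file; this should be 0.
missing=$(grep -vxF -f "$work/dirac_auth" "$work/odirac_auth" | wc -l | tr -d ' ')
new=$(wc -l < "$work/dirac_auth" | tr -d ' ')
echo "entries in merged file: $new, old entries missing: $missing"
```

Only when the missing count is 0 would the merged file be copied to dirac.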


Performing batch transfers on NCCS high-performance computing systems

Batch transfers on different NCCS high-performance computing systems are handled differently by dirac.
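
Whatever the originating system, an unattended transfer should fail quickly rather than wait at a prompt nobody will answer. The sketch below wraps scp in a small helper that always adds the port option from the prerequisites and scp's -B (batch mode) option, which prevents password prompts. The helper name dirac_put, the DRY_RUN switch, and the file names are hypothetical; mylogin is the placeholder login used throughout this page.

```shell
# DRY_RUN=1 prints the command instead of running it; set to 0 on a real system.
DRY_RUN=1

dirac_put() {
    # $1 = local file, $2 = destination path on dirac
    # -B selects batch mode, so scp fails instead of asking for a password.
    set -- scp -B -oport=2222 "$1" "mylogin@dirac.gsfc.nasa.gov:$2"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

cmd=$(dirac_put results.tar archive/results.tar)
echo "$cmd"
```

Running with DRY_RUN=1 first makes it easy to confirm the exact command a batch script will issue.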

Batch transfers to and from halem

Moving files from batch jobs on halem to dirac is essentially a two-step process:

  1. Enable an unattended transfer from halemA to dirac (an interactive node will do). Note that jobs in the datamove queue run on halemA. If you are doing transfers from datamove, then the scp command will work and you can skip the next step.

  2. Enable an unattended transfer on halemA to be initiated from a batch node. There are three steps to performing an unattended transfer from a batch node:

    1. Set up your account so that you can do unattended transfers from an interactive node on halem to dirac.

    2. In the process of setting up your account, you will create an id_dsa.pub file on halem. Copy (or append, if the file already exists) the contents of that file to the authorized_keys file in your .ssh directory on halem.

    3. From your batch script, you should now be able to do the following:

      set PWD=`pwd` (for C shell) or PWD=`pwd` (for Bourne shell)

      ssh halem-mss "chdir $PWD ; scp options"

      where options in the scp command is the same syntax you use to scp from an interactive node.

Limiting the number of files in a directory

Ideally, a directory should contain no more than a few hundred entries. This will improve your ability to maintain directory trees and speed up file lookups and retrievals. Placing thousands of files in a directory will present serious performance problems.


Compressing files to save space

One of the easiest ways to conserve space is to compress your files before you store them in DMF. Compressed files use less space on disk and tape, and because there is less data to move, they are retrieved faster the next time you access them. Enter man compress or man gzip at the UNIX command prompt for information about file compression. Compression combined with UNIX tar files generally optimizes NCCS storage capacity.
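
As a rough illustration of the savings, the sketch below compresses a repetitive sample file with gzip and compares the sizes. The sample data and file names are made up for the demonstration; real savings depend on how compressible your data is.

```shell
work=$(mktemp -d)

# Build a repetitive 1000-line sample file (hypothetical data).
i=0
while [ $i -lt 1000 ]; do
    echo "timestep $i temperature 273.15 pressure 101325"
    i=$((i + 1))
done > "$work/sample.txt"

# -c writes the compressed data to stdout, so the original is kept for comparison.
gzip -c "$work/sample.txt" > "$work/sample.txt.gz"

orig_size=$(wc -c < "$work/sample.txt" | tr -d ' ')
gz_size=$(wc -c < "$work/sample.txt.gz" | tr -d ' ')
echo "original: $orig_size bytes, compressed: $gz_size bytes"

# Test the compressed file's integrity before deleting any original.
gzip -t "$work/sample.txt.gz" && echo "archive OK"
```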


Using tar files

If you frequently store many small- to medium-sized files in DMF, you should create UNIX tar files before saving them (when you "tar" files, you create a single file that is a collection of other files). For information about the tar command, enter man tar at the UNIX command prompt. For example, if you have a directory with 50 files that are frequently accessed together, retrieving a single tar file holding all 50 files is significantly faster than retrieving each file individually. If stored individually, all 50 files could potentially reside on different tapes, and when each of these 50 files is accessed again, you will have to wait for the multiple tapes to be mounted and the files cached to DMF disk before they are copied to your local disk. If the 50 files are combined into a single tar file that resides on tape, only one tape mount request is required and only one tape is read to retrieve your files.

Combining your files into UNIX tar files also improves the speed of commands that act on individual files. The best example of this is the SFTP dir command. In a directory that contains a large number of files, commands such as dir can take a long time to execute. If you maintain UNIX tar files instead of individual files, the length of your directories decreases, thus improving the speed of the SFTP file operation commands.
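
The 50-file case described above might look like the following sketch. The directory name run42 and the file names are hypothetical; the point is that one tar file replaces 50 directory entries and can be listed without being extracted.

```shell
work=$(mktemp -d)
mkdir "$work/run42"

# Create 50 small files that are normally used together (hypothetical data).
i=1
while [ $i -le 50 ]; do
    echo "data $i" > "$work/run42/file$i.dat"
    i=$((i + 1))
done

# Bundle the whole directory into a single tar file.
tar -cf "$work/run42.tar" -C "$work" run42

# List the contents without extracting and count the data files.
file_count=$(tar -tf "$work/run42.tar" | grep -c '\.dat$')
echo "$file_count"
```

The resulting run42.tar is the single file you would then transfer to dirac with sftp or scp.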



Preserving file attribute information

For most clients, scp -p should preserve file attribute information (modification times, access times, and modes).

An easy way to preserve modification times as well as other crucial file attributes (such as permissions) is to use the UNIX tar command. Using tar combines all your files into a single file that contains the UNIX attributes for each file contained within it. When individual files are extracted from the tar file, they are created with the UNIX attributes with which they were stored.
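
The following sketch checks this behavior on a single file: it sets a known mode and modification time, passes the file through tar, and compares the attributes of the extracted copy. File names are hypothetical. The -p option on extraction asks tar to restore the stored permissions rather than apply your current umask.

```shell
work=$(mktemp -d)
mkdir "$work/src" "$work/out"

# Create a file with a distinctive mode and an old modification time.
echo "sample" > "$work/src/data.txt"
chmod 640 "$work/src/data.txt"
touch -t 200406071803 "$work/src/data.txt"

# Store the file in a tar file, then extract it elsewhere with permissions preserved.
tar -cf "$work/bundle.tar" -C "$work/src" data.txt
tar -xpf "$work/bundle.tar" -C "$work/out"

# Compare the permission strings of the original and the extracted copy.
src_mode=$(ls -l "$work/src/data.txt" | cut -c1-10)
out_mode=$(ls -l "$work/out/data.txt" | cut -c1-10)

# The times match if neither file is newer than the other.
same_time=no
if [ -z "$(find "$work/out/data.txt" -newer "$work/src/data.txt")" ] &&
   [ -z "$(find "$work/src/data.txt" -newer "$work/out/data.txt")" ]; then
    same_time=yes
fi
echo "$src_mode $out_mode $same_time"
```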


Tape-to-disk restoration

To retrieve a file from dirac, the file must reside on the DMF disk. If a file resides solely on tape, it will first be copied to the DMF disk and then transferred to the remote host. To make room for files being read from tape, old files in the DMF disk are released. During heavy usage times when many tape mount requests are pending, it could take several minutes to access a file that has been migrated to tape. If the process is taking too long and you want to interrupt it, use your interrupt key (usually CONTROL-C). DMF will continue to copy your file to its disk even after you interrupt the process.


Planning around scheduled downtimes

You may find it useful to plan around scheduled DMF and dirac downtimes. DMF may be unavailable from approximately 0600 to 0900 Eastern time on Wednesday mornings for planned maintenance. Other scheduled downtimes are announced in the NCCS message of the day.
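
A transfer script can test for the Wednesday window before it starts. The sketch below is one way to do that, assuming the system knows the time zone name America/New_York; the function name in_dmf_window is hypothetical, and because the window is approximate and other downtimes are announced separately, the check is only a first filter.

```shell
in_dmf_window() {
    # $1 = ISO weekday (1 = Monday ... 7 = Sunday), $2 = hour 00-23, both Eastern time
    h=${2#0}    # drop one leading zero so 07 is compared as 7
    [ "$1" -eq 3 ] && [ "$h" -ge 6 ] && [ "$h" -lt 9 ]
}

# Assumption: the America/New_York zone definition is installed on this system.
day=$(TZ=America/New_York date +%u)
hour=$(TZ=America/New_York date +%H)

if in_dmf_window "$day" "$hour"; then
    echo "DMF may be down for maintenance; defer the transfer"
else
    echo "outside the usual maintenance window"
fi
```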

NASA Curator: Mason Chang,
NCCS User Services Group (301-286-9120)
NASA Official: Phil Webster, High-Performance
Computing Lead, GSFC Code 606.2