dirac (DMF)
DMF (data migration facility) is a hierarchical
mass storage system that manages
data movement between various levels
of storage. The first layer of
storage is disk, and the second
layer is archive or tape storage.
At the NCCS, DMF runs on the SGI
Origin 3000 (dirac) system.
What is
DMF?
DMF is a hierarchical storage
manager (HSM) that supports unlimited
file size in a UNIX-like file system
environment. Access is via the
sftp (secure file transfer protocol)
or scp (secure copy) interface,
two protocols that incorporate
additional encryption to ensure
that files are transferred securely.
DMF manages its file system
by automatically migrating files
from its disk cache to tape when
free space is low or files have
not been accessed in a certain
length of time. If you attempt
to retrieve a file that has been
migrated to tape, it will automatically
be copied to DMF's disk cache
before it is copied to your local
system. This functionality is very
similar to that of the previous
SAM-QFS
software.
The NCCS is currently running
DMF on an SGI Origin 3000, and
the host name for the public-facing
system is dirac.gsfc.nasa.gov.
System
Overview
DMF runs on an SGI Origin 3000:
Disk storage
Tape drives
-
The Origin 3000 has directly
attached nine 9840A (20 GB/cartridge)
and eight 9840C (30 GB/cartridge)
tape drives for smaller files
and ten 9940B tape drives
(200 GB/cartridge) for larger
files. In addition, there are
another six 9940B tape drives
in Building 32 that store a
second copy of certain files.
They retrieve newly written
DMF files not on disk and write/archive
copies of disk files to tape.
Network connectivity
Changing
your dirac password
You must enter a 10-digit passcode
(this passcode is your 4-digit
PIN and the number displayed on
your AKT, entered together as one
number without spaces) and your
existing password to change your
password on dirac. Enter the following
information to change your password:
ssh mylogin@dirac.gsfc.nasa.gov
ENTER
PASSCODE: 10-digit
passcode
Password: existing-dirac-password
Last
login: Mon Jun 7 18:03:57 2004
from myhost.gsfc.nasa.gov
# passwd
Changing
password for mylogin
Enter existing login password: existing-dirac-password
New Password: new-dirac-password
Re-enter new Password: new-dirac-password
passwd: password successfully
changed for mylogin
Direct
access to dirac
ssh
to dirac
To ssh directly to dirac,
execute the command
ssh mylogin@dirac.gsfc.nasa.gov
You will be prompted for your
passcode and password as in other
types of access to dirac.
Shell commands useful for deleting
files
The shell commands cd, ls, find,
and rm have been enabled to help
you delete files; access via sftp
and scp should still be used to
evaluate file content.
The find command (find starting_directory
matching_criteria_and_actions)
can be particularly useful for
deleting files based on last
access, file type, size, owner,
or modification time, among other
criteria. For example:
To find all files, starting from
the current directory, that were
last accessed more than a year ago:
% find . -atime
+365 -print
To find all files that have a
file name ending in .c:
% find . -name "*.c" -print
To find all files bigger than
50K and delete them:
% find . -size
+100 -exec rm {} \;
(note that the -size directive
counts 512-byte blocks, so +100
matches files larger than 100 blocks
of 512 bytes, i.e. 50K)
To find all files owned by user
guestuser and return a long directory
listing:
% find . -user
guestuser -ls
To find all files modified within
the last two days:
% find . -mtime
-2 -print
For further options, consult
the man page for find or a search
engine for a string such as "UNIX
find command examples."
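Because find combined with -exec rm deletes files without asking for confirmation, it can help to preview the matches first. The following is a minimal sketch, not an NCCS-provided tool: the function name list_larger_than_kb is hypothetical, and it assumes a POSIX shell in which you can define functions (which may not be the case in the restricted shell on dirac). It converts a size in kilobytes into the 512-byte blocks that -size expects and only prints the matching files.

```shell
# Hypothetical helper: list regular files under a directory that are
# larger than a given number of kilobytes. Nothing is deleted.
list_larger_than_kb() {
    dir=$1
    kb=$2
    # find -size counts 512-byte blocks, so 1 KB corresponds to 2 blocks.
    blocks=$((kb * 2))
    find "$dir" -type f -size +"$blocks" -print
}
```

For example, list_larger_than_kb . 50 prints the same files that find . -size +100 would match (restricted to regular files). Once the list looks right, the same find expression can be rerun with -exec rm {} \; in place of -print.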
Accessing
dirac for interactive transfers
You can access dirac
using sftp or scp. There is no
need to specify a port if you are
accessing dirac from a network
outside NCCS systems. If you are
accessing dirac from an NCCS high-performance
compute server (e.g. palm and discover)
for batch jobs, you should
specify port 2222.
You should connect to dirac using
one of the following simple network
paths:
Conditions for Using Command | Command Format
When accessing dirac from NCCS systems | sftp (or) scp dirac
When accessing dirac from non-NCCS systems | sftp (or) scp dirac.gsfc.nasa.gov
You will need your 10-digit passcode
(4-digit PIN and the 6-digit number displayed
on your SecurID, entered together
as one number without spaces) and
your dirac password to log in.
SFTP example:
sftp mylogin@dirac.gsfc.nasa.gov
Connecting
to dirac.gsfc.nasa.gov...
The authenticity of host 'dirac.gsfc.nasa.gov (169.154.162.1xx)'
can't be established.
RSA key fingerprint is 35:fc:e4:d1:b9:07:ac:19:b6:8d:db:9e:98:9e:c2:0d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added
'dirac.gsfc.nasa.gov' (RSA) to the list of known
hosts.
*************** WARNING - U.S. GOVERNMENT COMPUTER
*************
This U.S. Government
resource is for authorized use only.
If
not authorized to access this
resource, disconnect now.
Unauthorized use of, or access to this resource may
subject you to disciplinary action or criminal prosecution.
By
accessing and using this resource,
you are consenting to monitoring,
keystroke recording or auditing.
**************************************************************
Enter PASSCODE: 10-digit
passcode
Password: dirac-password
sftp>
SCP example:
scp mylogin@dirac.gsfc.nasa.gov:full_path_name
local_path_name
or
scp local_path_name mylogin@dirac.gsfc.nasa.gov:full_path_name
***************
WARNING - U.S. GOVERNMENT COMPUTER
************
This U.S. Government resource is for authorized use
only.
If
not authorized to access this
resource, disconnect now.
Unauthorized use of, or access to this resource may
subject you to disciplinary action or criminal prosecution.
By
accessing and using this resource,
you are consenting to monitoring,
keystroke recording or auditing.
**************************************************************
Enter PASSCODE: 10-digit
passcode
Password: dirac-password
path_name 100%
|*****************************| size 00:00
Accessing
dirac via anonymous FTP
Read-only anonymous ftp is available
on dirac for certain designated
directories. Note that only "active" (not
passive) sessions are supported,
so you may have to modify your
ftp client accordingly.
Setting
file permissions using chmod
Enter the following command
to change permissions on a file
via sftp:
sftp
dirac
put dirac_file
chmod ###
dirac_file
where ### (e.g.,
644) is the mode you wish to set
for the file. With this approach,
you can change the mode of only
one file at a time.
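The batch transfer prerequisites on this page require that certain directories not be group- or world-writeable, so it can be useful to check the write bits before choosing a mode. The sketch below is an illustration only: the function name is hypothetical, and it assumes a system where you have a regular POSIX shell (on dirac itself, mode changes go through the sftp chmod command shown above).

```shell
# Hypothetical helper: succeed (exit status 0) if the given path is
# group-writable or world-writable, fail otherwise.
is_group_or_world_writable() {
    # -prune keeps find from descending into directories;
    # -perm -020 tests the group write bit, -perm -002 the world write bit.
    [ -n "$(find "$1" -prune \( -perm -020 -o -perm -002 \) -print)" ]
}
```

A possible use is is_group_or_world_writable "$HOME/.ssh" && echo "fix permissions first". A mode such as 755 or 700 on a directory, or 644 or 600 on a file, leaves write permission with the owner only.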
Batch
transfers
Prerequisites
for batch transfers to and from
NCCS high-performance computing
systems
There are several prerequisites
for batch sftp and scp transfers
to and from NCCS high-performance
computing systems (palm and discover):
-
Your dirac home and .ssh directories
must not be group- or world-writeable
(see Setting
file permissions using chmod).
-
You must login to a specific
NCCS high-performance computing
system using your 10-digit
passcode (4-digit PIN and the
6-digit number displayed on
your SecurID, entered together
as one number without spaces)
plus your login password on
that system.
-
Your batch sftp or scp to
dirac from NCCS high-performance
computing systems requires
the following:
-
The option -oport=2222 on
the sftp or scp command.
Note that to use port 2222
you must also originate
your sftp or scp from an
NCCS high-performance computing
system.
-
On dirac, for each NCCS
high-performance computing
system from which you will
be accessing dirac, an
entry in your home_directory/.ssh/authorized_keys file
(instructions for creating
the authorized_keys entry
follow). The authorized_keys file
is needed because
Creating
a new authorized_keys file
on dirac
-
If it does not yet exist,
create the .ssh directory
under your home directory on
dirac:
mkdir
.ssh
-
Create your public identity
file, id_dsa.pub,
on one of the NCCS HEC systems:
ssh-keygen -t
dsa
-
Copy the file id_dsa.pub into authorized_keys on
the system for which you just
generated it. If the file authorized_keys already
exists on the system, append
the contents of id_dsa.pub.
-
Copy the authorized_keys file
to dirac:
scp
-oport=2222 my_home_dir/.ssh/authorized_keys mylogin@dirac.gsfc.nasa.gov:my_home_dir/.ssh/authorized_keys
Once the authorized_keys file
is in place on both dirac and the NCCS HPC system,
you can start using passwordless scp and sftp.
Adding
to an existing authorized_keys file
For each NCCS high-performance
computing system for which you
do not yet have an entry in your
dirac authorized_keys file:
-
Log into that NCCS high-performance
computing system and create
your public identity file, id_dsa.pub:
ssh-keygen
-t dsa
The id_dsa.pub file
will be in the .ssh directory
under your home directory on the NCCS high-performance
computing system.
-
Copy your existing authorized_keys file
on dirac into a temporary file:
scp -oport=2222 mylogin@dirac.gsfc.nasa.gov:.ssh/authorized_keys \
    odirac_auth
-
Concatenate the temporary
holder and your id_dsa.pub file
into a new file:
cat
odirac_auth id_dsa.pub > dirac_auth
-
Replace the authorized_keys file
on dirac with the contents
of the new file:
scp -oport=2222 dirac_auth \
    mylogin@dirac.gsfc.nasa.gov:.ssh/authorized_keys
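The concatenation step above adds a key every time it is run, even if that key is already present. As a hedged sketch of one way to make the step repeatable, the hypothetical function append_key below appends a public key to a local copy of authorized_keys only when the exact line is missing. It assumes a POSIX shell and a public key file that holds a single line; it works on local files only and does not contact dirac.

```shell
# Hypothetical helper: append the public key in $1 to the file $2
# unless an identical line is already there.
append_key() {
    pub=$1
    auth=$2
    touch "$auth"
    # -F: fixed strings, -x: match whole lines, -f: read the pattern from $pub
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    # Keep the file writable by its owner only.
    chmod 600 "$auth"
}
```

For instance, append_key id_dsa.pub odirac_auth could stand in for the cat step, after which odirac_auth (rather than dirac_auth) would be the file copied back to dirac with scp.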
Performing
batch transfers on NCCS high-performance
computing systems
Batch transfers on different
NCCS high-performance computing
systems are handled differently
by dirac.
Batch
transfers to and from halem
Moving files from batch jobs on
halem to dirac is essentially a
two-step process:
-
Enable
an unattended transfer from
halemA to dirac (an
interactive node will do).
Note that jobs in the datamove
queue run on halemA. If you
are doing transfers from
datamove, then the scp command
will work and you can skip
the next step.
-
Enable
an unattended transfer on
halemA to be initiated from
a batch node. There
are three steps to performing
an unattended transfer from
a batch node:
-
Set up your account so
that you can do unattended
transfers from an interactive
node on halem to dirac.
-
In the process of setting
up your account, you will
create an id_dsa.pub file
on halem. Copy (or append,
if the file already exists)
the contents of that file
to the authorized_keys file
in your .ssh directory
on halem.
-
From your batch script,
you should now be able
to do the following:
set PWD=`pwd` (for C shell)
or
PWD=`pwd` (for Bourne shell)
ssh halem-mss "chdir $PWD ; scp options"
where options in the scp command
uses the same syntax you use
to scp from an interactive node.
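Before putting the ssh line into a batch script, it may help to print the command that would be run and check the quoting. The following dry-run sketch assumes a Bourne-style shell; the function name halem_scp_command and the example paths are hypothetical, and nothing is executed or transferred.

```shell
# Hypothetical dry-run helper: print the ssh command a batch script
# would run, given a working directory and the scp arguments.
# Arguments containing spaces would need extra quoting.
halem_scp_command() {
    workdir=$1
    shift
    printf 'ssh halem-mss "chdir %s ; scp %s"\n' "$workdir" "$*"
}
```

Calling halem_scp_command "`pwd`" out.dat mylogin@dirac.gsfc.nasa.gov:out.dat prints the full command line for inspection; the printed line can then be pasted into the batch script.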
Limiting
the number of files in a directory
Ideally, a directory should contain
no more than a few hundred entries.
This will improve your ability
to maintain directory trees and
speed up file lookups and retrievals.
Placing thousands of files in a
directory will present serious
performance problems.
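A quick count of directory entries shows whether a directory has grown past this guideline. The sketch below is an assumption-laden illustration: the function names are hypothetical, the default threshold of 500 is only one reading of "a few hundred," and file names containing newlines would be miscounted.

```shell
# Hypothetical helper: print the number of entries in a directory
# (including dot files, excluding . and ..).
count_entries() {
    ls -A "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: succeed if the directory holds more entries
# than the threshold (second argument, default 500).
too_many_entries() {
    [ "$(count_entries "$1")" -gt "${2:-500}" ]
}
```

For example, too_many_entries mydir && echo "consider splitting or tarring mydir" flags a directory that is a candidate for reorganization.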
Compressing
files to save space
One of the easiest ways to conserve
space is to compress your files
before you store them in DMF.
By compressing your files, you
use less space on disk and tape
and thus improve the speed at
which your files are retrieved
the next time you access
them. Enter man
compress or man
gzip at the UNIX command
prompt for information about file
compression. Compression combined
with UNIX tar files generally optimizes
NCCS storage capacity.
Using tar files
If you frequently store many small-
to medium-sized files in DMF,
you should create UNIX tar files
before saving them (when you "tar" files,
you create a single file that is
a collection of other files). For
information about the tar command,
enter man
tar at the UNIX command
prompt. For example, if you have
a directory with 50 files that
are frequently accessed together,
retrieving a single tar file holding
all 50 files is significantly faster
than retrieving each file individually.
If stored individually, all 50
files could potentially reside
on different tapes, and when each
of these 50 files is accessed again,
you will have to wait for the multiple
tapes to be mounted and the files
cached to DMF disk before they
are copied to your local disk.
If the 50 files are combined into
a single tar file that resides
on tape, only one tape mount request
is required and only one tape is
read to retrieve your files.
Combining your files into UNIX
tar files also improves the speed
of commands that act on individual
files. The best example of this
is the SFTP dir command.
In a directory that contains a
large number of files, commands
such as dir can
take a long time to execute. If
you maintain UNIX tar files instead
of individual files, the length
of your directories decreases,
thus improving the speed of the
SFTP file operation commands.
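The tar and compression advice above can be combined into one step before a transfer. The following is a minimal sketch under stated assumptions: the function names bundle_dir and list_bundle are hypothetical, gzip is used as the compressor, and the tar on your system is assumed to accept the -C option (GNU and BSD tar do).

```shell
# Hypothetical helper: pack directory $1 into the gzip-compressed
# tar file $2, storing paths relative to the directory's parent.
bundle_dir() {
    src=$1
    out=$2
    parent=$(dirname "$src")
    name=$(basename "$src")
    tar -cf - -C "$parent" "$name" | gzip > "$out"
}

# Hypothetical helper: list the contents of a bundle without extracting it.
list_bundle() {
    gzip -dc "$1" | tar -tf -
}
```

For example, bundle_dir run1 run1.tar.gz produces a single file that can be sent with scp or sftp, and list_bundle run1.tar.gz confirms what it contains before the originals are removed.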
Preserving
file attribute information
For most clients, scp -p should
preserve file attribute information.
Another easy way to preserve
modification times as well as other
crucial file attributes (such as
permissions) is to use the UNIX tar command.
Using tar combines
all your files into a single file
that contains the UNIX attributes
for each file contained within
it. When individual files are extracted
from the tar file, they are created
with the UNIX attributes with which
they were stored.
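When extracting, the stored permissions can still be filtered by your umask unless tar is asked to preserve them. As a small sketch, the hypothetical function below extracts an archive with the -p option, which GNU and BSD tar document as restoring permissions as recorded in the archive; the function name and directory layout are assumptions for illustration.

```shell
# Hypothetical helper: extract tar file $1 into directory $2,
# keeping the permissions recorded in the archive (-p).
extract_preserving() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    tar -xpf "$archive" -C "$dest"
}
```

After extract_preserving files.tar restored, running ls -l on the restored directory should show the same modes the files had when they were archived.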
Tape-to-disk
restoration
To retrieve a file from dirac,
the file must reside on the DMF
disk. If a file resides solely
on tape, it will first be copied
to the DMF disk and then transferred
to the remote host. To make room
for files being read from tape,
old files in the DMF disk
are released. During heavy usage
times when many tape mount requests
are pending, it could take several
minutes to access a file that has
been migrated to tape. If the process
is taking too long and you want
to interrupt it, use your interrupt
key (usually CONTROL-C). DMF
will continue to copy your file
to its disk even after you interrupt
the process.
Planning
around scheduled downtimes
You may find it useful to plan
around scheduled DMF and dirac
downtimes. DMF may be unavailable
from approximately 0600 to 0900
Eastern time on Wednesday mornings
for planned maintenance. Other
scheduled downtimes are announced
in the NCCS message of the day.
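A batch script can check for the Wednesday maintenance window before it attempts a transfer. The sketch below is illustrative only: the function name is hypothetical, the window is taken from the approximate 0600 to 0900 times given above, and other announced downtimes are not covered.

```shell
# Hypothetical helper: succeed if the given day of week (1 = Monday,
# as printed by date +%u) and time (HHMM) fall inside the
# Wednesday 0600 to 0900 window.
in_maintenance_window() {
    dow=$1
    hhmm=$2
    [ "$dow" -eq 3 ] && [ "$hhmm" -ge 600 ] && [ "$hhmm" -lt 900 ]
}
```

Assuming the system's time zone database provides America/New_York, the current Eastern day and time can be supplied as in_maintenance_window "$(TZ=America/New_York date +%u)" "$(TZ=America/New_York date +%H%M)".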