STORAGE
The NAS uses three mass storage systems, Lou1, Lou2, and Lou3, to provide
long-term data storage for users of our high-end computing systems. Each
user can log into any of the Lou systems, but has storage space on only
one of them. Which system you should store data on is determined by the
"domain" you compute in.
If you launch jobs from cfe1 and your nobackup filesystem is /nobackup1a-h,
then you should store data on Lou1.
If you launch jobs from cfe2 and your nobackup filesystem is /nobackup2a-g,
then you should store data on Lou2.
If you launch jobs from cfe3 and your nobackup filesystem is /nobackup3a-d,
then you should store data on Lou3.
Your home directory can be referenced as louX:/u/your_userid,
where X is 1, 2, or 3.
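For example, a file can be copied to your Lou home directory from a front
end using that reference (the file name here is illustrative):
cfe1 % scp results.dat lou1:/u/your_userid/results.dat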
The storage systems are SGI Altix systems running the Linux operating system.
The disk space for the three systems combined is about 240 terabytes (TB),
split into filesystems ranging from 9 to 30 TB in size.
Data stored on disk is migrated to tape whenever necessary to make space
for more data. Two copies of your data are written to tape media
in silos located in separate buildings.
Lou1-3 have a combined ten 9840 and twenty T10000 Sun/STK tape drives. Each
of the 9840 drives holds 20 gigabytes (GB) of data, while each of the T10000
tape drives holds 500 GB. The total storage capacity is up to 12 petabytes.
Data migration (from disk to tape) and de-migration are managed by the
SGI Data Migration Facility (DMF) and Tape Management Facility (TMF).
Information on the policies and guidelines for storing and transferring
your files is provided below.
Transferring Files from Computing Systems to Mass Storage
Your Columbia CXFS nobackup filesystem (e.g., /nobackup1a-h, /nobackup2a-g)
is mounted on the Mass Storage system (Lou1-3) that you are assigned to. As
a result, you can log into LouX (where X is 1, 2, or 3) and copy files from
nobackup to your home directory on Lou.
SGI has created a command called cxfscp, which is a tuned version of the cp
command. You can copy files at up to 400MB/s sustained with cxfscp.
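Since cxfscp follows the basic syntax of cp, a copy from your nobackup
filesystem to your Lou home directory might look like the following sketch
(the paths are illustrative, and we assume cxfscp accepts cp-style -r for
recursive copies):
lou1 % cxfscp -r /nobackup1a/your_userid/project1 ~/project1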
If you need to initiate the transfer from Columbia, then we recommend you
use bbscp if the file to be transferred does not need to be encrypted. If you need
to encrypt the data, even within the HEC enclave, then scp should be used.
Transferring files from Pleiades or RTJones to Mass Storage can be done
with bbscp/bbftp or scp. Disk-to-disk copying to Mass Storage may be
implemented in the near future.
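As a sketch, bbscp follows scp-like syntax, so the two cases above might
look like this (host and file names are illustrative):
columbia.user% bbscp bigrun.tar lou1:bigrun.tar
columbia.user% scp -q sensitive.tar lou1:sensitive.tar
The first form is for data that does not need encryption; the second is for
data that does.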
Validating Transferred Files using md5sum
It is good practice to check that files were copied correctly to Mass
Storage. A good way to do this is to use sum or md5sum to create checksums
of the files at the source location, perform the copy, then recompute the
checksums at the destination and compare them to the checksums of the
originals.
For example, suppose you wanted to copy a subdirectory called
project1 from cfe1 to lou. You could follow these steps:
- On cfe1, find all files in the directory hierarchy of project1,
compute the checksums of each file,
and write out the checksums to the file project1.sums
cfe1 % find project1 -type f -print0 | xargs -0 md5sum > project1.sums
Note: ' -type f ' restricts the search to regular files;
' -print0 ' prints each filename terminated by a null character, so that
' xargs -0 ' correctly handles names containing spaces or other special
characters.
- On cfe1, tar up the directory and copy the tar file, along with the
checksums, to your home directory on lou1
cfe1 % tar -czf project1.tgz project1
cfe1 % scp project1.tgz lou1:
cfe1 % scp project1.sums lou1:
Note: in ' tar -czf ', -c creates a new archive, -z compresses it
with gzip, and -f names the archive file.
- On lou1, assuming there is room in /tmp, create a temporary location and
untar the archive stored in your home directory into it (assuming you do
not want to keep the individual files in /tmp afterwards).
Use md5sum to compute the checksums of each file and compare the results
with the checksums in project1.sums.
lou1 % mkdir -p /tmp/your_username
lou1 % cd /tmp/your_username
lou1 % tar -xzf ~/project1.tgz
lou1 % md5sum --check ~/project1.sums
project1/test/README: OK
project1/test/src/test.f: OK
...
- If everything reports OK, clean out the /tmp directory
lou1 % rm -rf /tmp/your_username/project1
Quota Policy on Disk Space and Files
Some NAS filesystems enforce quotas. Two kinds of quotas are supported:
limits on the total disk space occupied by the user's files, and
limits on how many files the user can store, irrespective of size.
For quota purposes, directories count as files.
Further, there are two different limits: hard limits and soft limits.
Hard limits cannot be exceeded, ever. Any attempt to use more than
your hard limit will be refused with an error. Soft limits,
on the other hand, can be exceeded temporarily. You can stay over your
soft limit for a certain period of time (the grace period). If you
remain over your soft limit for more than the grace period, the soft
limit is enforced as a hard limit.
You will not be able to add or extend files until you get back under
the soft limit. Usually, this means deleting unneeded files or copying
important files elsewhere (perhaps the lou archival storage system)
and then removing them locally.
When you exceed your soft limit you will begin getting daily emails
reminding you how long until the grace period expires. These are
intended to be informative and not a demand to immediately remove
files.
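On systems where standard Linux quota reporting is enabled (an assumption;
the exact output format varies by system), you can check your current
usage, limits, and remaining grace period with:
cfe1 % quota -v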
Disk Space Quotas on Columbia /nobackup file systems
You should have a scratch directory in /nobackup1a-h, /nobackup2a-g, or
/nobackup3a-d. This filesystem is part of a Storage Area Network (SAN)
and can be seen on every compute host in your domain, the front end, and
on your mass storage server.
There are also local nobackup filesystems, /nobackup1-24, each of which
can be seen only by the compute host it is attached to.
As the names suggest, these filesystems are not backed up, so any
files that are removed can't be restored. Essential data should
be stored on Lou1-3 or onto other more permanent storage.
- There is a 200 GB soft limit and a 400 GB hard limit on disk space in
each /nobackup filesystem that you have access to. If you exceed the
soft quota, an email will be sent to inform you of your current disk
usage and how much of your grace period remains. It is expected that a
user will exceed their soft limit as needed; however, after 14 days,
users who are still over their soft limit will have their batch queue
access to Columbia disabled.
- If an account has been disabled for more than 14 days, then its
Columbia data will be moved to the archive host, lou, and kept there
for 6 months before removal, unless the project lead requests to have
the data moved to another account.
- If an account no longer has batch access to a node, then all data
from that node should be moved off within 7 days (or sooner if another
project needs the space).
- If an account needs larger quota limits, send an email justification
to support@nas.nasa.gov. This will be reviewed by the HECC Deputy
Project Manager, Bill Thigpen, for approval.
Disk File Quotas on lou
- There is no quota on file space on Lou1, Lou2, or Lou3 because the data
is written to tape. There is, however, a quota on the number of files you
can have. Currently there is a soft limit of 250,000 files and a hard
limit of 300,000 files.
- There is a 14-day grace period if the soft limit is exceeded. An email
will be sent to inform you of your current file count and how much of
your grace period remains. It is expected that a user will exceed their
soft limit as needed; however, after 14 days, users who are still over
their soft limit will be unable to archive files until they have reduced
their usage to below the soft limit.
- If an account needs larger quota limits, send an email justification
to support@nas.nasa.gov. This will be reviewed by the HECC Deputy
Project Manager, Bill Thigpen, for approval.
- The maximum size of a file moved to lou should not exceed 30% of
the size of your home filesystem on Lou. If you need to move files
larger than this, please contact the help desk (support@nas.nasa.gov)
for assistance.
Portable File Names and Sizes
Portable File Names
Use portable file names. A name is portable if it contains only ASCII
letters and digits, `.', `_', and `-'. Do not use spaces or wildcard
characters, do not start a name with `-' or `//', and do not include `/-'
anywhere in the name. Avoid deep directory nesting.
If you intend for tar archives to be read under MS-DOS, do not rely on case
distinctions in file names, and consider using the GNU doschk program to
help diagnose names that are illegal under MS-DOS, which is even more
restrictive than Unix-like operating systems.
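As a quick check before archiving, you can list any file names that contain
spaces or wildcard characters (a sketch using a standard find pattern):
cfe1 % find project1 -name '*[ *?]*'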
Portable File Sizes
Even though lou's archive filesystem allows file sizes greater than several
hundred gigabytes, not all operating systems or filesystems can manage this
much data in a single file. If you plan to transfer files to an older Mac or
PC desktop, you may want to verify the maximum file size it will support. A
single file will likely need to be less than 4 GB to transfer successfully.
Slow File Retrieval
There are sometimes problems with commands on Lou that should finish
quickly but end up taking a long time. When you do an "ls" on Lou, you see
all the files you have put there as if they were on disk. However, most of
the files are actually written to tape using SGI's Data Migration Facility
(DMF).
One problem with DMF is that it does not deal well with retrieving one file
at a time from a long list of files. If you do an "scp" with a list of
files, Unix feeds those files to DMF one at a time. This means that the
tape(s) containing the files are constantly loaded and unloaded, which is
bad for the tape and tape drive, and also very slow. As the list of files
gets longer (through use of "*" or moving a "tree" of files), the problem
grows to where it can take hours to transfer a set of files that would take
only a few minutes if they were on disk. When several people do file
transfers at once that retrieve files one at a time, it can tie the system
in knots.
Optimizing File Retrieval
DMF lets you fetch files to disk as a group with the "dmget" command. The
tape is read once, and all the requested files are retrieved in a single
pass. Essentially, give dmget the same list of files you are about to
transfer, and when the dmget completes, scp/ftp/cp the files as you had
originally intended. Or you can put the dmget in the background and run
your transfer while dmget is working. If any files are already on disk,
dmget sees this and does not try to get them from tape.
There is also a "dmfind" command that lets you walk a file tree to find
offline files to give to dmget. Make very sure you are in the correct
directory before running dmfind; use the "pwd" command to determine your
current directory.
Please check that too much data isn't brought back online at once, by using
"du" with the --apparent-size option or by using /usr/local/bin/dmfdu.
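For example, to see how much space a directory would occupy if all of its
files were brought back online (a sketch; --apparent-size counts offline
data that a plain du would miss):
lou.user1% du -sh --apparent-size project1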
Note that dmfdu will give an error message for each symbolic link that
points at a nonexistent file.
lou# /usr/local/bin/dmfdu Foo
Foo
13 MB regular 340 files
1114 MB dual-state 1920 files
74633 MB offline 2833 files
13 MB small 340 files
75761 MB total 5093 files
When transferring data between lou and Columbia nodes, use the /nobackup
filesystems instead of the (slow) Columbia NFS home directories.
File transfer rates vary depending on the load on the system and how many
users are transferring files at the same time. Transferring files with scp
between Lou and Columbia nodes on the /nobackup filesystems, for files
larger than 100 megabytes, typically runs between 7 and 17 MB/s over the
gigabit network interface.
Transferring files with scp between Columbia nodes, for files larger than
100 megabytes, typically runs between 20 and 30 MB/s over the gigabit
network interface.
Example 1:
lou.user1% dmget *.data &
lou.user1% scp -qp *.data myhost.jpl.nasa.gov:/home/user/wherever
Example 2:
lou.user1% dmfind /u/user1/FY2000 -state OFL -print | dmget &
lou.user1% scp -rqp /u/user1/FY2000 some_host:/nobackup/user1/wherever
You can see the state of a file by doing "dmls -l" instead of "ls -l".
For more information on using DMF,
please look at:
http://www.nas.nasa.gov/Users/Documentation/DMF-Commands.html
Maximum Amount of Data to Retrieve Online
The online disk space for Lou1-3 is much, much less than their tape storage
capacity, and it is impossible to retrieve all files to online storage at
the same time. So, before retrieving a large amount of data, you should
check that there is enough online space for it. The "df" command shows the
amount of free space in a filesystem. The lou script "dmfdu" reports how
much total (online and offline) data is in a directory. To use this script,
simply "cd" into the directory of interest and execute it; it reports the
total amount of data for all the files in the current directory.
If you would like to know the total amount of data under your home
directory, first find out whether your account is under s1a-s1e, s2a-s2e,
or s3a-s3e. Assuming you are under s1b, you can then use
"dmfdu /s1b/your_userid" to find the total amount. Alternatively, simply cd
to your home directory and use "dmfdu *", which shows usage for each file
or directory.
Lou1-3's archive filesystems are between 8 TB and 30 TB in size, but the
available space typically floats between 10% and 30%. In example 3, 29% of
the space is unused. It is best to retrieve at most 10% of the filesystem
space at a time. Do what you need to with those files (scp, edit, compile,
etc.), then release the space with "dmput -r", and then retrieve the next
group of files, use them, release the space, and so on. In example 3, one
directory's data is retrieved from tape and copied to the remote host, and
its data blocks are released, before more data is retrieved from tape.
If this process is not followed, it is very likely the filesystem will
become full, and retrievals from tape and file transfers to remote hosts
will fail for everyone trying to use the same filesystem.
Example 3:
lou.user1% df -lh .
Filesystem Size Used Avail Use% Mounted on
/dev/lxvm/lsi_s1b 8.6T 6.1T 2.6T 71% /s1b
lou.user1% dmfdu project1 project2
project1
2 MB regular 214 files
13 MB dual-state 1 files
229603 MB offline 101 files
2 MB small 214 files
229606 MB total 315 files
project2
7 MB regular 245 files
4661 MB dual-state 32 files
218999 MB offline 59 files
7 MB small 245 files
223668 MB total 336 files
lou.user1% cd project1
lou.user1% dmfind . -state OFL -print | dmget &
lou.user1% scp -rp /u/user1/project1 remote_host:/nobackup/user1
(Verify that the data has successfully transferred)
lou.user1% dmfind . -state DUL -print | dmput -rw
lou.user1% df -lh .
lou.user1% cd ../project2
lou.user1% dmfind . -state OFL -print | dmget &
lou.user1% scp -rpq /u/user1/project2 remote_host:/nobackup/user1
lou.user1% dmfind . -state DUL -print | dmput -rw
Maximum File Size Policy
Lou's archive filesystems are between 8 TB and 30 TB in size, but small
files (currently, those under 1 MB) consume up to 500 GB of disk space.
Small files normally stay online at all times, reducing the total disk
cache available by 0.5 TB.
An excessively large file (greater than 20% of your Lou home filesystem
size) can cause the system to thrash. This is especially true of a tar or
cpio file. If you have very large files to create or transfer to lou, call
or e-mail the help desk (support@nas.nasa.gov) so the staff can work with
you to avoid causing problems for yourself and other users.
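To get a sense of how many small files you have in a directory tree before
deciding whether to bundle them (a sketch using GNU find's -size test):
lou.user1% find project1 -type f -size -1M | wc -l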
Collecting Small Files into Single Large Files
The DMF archival system is optimized for the storage of
large files (within the limit mentioned above).
The tar and cpio programs allow you to
collect multiple files and directory trees into a single archive file for
storage on lou. You can then
extract any or all files from this collection as needed.
For example, you might create a tar
file of all the sources for a program on a particular date and save that tar
file on lou. At a later date, you
could retrieve the file and extract the sources to rebuild the program as it
existed on that date. As another
example, a program might run through multiple timesteps, producing an image
file at each step. Rather than
store the individual images on lou, you can combine them, plus other files
related to the run, into a single tar file.
Note: If you will
need individual files from a collection on a frequent basis (e.g., daily/weekly), it
is probably better to store the files separately.
Before you can extract a specific file from a collection,
the entire tar archive has to be retrieved into online disk storage.
If you are going to create a tar/cpio archive from a directory, you cannot
determine how much data is in the directory by using the "du" command,
since any data that is on tape may not be counted.
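One way to measure the exact size an archive would have, without writing it
to disk, is to stream the archive to a byte counter (a sketch; note that
this reads all of the data, so dmget any offline files first, as described
above):
lou.user1% tar -cf - project1 | wc -c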
GNU tar examples
There are multiple versions of tar installed on lou and the Columbia
systems. For these examples, you want to use the version in /usr/local/bin.
(As of this writing, it is GNU tar 1.15.1.) To make sure you get this
version, you can move /usr/local/bin to the front of your PATH:
host.user% set path=(/usr/local/bin $path ) # For csh or tcsh
host.user% PATH=/usr/local/bin:$PATH #For sh, bash, etc.
Running tar on lou
You can run tar on lou as well as on the Columbia systems, but when
creating tar files you need to use "dmget" to retrieve all the files that
will be included in the archive (see Optimizing File Retrieval).
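For example, staging a directory's offline files before archiving it might
look like the following sketch, reusing the dmfind/dmget idiom shown
earlier:
lou.user1% dmfind sources -state OFL -print | dmget
lou.user1% tar -cf sources.tar sources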
In the last part of example 7, you might not want to transfer the whole
archive file to Columbia when all you need is the PBS output file. In that
case, log into lou and proceed as in example 7, but omit the scp step.
Again, be sure to set your PATH to use the tar in /usr/local/bin.
Tar Create Collection
Collecting sources for a program into a single tar archive: assume you
have a directory on Columbia, called "xyz/sources", that contains all the
sources and other files needed to compile program xyz. You want to create a
tar archive of that directory, create a Table of Contents, and store both
on lou.
Example 4:
columbia.user% cd xyz
columbia.user% set date=`date +%Y%m%d`
columbia.user% set tarfile=src_$date.tar
# Create the tar file
columbia.user% tar -cf $tarfile sources
# Create a Table of Contents
columbia.user% tar -tf $tarfile > $tarfile.TOC
#Verify the tarfile size matches the directory and the Table of Contents has
#the same number of lines as there are files in the directory.
columbia.user% ls -lh $tarfile ; du -sh sources
-rw-r--r-- 1 user1 group1 2.1M Sep 29 14:51 src_20050929.tar
2.1M sources
columbia.user% find sources | wc -l ; wc -l $tarfile.TOC
112
112 src_20050929.tar.TOC
columbia.user% scp $tarfile lou:somewhere/$tarfile
columbia.user% rm $tarfile # Don't need columbia copy any more
The "date" and "tarfile" shell variables are
used to give the tar file a name based
on the output from the date
command. By convention, tar
files end with the suffix ".tar"
If you ran these commands on September
29th, 2005, the file would have the name "src_20050929.tar".
The tar command is run with the -c and -f path options.
-c says to create
a tar archive. -f gives the path
to the archive to create. You'll
always want to use the -f option, because the default archive location is
usually a physical device, such as a tape drive or floppy disk.
Tar Extraction
Now, let's say at some later time you wanted to get this version of the sources back.
Example 5:
columbia.user% mkdir xyztmp
columbia.user% cd xyztmp
columbia.user% scp lou:somewhere/src_20050929.tar .
columbia.user% tar -xf src_20050929.tar
columbia.user% cd sources
columbia.user% make ...
Here, the -x option says to extract items from the archive. Again, -f path is used to specify the tar file.
Note: When we created the tar file, we used a relative path to name the directory we wanted to archive (we used just "sources," rather than the full path "/u/user/xyz/sources"). This way, we can extract the files into a different location, just by starting the extract from a different directory ("xyztmp" in the example). If we were in the original xyz directory, the extract would have replaced the current sources subdirectory with the old one, which is usually not what you want.
Example 6:
Collecting all files related to a particular job into a tar archive. For this example, assume you have a program that, among other output, creates several JPEG image files, at various timesteps during its run. You start each run from a different directory, to keep all the files from each run separate from other runs. However, the program also creates large checkpoint files (with suffix ".chk") that you do not want to include in the archive.
columbia.user% cd pgsimruns
columbia.user% mkdir run52
columbia.user% cd run52
# Copy/create PBS script and program input files
columbia.user% qsub ... run.sh
[Wait for job to complete.]
columbia.user% set files=`ls -1 | egrep -v '\.chk|\.tar' `
columbia.user% tar -cf run52.tar $files
columbia.user% scp run52.tar lou:somewhere/run52.tar
We use ls and egrep to build a list of files in the directory, omitting the checkpoint files. We also omit any .tar files. It is a common mistake to include the tar file itself in the list of files to be collected.
Note: The argument to ls is -1 (one), not -l (ell), to list one file per line.
Tar list
Later, say you want to go back and review how the job ran. You want to examine the PBS job output file. By now, though, you've forgotten the job name. So, you first use tar to produce a table of contents of the archive, then extract just the job output file.
Example 7:
columbia.user% scp lou:somewhere/run52.tar /tmp/run52.tar
columbia.user% tar -tvf /tmp/run52.tar
-rwxr-xr-x user/group 35 2005-09-30 13:53:18 run.sh
-rw-r--r-- user/group 20193 2005-09-30 14:10:13 pgsim.in
-rw-r--r-- user/group 409600 2005-09-30 14:12:46 step0.jpg
-rw-r--r-- user/group 409600 2005-09-30 14:12:52 step10.jpg
-rw-r--r-- user/group 409600 2005-09-30 14:12:58 step20.jpg
...
-rw-r--r-- user/group 409600 2005-09-30 14:14:17 step150.jpg
-rw-r--r-- user/group 5738 2005-09-30 14:14:23 pgsim.out
-rw------- user/group 5372 2005-09-30 14:14:26 run.sh.o8880
-rw------- user/group 0 2005-09-30 14:14:26 run.sh.e8880
columbia.user% tar -xf /tmp/run52.tar run.sh.o8880
The first tar uses the -t option to indicate that you want a table of contents, and -v to get a verbose listing.
CPIO Examples
CPIO is another tool, like tar, for collecting multiple files into a single
archive file. The newer GNU versions of cpio and tar have been updated to
read both (cpio and tar) formats. In the past, the primary portable archive
file format was cpio.
CPIO create
Cpio is another program that can collect sets of files into a single archive file. Its major advantage over tar is that you can specify the files to be archived on stdin, rather than on the command line. A significant difference from tar is that if tar is asked to archive a directory, it archives the directory contents also. Cpio must be told explicitly to archive each item in a directory.
Example 8:
Let's take example 4 above and use cpio instead of tar to archive a directory of program source files and create a Table of Contents.
columbia.user% cd xyz
columbia.user% set date=`date +%Y%m%d`
columbia.user% set cpiofile=src_$date.cpio
columbia.user% find sources -print | cpio -o -c > $cpiofile
columbia.user% cpio -it < $cpiofile > $cpiofile.TOC
#Verify the cpiofile size matches the directory and the Table of Contents
#has the same number of lines as there are files in the directory.
columbia.user% ls -lh $cpiofile ; du -sh sources
-rw-r--r-- 1 user1 group1 2.1M Sep 29 14:51 src_20050929.cpio
2.1M sources
columbia.user% find sources | wc -l ; wc -l $cpiofile.TOC
112
112 src_20050929.cpio.TOC
columbia.user% scp $cpiofile lou:somewhere/$cpiofile
columbia.user% rm $cpiofile
We use the find command to generate the list of all files in the source directory. The -o option to cpio says to output an archive, and the -c option says to create it in a more compatible format.
CPIO extraction
The equivalent steps to restore the contents of the cpio archive are:
Example 9:
columbia.user% mkdir xyztmp
columbia.user% cd xyztmp
columbia.user% scp lou:somewhere/src_20050929.cpio .
columbia.user% cpio -ic < src_20050929.cpio
The -i option says to input the archive (usually from stdin, as in this example). As before, -c says the archive is in the more compatible format.
Converting Example 6 to use cpio is similar. The goal is to archive most,
but not all, files in a directory. Note that we don't need example 6's
$files variable, because we can pipe the ls/egrep output directly into
cpio.
columbia.user% [ Same as example 6 through waiting for job to complete ]
columbia.user% ls -1 | egrep -v '\.chk|\.cpio' | cpio -oc > run52.cpio
columbia.user% scp run52.cpio lou:somewhere/run52.cpio
Note: As in example 6, the argument to ls is -1 (one), not -l (ell).
Cpio is given the -o and -c arguments, to output the archive in compatible format.
CPIO list
The cpio method for listing the table of contents of an archive and extracting a single file is:
Example 10:
columbia.user% scp lou:somewhere/run52.cpio /tmp/run52.cpio
columbia.user% cpio -ictv < /tmp/run52.cpio
-rwxr-xr-x 1 user1 grp1 35 Sep 30 13:53 run.sh
-rw-r--r-- 1 user1 grp1 20193 Sep 30 14:10 pgsim.in
-rw-r--r-- 1 user1 grp1 409600 Sep 30 14:12 step0.jpg
-rw-r--r-- 1 user1 grp1 409600 Sep 30 14:12 step10.jpg
-rw-r--r-- 1 user1 grp1 409600 Sep 30 14:12 step20.jpg
-rw-r--r-- 1 user1 grp1 409600 Sep 30 14:14 step150.jpg
-rw-r--r-- 1 user1 grp1 5738 Sep 30 14:14 pgsim.out
-rw------- 1 user1 grp1 5372 Sep 30 14:14 run.sh.o8880
-rw------- 1 user1 grp1 0 Sep 30 14:14 run.sh.e8880
25767 blocks
columbia.user% cpio -icm run.sh.o8880 < /tmp/run52.cpio
In the first cpio, the -i and -t arguments, combined, say we want to list a table of contents. The -c option, again, enables compatibility mode, although cpio can usually automatically detect which format was used in creating an archive. Verbose output is selected with -v.
In the second cpio, the -m flag says to restore the modification times of extracted files to their values when they were archived. Otherwise, the modification times would be the time when cpio performs the extract.
Note: When extracting files from an archive, cpio will not overwrite existing files by the same name. Add the -u option if you want to replace such files.
A nice feature of cpio is that the list of files to extract is really a list of file name patterns. So, in the last example, because we know the format of PBS output files, we could skip listing the table of contents and let cpio find the right file for us:
columbia.user% cpio -icm 'run.sh.o*' < /tmp/run52.cpio
Resend: File Copy Retry on Remote System Outages or Interrupts
Using scp to transfer many files, or over 50 GB of data, can easily take
over an hour to copy the data from a Columbia node to Lou. After starting
the transfer and seeing that it is working, you might walk away for a
while; if a problem occurs several minutes into the transfer (Lou crashes,
a network problem, a full filesystem, etc.), that file transfer fails and
the rest of the files will likely not be transferred either. Normally it
takes less than 20 minutes to reboot the host, and other problems are
typically resolved in less than an hour.
If possible, use tar or cpio to gather the files into a single file to
transfer to lou. If this is not possible, use the resend tool.
resend is a new tool developed to help with transferring many files or
large amounts of data to Lou. It is located in /usr/local/bin on Lou and
all Columbia nodes.
The purpose of resend is to send files and directories to a remote node;
on failure of a file, it retries that file up to -r times, waiting -t
minutes after the first failure and doubling the wait time on each
additional failure. Default values are -r 10 retries and -t 3 minutes,
with a maximum wait time (-M) of 60 minutes between retries.
Use resend interactively or from a small batch job (4p) to make effective
use of the project CPU allocation. The tool is simply dropped in front of
an scp command line.
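For example, to retry each failed file up to 5 times with an initial
2-minute wait, the flags described above might be used like this (a sketch;
consult the tool itself for exact option placement):
columbia.user% resend -r 5 -t 2 scp mydata lou:.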
Example 11:
columbia.nas.nasa.gov % ls -l
total 20889816
drwx--x---+ 2 user1 grp1 14 Aug 19 14:53 acl
drwx------ 2 user1 grp1 4096 Sep 26 13:31 junktar
columbia.nas.nasa.gov % resend scp junktar lou:.
Success: mkdir -p junktar
Success: scp -q junktar/f2m.1 lou:.
Success: scp -q junktar/f2m.2 lou:.
Success: scp -q junktar/f2m.3 lou:.
Success: scp -q junktar/f2m.4 lou:.
Success: scp -q junktar/f2m.5 lou:.
Success: scp -q junktar/f2m.6 lou:.
Success: scp -q junktar/f2m.7 lou:.
scp: junktar/f2m.8: Permission denied
Failed: Resending: scp -q junktar/f2m.8 lou:.
{ resend automatically waiting three minutes and trying again }
Success: scp -q junktar/f2m.8 lou:.
Success: scp -q junktar/f2m.9 lou:.
... { rest of the files }