7.1 Computing Services

CDC operates a centralized computing facility, based on high-end UNIX workstations, that emphasizes shared resources and is designed for the benefit of all CDC science projects. The goal of CDC's systems services is to provide near state-of-the-art computational and storage facilities within a target budget of 15% of CDC's total budget, allowing CDC to fulfill its mission and research obligations efficiently and enabling CDC scientists to compete effectively with their peers at other institutions. Resource allocation and policy issues are considered by an internal CDC review group, the Computer Users Advisory Committee (CUAC), which makes recommendations to CDC's systems and upper management.

The bulk of CDC's computer facility investment is in mid-range computing, utilizing a tightly integrated network of Unix workstations and servers (Figs. 7.1 and 7.2). Total capacity of this system is approximately 18 Gflops of aggregate throughput, with 2.4 Gflops of peak symmetric multi-processor (SMP) throughput on each Sun Enterprise 4500 and 525 Mflops of peak single-processor throughput on each node of our Compaq DS10 Alpha-cluster. The comparable figures for CDC four years ago were 1.75 Gflops aggregate, 250 Mflops SMP peak, and 125 Mflops single-processor peak; total aggregate and SMP throughput have thus increased by an order of magnitude. Table 7.1 shows the breakout of CDC computing power (in units of aggregate double-precision LINPACK) by type of processor. Note that one Sun Enterprise 4500 is configured as a large-memory machine with 10 GB of RAM.


Fig. 7.1 Schematic overview of CDC's computing services and resources.


Fig. 7.2 Schematic depiction of CDC's local area network.

Table 7.1: CDC Primary Computing Resources
CONFIGURATION CPUs LINPACK
Compaq Alpha DS10 (12 nodes) 12 x 466-MHz 6.30 Gflops
Sun Enterprise 4500 (2 nodes) 16 x 400-MHz 4.80 Gflops
Sun Ultra 60 (6 nodes) 12 x 360-MHz 3.25 Gflops
Sun Enterprise 450 (4 nodes) 16 x 300-MHz 3.60 Gflops
TOTAL 17.95 Gflops
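The quoted aggregate can be cross-checked with a short calculation; the per-configuration figures below are taken directly from Table 7.1.

```python
# Per-configuration aggregate DP LINPACK figures from Table 7.1 (Gflops)
linpack_gflops = {
    "Compaq Alpha DS10 (12 nodes)": 6.30,
    "Sun Enterprise 4500 (2 nodes)": 4.80,
    "Sun Ultra 60 (6 nodes)": 3.25,
    "Sun Enterprise 450 (4 nodes)": 3.60,
}

total = sum(linpack_gflops.values())
print(f"Total aggregate throughput: {total:.2f} Gflops")  # 17.95 Gflops
```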

Industry trends are allowing CDC to increasingly provide traditional supercomputing-class services in-house using commodity workstation technologies. In 1999, when a computer-room fire disabled the National Weather Service's CRAY C-90 supercomputer used for operational forecasts, CDC was able to implement a replacement for the week-two ensemble model runs, allowing NWS to continue issuing short-term climate forecasts until its own systems were repaired. CDC accomplished this using six dual-processor Sun Ultra 60 computers dedicated to the task. Lately, several CDC researchers have brought their compute-intensive modeling activities in-house, as external supercomputer facilities have become less cost effective, outmoded, or have switched to massively parallel processor (MPP) technologies that may require significant recoding (e.g., the National Centers for Environmental Prediction's IBM RISC-cluster and the Forecast Systems Laboratory's HPTI Alpha-cluster). Since MPP facilities often employ large numbers of workstation-class processors, it is fairly easy to create small clusters of these same processors that come close to the per-processor performance of their larger brethren, at a fraction of the cost. This economy is especially true if the computing can be done as a "loose cluster," which uses traditional Ethernet LAN technology for the inter-processor communications. Ensemble model runs are well suited to loose clusters, since each invocation of the model can be run independently on its own processor. Thus, in the past year CDC has concentrated its modest financial resources on providing this type of high-end computing. For example, for typical ensemble model runs, total throughput on our 12-node Compaq DS10 Alpha-cluster is close to 6 Gflops, with a total acquisition cost of only $60,000. For high-end computing needs that cannot easily be met within CDC, some users continue to make use of outside facilities, primarily at NCEP, FSL, and NCAR.
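The loose-cluster approach works because ensemble members are embarrassingly parallel: each invocation of the model is independent, so members can simply be farmed out one per processor over the LAN. A minimal sketch of that dispatch pattern in Python follows; the node names and the member-running function are stand-ins (in practice one would launch the actual model remotely on each cluster node).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names for the 12 DS10 cluster nodes
NODES = [f"node{i:02d}" for i in range(12)]

def run_member(node, member_id):
    """Stand-in for launching one ensemble member on a cluster node.

    In practice this would invoke the model remotely on the node;
    here it just reports the assignment so the sketch is self-contained.
    """
    return (member_id, node, "done")

# One worker per node: each member runs independently to completion,
# with no inter-processor communication during the integration.
with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
    futures = [pool.submit(run_member, NODES[m % len(NODES)], m)
               for m in range(24)]          # e.g., a 24-member ensemble
    results = [f.result() for f in futures]

print(len(results), "members completed")    # 24 members completed
```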

Several smaller machines, not included in the general computing category, are dedicated to specific functions to minimize system downtime, segregate competing demands, and maximize system security. For instance, various servers specialize in electronic mail, anonymous-FTP (file transfer protocol), NIS (network information services), DNS (domain name service), tape backups, and NFS (network file system) mounting of the users' home file systems. Two of the general-purpose Ultra 60 servers also act as fully redundant hosts for the CDC web site.

Total on-line (raw) disk storage capacity is approximately 12.5 TB, of which nearly 6 TB is a Dell storage area network (SAN) device that was acquired as excess property from the 2000 Census, but is still in the process of being brought on-line at CDC. The Dell SAN will be used to host climate model output for ease of model intercomparisons and for efficient generation of model diagnostic statistics. CDC's next largest category of disk storage, at 3.8 TB, is composed of newly acquired Maxtor network attached storage (NAS) devices. Each device is basically a stripped-down PC with four large disks in a 1U (1.75" high) rack-mount form factor. These NAS devices can be procured for $3200 per 320 GB, or approximately a penny per megabyte. Although they do not provide fast throughput, they are a convenient alternative to magnetic tape. In 1997, total CDC disk capacity was 600 GB of traditional magnetic disks and 400 GB of fairly slow magneto-optical disks. A breakdown of CDC's current magnetic disk storage is shown in Table 7.2.
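The penny-per-megabyte figure quoted above follows directly from the NAS price and capacity (using decimal megabytes):

```python
# Cost arithmetic for one Maxtor NAS device, as quoted in the text
cost_dollars = 3200
capacity_gb = 320
capacity_mb = capacity_gb * 1000            # decimal megabytes

print(f"${cost_dollars / capacity_mb:.3f} per MB")  # $0.010 per MB
```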

Table 7.2: CDC Primary Disk Storage
TYPE DISKS CAPACITY
Dell SAN (RAID5) 160 x 36 GB 5760 GB
Maxtor NAS (IDE) 48 x 80 GB 3840 GB
Sun A3500 (RAID5) 56 x 18 GB 1008 GB
Sun A1000 (RAID5) 8 x 18 GB 144 GB
Sun 6-packs (SCSI) 36 x 18 GB 648 GB
Sun 6-packs (SCSI) 72 x 9 GB 648 GB
Alpha-cluster (IDE) 12 x 40 GB 480 GB
TOTAL 12,528 GB

For large data sets that are of interest to only one or two PIs, or for archival purposes, users have access to a variety of tape devices: eight DLT-7000 drives, including two auto-loading stackers, six 8-mm Exabyte drives with one stacker, and a 4-mm DAT drive. An IBM-style 4380 drive and stacker is also available to support the occasional ingest of 1/2" square-cartridge tapes. In addition, systems staff uses two Exabyte Mammoth-2 20-slot tape jukeboxes, dedicated to systems backups, with a total single-pass backup capacity of 2.4 TB. All of the users' home file systems, containing their programming code and executables, are incrementally backed up nightly, while the users' data files are backed up approximately once a month. Since 1997, CDC has discontinued support for 1/2" open-reel tapes and 1/4" QIC cartridge tapes.

Systems users have a variety of printing options through the network, including standard monochrome laser printers, both solid-ink and laser color printers, a large-format color ink-jet poster plotter, and a 65-ppm Xerox digital copier that can also staple the output. Color flatbed scanners with optical character recognition software, a very large digitizing tablet, and a bar-code scanner are available as alternative input devices.

There are three solutions provided for desktop computing, depending on the user's needs. CDC's administrative officer and secretaries use PCs that have specialized software for budgetary, personnel, and procurement record keeping. If scientists or their support staff have a strong predisposition to the MacOS look-and-feel, they are provided with Apple Macintosh computers and Citrix client software to access the other CDC systems. Most users, however, are provided with SunRay network appliances. The SunRay has the look-and-feel of a traditional Sun workstation console and can be set up to support a variety of window environments, such as CDE or FVWM, much like an X-terminal. In reality, the SunRay is a further abstraction of the X-terminal concept: instead of complex X11 calls being transmitted from a server to the user's desktop client, in the SunRay paradigm only the frame-buffer updates are transmitted from the server to the desktop. The hardware components in a SunRay box are those necessary to interface with the user: video out, video in, audio out, audio in, keyboard, and mouse. These components have been engineered to the limits of human perception, so, in theory, a SunRay will never need to be upgraded. The SunRay costs $350 and takes only minutes to replace if one should fail. A "smart-card" capability allows users to transfer their active session from one SunRay device to another for the purpose of collaboration or presentation. For all of our users, whether on a PC, Mac, or SunRay, CDC provides high-resolution LCD flat-panel displays to reduce eyestrain, glare, energy consumption, and hazardous emissions.

To supply typical office productivity software, CDC operates two dual-processor 700-MHz Pentium-III PCs, running the Windows 2000 Terminal Server Edition (TSE) operating system. Via Citrix, these servers provide access to Microsoft Word, Excel, PowerPoint, WordPerfect, Quattro Pro, Corel Presentations, Systat, Netscape, and Internet Explorer for our SunRay and Macintosh users. All of the Sun systems run the Solaris 7 or Solaris 8 Unix variants. These Sun operating systems are scrupulously updated with the latest security patches to provide a high level of system integrity. All unnecessary network services are disabled, and an automated security check is routinely performed to detect unauthorized attempts to gain access to the CDC systems. Software provided on the Sun systems includes FORTRAN (77, 90, and 95), C, and C++ compilers, IMSL, NAG, IDL, Matlab, S-PLUS, Maple, NCAR Graphics, and GrADS. Other, publicly licensed software (freeware and shareware) available to users on the Sun systems totals 15 GB. The Compaq Alpha cluster runs the Linux operating system, which is considered a potential security weakness. To minimize the risk, CDC has placed the Alpha cluster behind a Sun Ultra 60 running Solaris, which serves as a gateway separating the cluster from the rest of the CDC local area network: a user must successfully log into the Solaris box to "see" the Linux boxes. The Alpha cluster provides minimal software for the user - only a single FORTRAN compiler. The Alpha cluster and the high-end Sun systems are limited to batch-only jobs. To access the batch machines, the user submits a job to the PBS scheduler, which routes the job to the most appropriate computing resource. Once a job starts running on a given processor, it generally will run to completion without swapping. This technique maximizes total system throughput by reducing the system overhead for each job run.
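The routing step can be illustrated with a small sketch. This is not PBS itself, only an approximation of the idea of sending each batch job to the eligible machine with the most free processors, after which the job runs there to completion; the machine names, architectures, and load figures are hypothetical.

```python
# Hypothetical snapshot of batch resources and their free processors
resources = {
    "alpha-cluster": {"free_cpus": 4, "arch": "linux-alpha"},
    "e4500-a":       {"free_cpus": 0, "arch": "solaris-sparc"},
    "e4500-b":       {"free_cpus": 6, "arch": "solaris-sparc"},
}

def route(job_arch):
    """Pick the eligible machine with the most free processors.

    Returns None when nothing is eligible, in which case the
    job would simply wait in the queue.
    """
    eligible = {name: r for name, r in resources.items()
                if r["arch"] == job_arch and r["free_cpus"] > 0}
    if not eligible:
        return None
    return max(eligible, key=lambda n: eligible[n]["free_cpus"])

print(route("solaris-sparc"))  # e4500-b (most free CPUs)
print(route("linux-alpha"))    # alpha-cluster
```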

System downtime at CDC has been fairly minimal in the last few years. Redundancy has been designed into much of the systems infrastructure (e.g., two SunRay servers, two web servers, two Enterprise 4500's, two Pentium servers, and disk arrays using RAID-5). If something breaks, users can usually still get work done. Preventative maintenance and hardware/software upgrades are scheduled and performed by CDC systems staff one evening per month. All critical hardware systems are on maintenance contracts with the manufacturers. CDC also keeps a supply of "hot spares" for many commodity hardware components.
