|
|
Christian H. Bischof, Ron L. Shepard, and Steven
Huss-Lederman, "Workshop Report on Large-Scale Matrix Diagonalization Methods in
Chemistry Theory Institute, held at Argonne National Laboratory, May 20-22, 1996,"
Technical Memorandum ANL/MCS-TM-219,
October 1996. |
This interdisciplinary institute brought together 41
computational chemists and numerical analysts. The goal was to understand the needs of the
computational chemistry community in problems that utilize matrix diagonalization
techniques. This was accomplished by reviewing the current state of the art and looking
toward future directions in matrix diagonalization techniques. This institute occurred
about 20 years after a related meeting of similar size (see Report on the Workshop August
9-11, 1978, University of California, at Santa Cruz, edited by Cleve Moler and I. Shavitt
and sponsored by National Resource for Computation in Chemistry). During those 20 years
the Davidson method continued to dominate the problem of finding a few extremal
eigenvalues for many computational chemistry problems. Work on non-diagonally dominant and
non-Hermitian problems as well as parallel computing has also brought new methods to bear.
The changes and similarities in problems and methods over the past two decades offered an
interesting perspective on the progress in this area. |
|
|
William Gropp and Ewing Lusk, "Creating a New MPICH
Device Using the Channel Interface," Technical Memorandum ANL/MCS-TM-213
(draft July 1996). |
The MPICH implementation of MPI uses a powerful and efficient
layered approach to simplify porting MPI to new systems. One interface that can be used is
the {\em channel} interface; this interface defines a collection of simple data-transfer
operations. This interface can adapt to additional functionality, such as asynchronous or
nonblocking transfers or remote memory operations. This paper describes the channel
interface and some of the special issues that arise, and gives instructions for creating a
new MPICH port by implementing just a few routines. |
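The layering idea is compact enough to sketch. The following toy analogue (in Python, with invented names; the real channel interface is a set of C routines) shows how a full message-passing layer can sit on top of nothing more than a few simple data-transfer primitives.

```python
from abc import ABC, abstractmethod
from collections import deque

class Channel(ABC):
    """Hypothetical analogue of a channel-style device interface: the device
    implements only a few simple data-transfer primitives, and the layers
    above build the full message-passing semantics on top of them."""
    @abstractmethod
    def send(self, dest, data): ...
    @abstractmethod
    def recv_available(self): ...
    @abstractmethod
    def recv(self): ...

class InMemoryChannel(Channel):
    # toy "device": per-rank queues standing in for the wire
    def __init__(self, queues, rank):
        self.queues, self.rank = queues, rank
    def send(self, dest, data):
        self.queues[dest].append((self.rank, data))
    def recv_available(self):
        return bool(self.queues[self.rank])
    def recv(self):
        return self.queues[self.rank].popleft()

queues = {0: deque(), 1: deque()}
rank0, rank1 = InMemoryChannel(queues, 0), InMemoryChannel(queues, 1)
rank0.send(1, b"hello")
src, payload = rank1.recv()
```

Porting then amounts to supplying a new concrete device; everything built on the abstract interface is untouched.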
|
|
David Levine, Michael Facello, Philip Hallstrom, Greg Reeder,
Brian Walenz, and Fred Stevens, "STALK Users Guide," Technical Memorandum ANL/MCS-TM-214 (July
1996). |
STALK is a system that models molecular docking between two
proteins. A problem is posed as an optimization problem where the objective is to minimize
the free energy of the molecular system by maximizing the intermolecular interaction
energy between the two molecules. The possible number of conformations between the two
molecules can be very large. A parallel genetic algorithm (GA) is used to explore the
conformation space and identify the low-energy molecular configurations. The CAVE, a
virtual reality environment, can be used to visualize and interact with the system while
it is executing. STALK consists of two programs: stalk.ga, the docking program that runs
the GA, and stalk.cave, the visualization program. The visualization component is
optional. |
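The optimization loop at the heart of such a system can be sketched in a few lines. This is a generic steady-state genetic algorithm minimizing a toy quadratic "energy"; STALK's actual objective, encoding, and parallelism are not reproduced here.

```python
import random

def energy(x):
    # Toy stand-in for the intermolecular interaction energy; the real
    # objective is a physics-based energy over docking conformations.
    return sum((xi - 1.0) ** 2 for xi in x)

def genetic_search(dim=3, pop_size=20, generations=500, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)
        # steady-state step: two parents from the fitter half produce one
        # child, which replaces the current worst member of the population
        p1, p2 = rng.sample(pop[: pop_size // 2], 2)
        child = [(a + b) / 2 + rng.gauss(0, 0.05) for a, b in zip(p1, p2)]
        pop[-1] = child
    return min(pop, key=energy)

best = genetic_search()
```

A parallel GA of the kind STALK uses distributes this population across processors; the visualization component simply observes the evolving population.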
|
|
David Levine, Michael Facello, Philip Hallstrom, Greg Reeder,
Brian Walenz, and Fred Stevens, "STALK Programmers Guide," Technical Memorandum ANL/MCS-TM-215 (July
1996). |
STALK is a system that models molecular docking between two
proteins. A problem is posed as an optimization problem where the objective is to
minimize the intermolecular interaction energy between the two molecules. The possible
number of conformations between the two molecules can be very large. A parallel genetic
algorithm (GA) is used to explore the conformation space and identify the low-energy
molecular configurations. The CAVE, a virtual reality environment, can be used to
visualize and interact with the system while it is executing. STALK consists of two
programs: stalk.ga, the docking program that runs the GA, and stalk.cave, the
visualization program. The visualization component is optional. |
|
|
Brian P. Walenz, "MolView Users Guide," Technical
Memorandum ANL/MCS-TM-216
(June 1996). |
A system for viewing molecular data in a CAVE virtual reality
environment is presented. The system, called MolView, consists of a front-end driver
program that prepares the data and a backend CAVE program that displays the data. Both are
written so that modifications and extensions are relatively easy to accomplish. |
|
|
|
|
I. Foster, ``Compositional Parallel Programming Languages,''
Preprint MCS-P354-0293
(Revised), Rev. 1, April 1996. To appear in ACM Trans. on Programming Languages and
Systems, 1996. |
In task-parallel programs, diverse activities can take place
concurrently, and communication and synchronization patterns are complex and not easily
predictable. Previous work has identified compositionality as an important design
principle for task-parallel programs. In this paper, we discuss alternative approaches to
the realization of this principle, which holds that properties of program components
should be preserved when those components are composed in parallel with other program
components. We review two programming languages, Strand and Program Composition Notation,
that support compositionality via a small number of simple concepts, namely monotone
operations on shared objects, a uniform addressing mechanism, and parallel composition.
Both languages have been used extensively for large-scale application development,
allowing us to provide an informed assessment of their strengths and weaknesses. We
observe that while compositionality simplifies development of complex applications, the
use of specialized languages hinders reuse of existing code and tools, and the
specification of domain decomposition strategies. This suggests an alternative approach
based on small extensions to existing sequential languages. We conclude the paper with a
discussion of two languages that realize this strategy. |
|
|
Daniel Ralph and Stephen Wright, "Superlinear
Convergence of an Interior-Point Method for Monotone Variational Inequalities,"
Preprint MCS-P556-0196,
January 1996. |
We describe an infeasible-interior-point algorithm for
monotone variational inequality problems and prove that it converges globally and
superlinearly under standard conditions plus a constant rank constraint qualification. The
latter condition represents a generalization of the two types of assumptions made in
existing superlinear analyses; namely, linearity of the constraints and linear
independence of the active constraint gradients. |
|
|
David Levine, "Application of a Hybrid Genetic Algorithm
to Airline Crew Scheduling," Computers &amp; Operations Research (to appear).
Also preprint MCS-P557-0196.
|
This paper discusses the development and application of a
hybrid genetic algorithm to airline crew scheduling problems. The hybrid algorithm
consists of a steady-state genetic algorithm and a local search heuristic. The hybrid
algorithm was tested on a set of forty real-world problems. It found the optimal solution
for half the problems, and good solutions for nine others. The results were compared to
those obtained with branch-and-cut and branch-and-bound algorithms. The branch-and-cut
algorithm was significantly more successful than the hybrid algorithm, and the
branch-and-bound algorithm slightly better. |
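The flavor of the hybrid can be sketched on a toy set-partitioning instance (the data below is illustrative, not airline data): a penalized fitness function of the kind GA approaches to crew scheduling typically use, plus the bit-flip local search that the hybrid would apply to offspring.

```python
# Toy set-partitioning instance: each pairing is (cost, set of flights);
# every flight must be covered exactly once.
flights = {1, 2, 3, 4}
pairings = [(3.0, {1, 2}), (2.5, {3, 4}), (4.0, {1, 3}),
            (3.5, {2, 4}), (6.0, {1, 2, 3, 4}), (1.5, {4})]

def fitness(bits):
    # pay column costs plus a penalty per flight covered != exactly once
    count = {f: 0 for f in flights}
    total = 0.0
    for keep, (c, cols) in zip(bits, pairings):
        if keep:
            total += c
            for f in cols:
                count[f] += 1
    return total + 10.0 * sum(abs(n - 1) for n in count.values())

def local_search(bits):
    # first-improvement bit-flip heuristic, the "local search" half of the
    # hybrid; the GA half would supply the starting points on real problems
    bits, best = list(bits), fitness(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            bits[i] ^= 1
            f = fitness(bits)
            if f < best:
                best, improved = f, True
            else:
                bits[i] ^= 1
    return bits, best
```

On this instance the heuristic, started from the empty selection, recovers the optimal partition {1,2} + {3,4} at cost 5.5.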
|
|
David Levine, William Gropp, Kimmo Forsman, and Lauri
Kettunen, "Parallel Computation of Three-Dimensional Nonlinear Magnetostatic
Problems," Preprint MCS-P558-0196,
February 1996. |
We describe a general-purpose parallel electromagnetic code
for computing accurate solutions to large, computationally demanding, 3D nonlinear
magnetostatic problems. The code, CORAL, is based on a volume integral equation
formulation. Using an IBM SP parallel computer and iterative solution methods, we
successfully solved the dense linear systems inherent in such formulations. A key
component of our work was the use of the PETSc library, which provides parallel
portability and access to the latest linear algebra solution technology. |
|
|
Christian H. Bischof and Gregorio Quintana-Ortí,
"Computing Rank-Revealing QR Factorizations of Dense Matrices," ACM Trans.
on Mathematical Software 24, no. 2 (June 1998), pp. 226-253.
Also Preprint MCS-P559-0196,
March 1996. |
We develop algorithms and implementations for computing
rank-revealing QR (RRQR) factorizations of dense matrices. First, we develop an efficient
block algorithm for approximating an RRQR factorization, employing a windowed version of
the commonly used Golub pivoting strategy, aided by incremental condition estimation.
Second, we develop efficiently implementable variants of guaranteed reliable RRQR
algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen and by
Pan and Tang. We suggest algorithmic improvements with respect to condition estimation,
termination criteria, and Givens updating. By combining the block algorithm with one of
the triangular postprocessing steps, we arrive at an efficient and reliable algorithm for
computing an RRQR factorization of a dense matrix. Experimental results on IBM RS/6000 and
SGI R8000 platforms show that this approach performs up to three times faster than the
less reliable QR factorization with column pivoting as it is currently implemented in
LAPACK, and comes within 15% of the performance of the LAPACK block algorithm for
computing a QR factorization without any column exchanges. Thus, we expect this routine to
be useful in many circumstances where numerical rank deficiency cannot be ruled out but
currently has been ignored because of the computational cost of dealing with it. |
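The pivoting idea underlying these algorithms can be sketched in a few lines of NumPy: classical Golub column pivoting on an unblocked Householder QR. The paper's windowed, blocked, and condition-estimated variants go well beyond this sketch.

```python
import numpy as np

def qr_column_pivoting(A, tol=1e-10):
    """Unblocked QR with Golub column pivoting.  At step k the remaining
    column of largest norm is swapped into position k and eliminated with a
    Householder reflection; the decay of |R[k, k]| then reveals numerical
    rank.  Returns R (overwriting a copy of A), the permutation, and the
    rank estimate."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    piv = np.arange(n)
    for k in range(min(m, n)):
        norms = np.linalg.norm(A[k:, k:], axis=0)
        j = k + int(np.argmax(norms))
        A[:, [k, j]] = A[:, [j, k]]          # bring the largest column in
        piv[[k, j]] = piv[[j, k]]
        x = A[k:, k]
        if np.linalg.norm(x) <= tol:
            break                            # remaining block is negligible
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        v /= np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])   # Householder update
    d = np.abs(np.diag(A))
    rank = int(np.sum(d > tol * max(d[0], 1e-300)))
    return A, piv, rank
```

Orthogonal transforms and column exchanges preserve the singular values, which is what makes the diagonal of R a usable rank indicator.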
|
|
C. H. Bischof and G. Quintana-Ortí, "Codes for
Rank-Revealing QR Factorizations of Dense Matrices," ACM Trans. on Mathematical
Software 24, no. 2 (June 1998), pp. 254-257. Also Preprint MCS-P560-0196, March
1996. |
This article describes a suite of codes as well as associated
testing and timing drivers for computing rank-revealing QR (RRQR) factorizations of dense
matrices. The main contribution is an efficient block algorithm for approximating an RRQR
factorization, employing a windowed version of the commonly used Golub pivoting strategy
and improved versions of the RRQR algorithms for triangular matrices originally suggested
by Chandrasekaran and Ipsen and by Pan and Tang, respectively. We highlight usage and
features of these codes. |
|
|
Stephen J. Wright, "Applying New Optimization Algorithms
to Model Predictive Control," Chemical Process Control-V (to appear). Also
Preprint MCS-P561-0196,
January 1996. |
The connections between optimization and control theory have
been explored by many researchers, and optimization algorithms have been applied with
success to optimal control. The rapid pace of developments in model predictive control has
given rise to a host of new problems to which optimization has yet to be applied.
Concurrently, developments in optimization, and especially in interior-point methods, have
produced a new set of algorithms that may be especially helpful in this context. In this
paper, we reexamine the relatively simple problem of control of linear processes subject
to quadratic objectives and general linear constraints. We show how new algorithms for
quadratic programming can be applied efficiently to this problem. The approach extends to
several more general problems in straightforward ways. |
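The structure can be sketched on a toy instance: a scalar plant, quadratic cost, and the dynamics kept as equality constraints, so the optimal trajectory comes out of one structured KKT solve. This illustrates only the generic QP structure; the interior-point algorithms and inequality handling discussed in the paper are not shown, and all data below is assumed for illustration.

```python
import numpy as np

# Toy MPC instance: scalar plant x+ = a x + b u, horizon N, cost
# 0.5 * sum(x_k^2 + u_k^2).  Variables z = (u_0..u_{N-1}, x_1..x_N).
a, b, x0, N = 0.9, 0.5, 2.0, 3
n = 2 * N
H = np.eye(n)                        # Hessian of 0.5 z'Hz with Q = R = I
g = np.zeros(n)
E = np.zeros((N, n))                 # dynamics as equality constraints E z = e
e = np.zeros(N)
for k in range(N):
    E[k, N + k] = 1.0                # x_{k+1}
    E[k, k] = -b                     # - b u_k
    if k == 0:
        e[k] = a * x0                # x_1 = a x_0 + b u_0
    else:
        E[k, N + k - 1] = -a         # - a x_k
# KKT conditions: H z + E' lam = -g,  E z = e
K = np.block([[H, E.T], [E, np.zeros((N, N))]])
sol = np.linalg.solve(K, np.concatenate([-g, e]))
z = sol[:n]
u, x = z[:N], z[N:]
```

Interior-point methods for the inequality-constrained version solve a sequence of perturbed systems with exactly this block structure, which is what makes them attractive for MPC.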
|
|
Mark T. Jones and Paul E. Plassmann, "Adaptive
Refinement of Unstructured Finite-Element Meshes," Finite Elements in Analysis
and Design 25 (1997), 41-60. Also Preprint MCS-P562-0296,
February 1996. |
The finite element method used in conjunction with adaptive
mesh refinement algorithms can be an efficient tool in many scientific and engineering
applications. In this paper we review algorithms for the adaptive refinement of
unstructured simplicial meshes (triangulations and tetrahedralizations). We discuss bounds
on the quality of the meshes resulting from these refinement algorithms. Unrefinement and
refinement along curved surfaces are also discussed. Finally, we give an overview of
recent developments in parallel refinement algorithms. |
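One standard refinement rule from this literature, longest-edge bisection, is easy to sketch for a single triangle. A real refinement algorithm must also propagate bisections to neighbors to keep the mesh conforming, and the quality bounds discussed in the paper concern exactly such repeated rules.

```python
def longest_edge_bisect(tri):
    """One step of longest-edge bisection for a single triangle: split it
    across the midpoint of its longest edge, producing two children whose
    areas each equal half the parent's.  tri is a list of three (x, y)
    vertices; conformity bookkeeping for a full mesh is omitted."""
    pairs = [(0, 1), (1, 2), (2, 0)]

    def d2(i, j):
        (x1, y1), (x2, y2) = tri[i], tri[j]
        return (x1 - x2) ** 2 + (y1 - y2) ** 2

    i, j = max(pairs, key=lambda p: d2(*p))    # longest edge
    k = 3 - i - j                              # vertex opposite that edge
    mid = tuple((tri[i][m] + tri[j][m]) / 2 for m in range(2))
    return [tri[i], mid, tri[k]], [mid, tri[j], tri[k]]
```

Bisecting the longest edge, rather than an arbitrary one, is what keeps the angles of repeatedly refined triangles bounded away from zero.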
|
|
Gregorio Quintana-Ortí and Enrique S. Quintana-Ortí,
"Guaranteeing Termination of Chandrasekaran and Ipsen's Algorithm for Computing
Rank-Revealing QR Factorizations," Preprint MCS-P564-0196, May
1996. |
Many problems from science and engineering require the
computation of a rank-revealing QR factorization of a matrix. We have developed a new
algorithm based on Chandrasekaran and Ipsen's algorithm, with two advantages over it:
First, our algorithm is much faster, since we modify the main loop to accelerate its
convergence, avoid useless steps, and use faster subalgorithms. Second, we apply a
technique, suggested by Pan and Tang, that ensures termination, achieves the desired
bounds, and fits into our theoretical framework. |
|
|
J. Fleckinger-Pellé, H. G. Kaper, and P. Takáč,
"Dynamics of the Ginzburg-Landau Equations of Superconductivity," Nonlinear
Analysis, Theory, Methods & Applications 32, no. 5 (1998), pp.
647-665. Also Preprint MCS-P565-0296. |
This article is concerned with the dynamical properties of
solutions of the time-dependent Ginzburg-Landau (TDGL) equations of superconductivity. It
is shown that the TDGL equations define a dynamical process when the applied magnetic
field varies with time, and a dynamical system when the applied magnetic field is
stationary. The dynamical system describes the large-time asymptotic behavior: Every
solution of the TDGL equations is attracted to a set of stationary solutions, which are
divergence free. These results are obtained in the "\phi = - \omega (\nabla \cdot
A)" gauge, which reduces to the standard "\phi = -\nabla \cdot A" gauge if
\omega = 1 and to the zero-electric potential gauge if \omega = 0; the treatment captures
both in a unified framework. This gauge forces the London gauge, \nabla \cdot A = 0, for
any stationary solution of the TDGL equations. |
|
|
D. Diachin, L. Freitag, D. Heath, J. Herzog, W. Michels, and
P. Plassmann, "Remote Engineering Tools for the Design of Pollution Control Systems
for Commercial Boilers," Preprint MCS-P566-0296, March
1996. |
We discuss a pilot project involving a collaboration between
Nalco Fuel Tech (NFT), a small company that has developed state-of-the-art emission
reduction systems for commercial boilers, and the computational science group at Argonne
National Laboratory (ANL). The key objective of this project is the development of a
real-time, interactive capability that allows the user to drive the computational model
from within the virtual environment. In this case, the required interaction involves the
placement of chemical injection systems in the boiler and a quick evaluation of their
effectiveness in reducing undesirable emissions from the combustion process. |
|
|
W. Gropp, E. Lusk, N. Doss, and A. Skjellum, ``A
High-Performance, Portable Implementation of the MPI Message Passing Interface Standard,''
Preprint MCS-P567-0296,
July 1996. |
MPI (Message Passing Interface) is a specification for a
standard library for message passing that was defined by the MPI Forum, a broadly based
group of parallel computer vendors, library writers, and applications specialists.
Multiple implementations of MPI have been developed. In this paper, we describe MPICH,
unique among existing implementations in its design goal of combining portability with
high performance. We document its portability and performance and describe the
architecture by which these features are simultaneously achieved. We also discuss the set
of tools that accompany the free distribution of MPICH, which constitute the beginnings of
a portable parallel programming environment. A project of this scope inevitably imparts
lessons about parallel computing, the specification being followed, the current hardware
and software environment for parallel computing, and project management; we describe those
we have learned. Finally, we discuss future developments for MPICH, including those
necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI
Forum. |
|
|
A. Geist, W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk,
W. Saphir, T. Skjellum, and M. Snir, ``MPI-2: Extending the Message-Passing Interface,''
Preprint MCS-P568-0296,
July 1996. |
This paper describes current activities of the MPI-2 Forum.
The MPI-2 Forum is a group of parallel computer vendors, library writers, and application
specialists working together to define a set of extensions to MPI (Message Passing
Interface). MPI was defined by the same process and now has many implementations, both
vendor-proprietary and publicly available, for a wide variety of parallel computing
environments. In this paper we present the salient aspects of the evolving MPI-2 document
as it now stands. We discuss proposed extensions and enhancements to MPI in the areas of
dynamic process management, one-sided operations, collective operations, new language
bindings, real-time computing, external interfaces, and miscellaneous topics. |
|
|
R. Thakur, W. Gropp, and E. Lusk, "An Experimental
Evaluation of the Parallel I/O Systems of the IBM SP and Intel Paragon Using a Production
Application," to appear in Proc. 3rd Int'l Conf. of the Austrian Center for
Parallel Computation with special emphasis on Parallel Databases and Parallel I/O,
Sept. 1996. Also Preprint ANL/MCS-P569-0296,
May 1996. |
We present the results of an experimental evaluation of the
parallel I/O systems of the IBM SP and Intel Paragon using a real three-dimensional
parallel application code. This application, developed by scientists at the University of
Chicago, simulates the gravitational collapse of self-gravitating gaseous clouds. It
performs parallel I/O by using library routines that we developed and optimized separately
for the SP and Paragon. The I/O routines perform two-phase I/O and use the parallel file
systems PIOFS on the SP and PFS on the Paragon. We studied the I/O performance for two
different sizes of the application. In the small case, we found that I/O was much faster
on the SP. In the large case, open, close, and read operations were only slightly faster,
and seeks were significantly faster, on the SP, whereas writes were slightly faster
on the Paragon. The communication required within our I/O routines was faster on the
Paragon in both cases. The highest read bandwidth obtained was 48 Mbytes/sec., and the
highest write bandwidth obtained was 31.6 Mbytes/sec., both on the SP. |
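The two-phase idea is simple enough to simulate without MPI: phase one redistributes data among processes so that each owns one contiguous block of the file, and phase two issues a single large write per process instead of many small strided writes. The names and cyclic layout below are illustrative, not those of the PIOFS/PFS routines.

```python
def two_phase_write(distributed, nprocs):
    """Toy simulation of two-phase I/O.  Each process holds a cyclically
    distributed slice of a global array; phase 1 exchanges data so that
    each process owns one contiguous block, and phase 2 performs one
    large contiguous write per process."""
    total = sum(len(v) for v in distributed.values())
    block = total // nprocs
    # phase 1: communication -- assemble each contiguous block from its owners
    blocks = {}
    for p in range(nprocs):
        blk = []
        for g in range(p * block, (p + 1) * block):
            owner, idx = g % nprocs, g // nprocs   # invert the cyclic layout
            blk.append(distributed[owner][idx])
        blocks[p] = blk
    # phase 2: one contiguous write per process (a list stands in for the file)
    f = [None] * total
    for p in range(nprocs):
        f[p * block : (p + 1) * block] = blocks[p]
    return f
```

The performance point of the paper is that the phase-1 exchange (fast interconnect) is cheap compared with the many small seeks and writes it eliminates in phase 2.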
|
|
C. H. Bischof and M. R. Haghighat, "Hierarchical
Approaches to Automatic Differentiation," Preprint ANL/MCS-P571-0396,
June 1996. |
A mathematical function, specified by a computer program, can
be differentiated efficiently through the exploitation of its program structure. The
important properties of a program for an efficient derivative code are the asymmetries
between the number of inputs and outputs of program components at various levels of
abstraction and the mathematical complexity of the involved operators. Automatic
generation of efficient derivative codes thus requires analysis of programs for detection
of such properties and systematic methods for their exploitation in composing the
derivative codes. We suggest a hierarchical approach based on a partitioning of the
computational or program graph as a means to deduce workable solutions to this hard
problem. Each partition corresponds to a localized scope for derivative computation, and
hierarchical partitions provide a mechanism for exploiting program structure at various
levels. As a particular example, we discuss dynamic programming approaches for finding
good one-dimensional partitions and generalizations to arbitrary directed acyclic graphs
that, by recycling substructure information, allow one to determine the optimal
elimination ordering for a graph with n nodes with complexity O(2**n),
as compared with the O(n!) complexity of a naive search. Lastly, we give a
concrete example illustrating the hierarchical approach on the driven cavity problem from
the MINPACK-2 optimization test set collection. |
C. H. Bischof and P.-T. Wu, ``Exploiting Intermediate
Sparsity in Computing Derivatives for a Leapfrog Scheme,'' Preprint ANL/MCS-P572-0396,
February 1997. |
The leapfrog scheme is a commonly used second-order method
for solving differential equations. Letting Z denote the state of the system, we compute
the state at the next time step as Z(t+1) = H(Z(t),Z(t-1),W), where t denotes a particular
time step, H is the nonlinear timestepping operator, and W are parameters that are not
time dependent. In this article, we show how the associativity of the chain rule of
differential calculus can be used to expose and exploit intermediate derivative sparsity
arising from the typical localized nature of the operator H. We construct a computational
harness that capitalizes on this structure while employing automatic differentiation tools
to automatically generate the derivative code corresponding to the evaluation of one time
step. Experimental results with a 2-D shallow water equation model on IBM RS/6000 and Sun
SPARCstations illustrate these issues. |
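The recurrence can be made concrete on a toy model (a periodic diffusion stencil, not the paper's shallow-water code; ADIFOR would generate the derivative of a real H automatically). Because the operator is local, its Jacobian is banded, so each chain-rule product costs O(n) rather than O(n^2).

```python
import numpy as np

def F(z, w):
    # toy local timestepping ingredient: F_i = w * (z_{i+1} - 2 z_i + z_{i-1}),
    # periodic; locality makes the Jacobian dF/dz (circulant) tridiagonal
    return w * (np.roll(z, -1) - 2.0 * z + np.roll(z, 1))

def leapfrog_with_sensitivity(z0, w, dt, steps):
    """March z_{t+1} = z_{t-1} + 2 dt F(z_t, w) together with s_t = dz_t/dw.
    The chain rule gives s_{t+1} = s_{t-1} + 2 dt (J_F(z_t) s_t + dF/dw(z_t));
    F is linear in z here, so applying the sparse J_F is just F(s, w)."""
    z_prev = z0.astype(float)
    s_prev = np.zeros_like(z_prev)
    z_cur = z_prev + dt * F(z_prev, w)          # bootstrap (forward Euler)
    s_cur = dt * F(z_prev, 1.0)                 # its w-derivative (s_prev = 0)
    for _ in range(steps):
        z_next = z_prev + 2.0 * dt * F(z_cur, w)
        s_next = s_prev + 2.0 * dt * (F(s_cur, w) + F(z_cur, 1.0))
        z_prev, z_cur = z_cur, z_next
        s_prev, s_cur = s_cur, s_next
    return z_cur, s_cur
```

The returned sensitivity agrees with a finite-difference check; in the paper this structure is exploited with AD-generated derivative code for a single time step rather than hand-written Jacobians.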
|
|
D. Diachin, L. Freitag, D. Heath, J. Herzog, P. Plassmann,
and W. Michels, ``Interactive Simulation and Analysis of Emission Reduction Systems in
Commercial Boilers,'' Preprint ANL/MCS-P573-0396. |
In this paper we describe an interactive virtual environment
developed to model an emission reduction system for commercial boilers. The interactive
environment is used to optimize the performance of the reduction system through the
spatial adjustment and spray reconfiguration of reagent injectors. We describe the three
principal components of the system: a computational model for the particle dynamics, a
three-dimensional display device and graphics environment, and the communication layer
that allows the interaction of the user in the visualization environment with the
computational model. Timing results for each component are given for three hardware
configurations that demonstrate the real-time performance of this tool. |
|
|
T. Gaasterland and B. Dorr, "Selecting Tense, Aspect,
and Connecting Words in Language Generation," Proc. International Joint Conf. on
Artificial Intelligence, Aug. 29-31, 1995, Montreal. Also Preprint ANL/MCS-P574-0296,
April 1996. |
Generating language that reflects the temporal organization
of represented knowledge requires a language generation model that integrates contemporary
theories of tense and aspect, temporal representations, and methods to plan text. This
paper presents a model that produces complex sentences that reflect temporal relations
present in underlying temporal concepts. The main result of this work is the successful
application of constrained linguistic theories of tense and aspect to a generator which
produces meaningful event combinations and selects appropriate connecting words that
relate them. |
|
|
T. Gaasterland and E. Selkov, "Reconstruction of
Metabolic Networks Using Incomplete Information," 3rd Int'l Conf. on Intelligent
Systems for Molecular Biology, Cambridge, England, July 17-19, 1995. Also Preprint ANL/MCS-P575-0296. |
This paper describes an approach for automatically
reconstructing a weighted partial metabolic network for a prokaryotic organism. The data
sources are automated sequence analysis, multiple databases accessed through an
object+attribute view of the data, and metabolic pathways, reaction equations, and
compounds parsed into a logical representation from the Enzyme and Metabolic Pathway
Database. Additional information can be provided interactively by the expert user to
guide the reconstruction. |
|
|
T. Gaasterland and C. Sensen, "Using Multiple Tools for
Automated Genome Interpretation in an Integrated Environment," Preprint ANL/MCS-P576-0296. |
The year 1995 heralded the sequencing of the first whole
microbial genomes. The end of 1996 will see at least four more. The entire yeast genome
will also be submitted to the public databases. At this rate, we can expect to have access
to the complete genomes of at least 25-30 organisms within five years. While most problems
of automated sequence generation and assembly have been solved, the automated analysis of
genomic sequences has scarcely been addressed. Consequently, most genome sequences
submitted to the publicly accessible databases are either poorly or not at all annotated.
Sequence interpretation "by hand" presents an overwhelming challenge to
researchers. To address this situation we are developing a prototype for a UNIX-based
integrated environment for automated genome sequence interpretation. |
|
|
T. A. DeFanti, I. Foster, M. E. Papka, R. Stevens, T.
Kuhfuss, "Overview of the I-WAY: Wide Area Visual Supercomputing," International
Journal of Supercomputing Applications 10, no. 2-3 (1996), pp. 123-131. Also Preprint ANL/MCS-P579-0396. |
This paper discusses the I-WAY project and provides an
overview of the papers in this issue of IJSA. The I-WAY is an experimental environment for
building distributed virtual reality applications and for exploring issues of distributed
wide area resource management and scheduling. The goal of the I-WAY project is to enable
researchers to use multiple inter-networked supercomputers and advanced visualization
systems to conduct very large-scale computations. By connecting a dozen ATM testbeds,
seventeen supercomputer centers, five virtual reality research sites, and over sixty
applications groups, the I-WAY project has created an extremely diverse wide area
environment for exploring advanced applications. This environment has provided a glimpse
of the future for advanced scientific and engineering computing. |
|
|
I. Foster, C. Kesselman, and S. Tuecke, "The Nexus
Task-parallel Runtime System," Proc. 1st Intl. Workshop on Parallel Processing,
Tata McGraw Hill, 1994. Also Preprint ANL/MCS-P581-0396. |
A runtime system provides a parallel language compiler with
an interface to the low-level facilities required to support interaction between
concurrently executing program components. Nexus is a portable runtime system for
task-parallel programming languages. Distinguishing features of Nexus include its support
for multiple threads of control, dynamic processor acquisition, dynamic address space
creation, a global memory model via interprocessor references, and asynchronous events. In
addition, it supports heterogeneity at multiple levels, allowing a single computation to
utilize different programming languages, executables, processors, and network protocols.
Nexus is currently being used as a compiler target for two task-parallel languages:
Fortran M and Compositional C++. In this paper, we present the Nexus design, outline
techniques used to implement Nexus on parallel computers, show how it is used in
compilers, and compare its performance with that of another runtime system. |
|
|
S. K. Park, K. K. Droegemeier, and C. H. Bischof,
"Automatic Differentiation as a Tool for Sensitivity Analysis of a Convective Storm
in a 3-D Cloud Model," Proc. 2nd International Symposium on Computational
Differentiation, Feb. 12-15, 1996, Santa Fe. Also Preprint ANL/MCS-P582-0496. |
The ADIFOR automatic differentiation tool is applied to a 3-D
storm-scale meteorological model to generate a sensitivity-enhanced code capable of
providing derivatives of all model output variables and related diagnostic (derived)
parameters as a function of specified control parameters. The tangent linear
approximation, applied here for the first time to a deep convective storm using a
full-physics compressible model, is valid up to 50 min for a 1% water vapor perturbation.
This result is very encouraging considering the highly nonlinear and discontinuous
properties of the solutions. The ADIFOR-generated code has provided valuable sensitivity
information on storm dynamics. In particular, it is very efficient and useful for
investigating how a perturbation inserted at an earlier time propagates through the model
variables at later times. However, it is computationally too expensive to apply to
variational data assimilation, especially for 3-D meteorological models, which have a
large number of input variables. |
|
|
I. Foster, J. Geisler, B. Nickless, W. Smith, S. Tuecke,
"Software Infrastructure for the I-WAY High-Performance Distributed Computing
Experiment," Proc. 5th IEEE Symp. on High Performance Distributed Computing,
pp. 562-571, IEEE Computer Society Press, 1996. Also Preprint ANL/MCS-P583-0396. |
High-speed wide area networks are expected to enable
innovative applications that integrate geographically distributed, high-performance
computing, database, graphics, and networking resources. However, there is as yet little
understanding of the higher-level services required to support these applications, or of
the techniques required to implement these services in a scalable, secure manner. We
report on a large-scale prototyping effort that has yielded some insights into these
issues. Building on the hardware base provided by the I-WAY, a national-scale Asynchronous
Transfer Mode (ATM) network, we developed an integrated management and application
programming system, called I-Soft. This system was deployed at most of the 17 I-WAY sites
and used by many of the 60 applications demonstrated on the I-WAY network. In this
article, we describe the I-Soft design and report on lessons learned from application
experiments. |
|
|
I. Foster, C. Kesselman, and S. Tuecke, "The Nexus
Approach to Integrating Multithreading and Communication," J. of Parallel and
Distributed Computing 37 (1996), pp. 70-82. Also Preprint ANL/MCS-P584-0396. |
Lightweight threads have an important role to play in
parallel systems: they can be used to exploit shared-memory parallelism, to mask
communication and I/O latencies, to implement remote memory access, and to support
task-parallel and irregular applications. In this paper, we address the question of how to
integrate threads and communication in high-performance distributed-memory systems. We
propose an approach based on global pointer and remote service request mechanisms, and
explain how these mechanisms support dynamic communication structures, asynchronous
messaging, dynamic thread creation and destruction, and a global memory model via
interprocessor references. We also explain how these mechanisms can be implemented in
various environments. Our global pointer and remote service request mechanisms have been
incorporated in a runtime system called Nexus that is used as a compiler target for
parallel languages and as a substrate for higher-level communication libraries. We report
the results of performance studies conducted using a Nexus implementation; these results
indicate that Nexus mechanisms can be implemented efficiently on commodity hardware and
software systems. |
|
|
A. K. Sinha, C. H. Bischof, and D. Shiriaev,
"Application of Automatic Differentiation to Reservoir Design Models," Journal
of Water Resources Planning and Management, ASCE, 124(3)
(1998), pp. 162-167. Also Preprint ANL/MCS-P585-0496,
August 1996, revised August 1997. |
Automatic differentiation is a technique for computing
derivatives accurately and efficiently with minimal human effort. Calculation of
derivatives of numerical models is necessary for the optimization of reservoir systems to
determine optimal sizes for reservoirs. For the computational experiments reported in the
present investigation, the ADIFOR (Automatic Differentiation in FORtran) precompiler is
employed for computing derivatives. Given the Fortran 77 source code for the function,
ADIFOR generates code that computes derivatives of the model output with respect to the
input parameters. Strategies to postoptimize the ADIFOR-generated derivative code and
their impact on the computational requirements for evaluating derivatives and model
solution are also discussed. We report on the use of automatic differentiation and divided
difference approaches for computing derivatives for a single- and multiple-reservoir yield
model. The results show that, for both the single- and multiple-reservoir models, automatic
differentiation computes derivatives exactly and more efficiently than the divided
difference implementation. Postoptimization of the generated derivative code results in
computation of derivatives in orders of magnitude less CPU time than with the
ADIFOR-generated derivative code. It is also observed that the availability of exact
derivatives significantly benefits the convergence of the optimization algorithm. Thus, the
solution of the multireservoir problem, which took 10.5 hours with divided difference
derivatives, is decreased to less than two hours with ``ADIFOR out of the box''
derivatives, and to less than an hour using the postoptimized ADIFOR derivative code.
|
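The accuracy contrast reported above can be seen in a toy setting. The sketch below is a minimal forward-mode AD implementation in Python (dual numbers rather than ADIFOR's source transformation), applied to an invented scalar response function standing in for a yield model: the AD derivative is exact to machine precision, while the one-sided divided difference carries truncation error.

```python
import math

class Dual:
    """Minimal forward-mode AD value: carries f(x) and f'(x) together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def dsqrt(x):
    return Dual(math.sqrt(x.val), x.der / (2.0 * math.sqrt(x.val)))

def response(x):
    # Hypothetical model response, nonlinear in the design variable x
    # (a stand-in for a reservoir yield model, not the paper's model).
    return dsqrt(x * x * x + 2.0 * x)

x0 = 3.0
ad = response(Dual(x0, 1.0)).der                  # derivative via forward-mode AD
h = 1e-6
dd = (response(Dual(x0 + h)).val - response(Dual(x0)).val) / h  # divided difference
print(ad, dd)   # AD is exact to machine precision; dd carries truncation error
```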
|
|
C. H. Bischof, B. Lang, and X. Sun, "A Framework for
Symmetric Band Reduction," ACM Trans. on Mathematical Software 26,
No. 4 (Dec. 2000), pp. 581-601. Also Preprint ANL/MCS-P586-0496,
October 1996. |
We develop an algorithmic framework for reducing the
bandwidth of symmetric matrices. This framework includes the reduction of full matrices to
banded or tridiagonal form and the reduction of banded matrices to narrower banded or
tridiagonal form, possibly in multiple steps. Our framework leads to algorithms that
require fewer floating point operations than do standard algorithms. In addition, it
allows for space-time tradeoffs and enables or increases the use of blocked
transformations. |
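The basic reduction the framework generalizes can be sketched in a few lines. Below is the unblocked textbook Householder tridiagonalization (the paper's contribution is the blocked, multi-step, banded generalization of this, which the sketch does not attempt): each orthogonal similarity transform zeroes a column below the subdiagonal while preserving the spectrum.

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form by Householder similarity
    transforms (unblocked textbook algorithm; the SBR framework covers blocked
    and multi-step banded variants of this reduction)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k+1:, k].copy()
        alpha = -np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        v = x
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv == 0.0:
            continue
        v /= nv
        H = np.eye(n)
        H[k+1:, k+1:] -= 2.0 * np.outer(v, v)
        A = H @ A @ H        # orthogonal similarity: eigenvalues are unchanged
    return A

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T                  # symmetric test matrix
T = tridiagonalize(A)
print(np.allclose(np.triu(T, 2), 0.0, atol=1e-10))               # tridiagonal shape
print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(T))) # same spectrum
```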
|
|
C. H. Bischof, B. Lang, and X. Sun, "The SBR Toolbox --
Software for Successive Band Reduction," ACM TOMS 26(4)
(2000), 602-616. Also Preprint ANL/MCS-P587-0496,
October 1996. |
We present a software toolbox for symmetric band reduction,
together with a set of testing and timing drivers. The toolbox contains routines for the
reduction of full symmetric matrices to banded form and the reduction of banded matrices
to narrower banded or tridiagonal form, with optional accumulation of the orthogonal
transformations, as well as repacking routines for storage rearrangements. The
functionality and the calling sequences of the routines are described, with a detailed
discussion of the ``control'' parameters that allow adaptation of the codes to particular
machine and matrix characteristics. We also briefly describe the testing and timing
drivers included in the toolbox. |
|
|
H. G. Kaper and P. Takac, "An Equivalence Relation for
the Ginzburg-Landau Equations of Superconductivity," Preprint ANL/MCS-P588-0496,
June 1996; revised 10/96. |
Gauge invariance is used to establish an equivalence relation
between solutions of the stationary and time-dependent Ginzburg-Landau equations
describing the same physical state of a superconductor. The equivalence relation shows how
equilibrium configurations are obtained as large-time asymptotic limits of solutions of
the time-dependent Ginzburg-Landau equations. |
|
|
C. Bischof and A. Carle, "Users' Experience with ADIFOR
2.0," Preprint ANL/MCS-P589-0496,
June 1996. |
In July 1995, the ADIFOR 2.0 system for automatic
differentiation of Fortran was made available to the academic and commercial communities
via the World Wide Web. By January 1996, we had received and processed over one hundred
requests for ADIFOR 2.0. In this paper, we describe some of the experiences of users of
the system that should be interesting to developers of automatic differentiation tools. |
|
|
G. W. Crabtree, G. K. Leaf, H. G. Kaper, D. W. Braun, V. M.
Vinokur, and A. E. Koshelev, "Dynamic Vortex Phases in Superconductors with
Correlated Disorder," Preprint ANL/MCS-P590-0496,
April 1996. |
The nature of driven motion of a vortex solid in the presence
of a planar pinning defect is investigated by large-scale simulations based on the
time-dependent Ginzburg-Landau equations. Three dynamic phases are identified and
characterized by their relative positional and velocity correlations. |
|
|
R. B. Lehoucq, "Restarting an Arnoldi Reduction,"
Preprint ANL/MCS-P591-0496,
May 1996. |
The Arnoldi reduction is an efficient procedure for
approximating a subset of the eigensystem of a large sparse matrix A of order n.
At each step, a partial orthogonal reduction of A into an upper Hessenberg matrix
is produced. The eigenvalues of this Hessenberg matrix are used to approximate a subset of
the eigenvalues of the large matrix A. The approximation to the eigenvalues of A
generally improves as the order of the Hessenberg matrix increases. Unfortunately, so do
the cost and storage of the reduction. A popular alternative is to define an iteration by
restarting the reduction with information from a length m Arnoldi reduction. The
hope is that this restarted reduction has improved estimates of the eigenvalues of A. This
paper considers the various approaches used to restart a reduction. Analysis and numerical
examples are presented that explain and exhibit the generally superior properties of
Sorensen's implicitly restarted Arnoldi iteration. The analysis exploits the fact that an
IRA iteration is mathematically equivalent to a curtailed QR iteration. |
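For context, the underlying Arnoldi reduction is short to write down. The sketch below (plain modified Gram-Schmidt Arnoldi, with no restarting; not Sorensen's IRA) builds the relation A V_m = V_{m+1} H and shows the extremal Ritz value converging first.

```python
import numpy as np

def arnoldi(A, v0, m):
    """Build a length-m Arnoldi reduction  A V_m = V_{m+1} H  with
    orthonormal columns V and upper Hessenberg H (modified Gram-Schmidt)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
A = np.diag(np.arange(1.0, 101.0)) + 1e-3 * rng.standard_normal((100, 100))
V, H = arnoldi(A, np.ones(100), 30)
# Ritz values (eigenvalues of the leading m-by-m Hessenberg block) approximate
# a subset of the eigenvalues of A; the extremal ones converge first.
ritz = np.sort(np.linalg.eigvals(H[:30, :30]).real)
print(ritz[-1])   # approximates the largest eigenvalue (100 for the unperturbed diagonal)
```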
|
|
Rajeev Thakur, William Gropp, and Ewing Lusk, ``An
Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces,'' Proc. 6th
Symp. on the Frontiers of Massively Parallel Computation, October 1996, pp. 180-187, IEEE.
Also Preprint ANL/MCS-P592-0596,
August 1996. |
In this paper, we propose a strategy for implementing
parallel-I/O interfaces portably and efficiently. We have defined an abstract-device
interface for parallel I/O, called ADIO. Any parallel-I/O API can be implemented on
multiple file systems by implementing the API portably on top of ADIO, and implementing
only ADIO on different file systems. This approach simplifies the task of implementing an
API and yet exploits the specific high-performance features of individual file systems. We
have used ADIO to implement the Intel PFS interface and subsets of MPI-IO and IBM PIOFS
interfaces on PFS, PIOFS, Unix, and NFS file systems. Our performance studies indicate
that the overhead of using ADIO as an implementation strategy is very low. |
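The layering idea can be illustrated with a toy abstract-device class (names invented here, not the real ADIO API): the portable API is written once against the abstract operations, and each file system supplies only the small device implementation.

```python
import os
import tempfile
from abc import ABC, abstractmethod

class AbstractDevice(ABC):
    """Hypothetical sketch in the spirit of ADIO: a small set of file
    operations that each file system implements once, so any parallel-I/O
    API written against this layer runs on every back end."""
    @abstractmethod
    def open(self, path): ...
    @abstractmethod
    def write_at(self, offset, data): ...
    @abstractmethod
    def read_at(self, offset, nbytes): ...

class UnixDevice(AbstractDevice):
    """One back end: plain POSIX files via pread/pwrite."""
    def open(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    def write_at(self, offset, data):
        return os.pwrite(self.fd, data, offset)
    def read_at(self, offset, nbytes):
        return os.pread(self.fd, nbytes, offset)

def checkpoint(dev, path, records):
    """A toy 'portable API' written once against the abstract device."""
    dev.open(path)
    for i, rec in enumerate(records):
        dev.write_at(i * 8, rec)

path = os.path.join(tempfile.gettempdir(), "adio_demo.bin")
dev = UnixDevice()
checkpoint(dev, path, [b"record00", b"record01"])
print(dev.read_at(8, 8))   # → b'record01'
```

Porting `checkpoint` to another file system means writing another small device class; the API code itself never changes, which is the low-overhead layering the performance studies measure.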
|
|
I. Foster, D. R. Kohr, Jr., R. Krishnaiyer, and A. Choudhary,
"Double Standards: Bringing Task Parallelism to HPF via the Message Passing
Interface," Preprint ANL/MCS-P593-0596,
October 1996. |
High Performance Fortran (HPF) does not allow efficient
expression of mixed task/data-parallel computations or the coupling of separately compiled
data-parallel modules. In this paper, we show how a coordination library implementing the
Message Passing Interface (MPI) can be used to represent these common parallel program
structures. This library allows data-parallel tasks to exchange distributed data
structures using calls to simple communication functions. We present microbenchmark
results that characterize the performance of this library and that quantify the impact of
optimizations that allow reuse of communication schedules in common situations. In
addition, results from two-dimensional FFT, convolution, and multiblock programs
demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF.
We conclude that this synergistic combination of two parallel programming standards
represents a useful approach to task parallelism in a data-parallel framework, increasing
the range of problems addressable in HPF without requiring complex compiler technology. |
|
|
D. Abramson, I. Foster, J. Giddy, A. Lewis, R. Sosic, R.
Sutherst, and N. White, "The Nimrod Computational Workbench: A Case Study in Desktop
Metacomputing," Preprint ANL/MCS-P594-0596,
October 1996. |
The coordinated use of geographically distributed computers,
or metacomputing, can in principle provide more accessible and cost-effective
supercomputing than conventional high-performance systems. However, we lack evidence that
metacomputing systems can be made easily usable, or that there exist large numbers of
applications able to exploit metacomputing resources. In this paper, we present work that
addresses both these concerns. The basis for this work is a system called Nimrod that
provides a desktop problem-solving environment for parametric experiments. We describe how
Nimrod has been extended to support the scheduling of computational resources located in a
wide-area environment, and report on an experiment in which Nimrod was used to schedule a
large parametric study across the Australian Internet. The experiment provided both new
scientific results and insights into Nimrod capabilities. We relate the results of this
experiment to lessons learned from the I-WAY distributed computing experiment, and draw
conclusions as to how Nimrod and I-WAY--like computing environments should be developed to
support desktop metacomputing. |
|
|
I. Foster, J. Geisler, and S. Tuecke, "MPI on the I-WAY:
A Wide-Area, Multimethod Implementation of the Message Passing Interface," Preprint ANL/MCS-P595-0596,
October 1996. |
High-speed wide-area networks enable innovative applications
that integrate geographically distributed computing, database, graphics, and networking
resources. The Message Passing Interface (MPI) can be used as a portable, high-performance
programming model for such systems. However, the wide-area environment introduces
challenging problems for the MPI implementor, because of the heterogeneity of both the
underlying physical infrastructure and the authentication and software environment at
different sites. In this article, we describe an MPI implementation that incorporates
solutions to these problems. This implementation, which was developed for the I-WAY
distributed-computing experiment, was constructed by layering MPICH on the Nexus
multithreaded runtime system. Nexus provides automatic configuration mechanisms that can
be used to select and configure authentication, process creation, and communication
mechanisms in heterogeneous systems. |
|
|
I. Foster, C. Kesselman, and M. Snir, "Generalized
Communicators in the Message Passing Interface," Preprint ANL/MCS-P596-0596,
October 1996. |
We propose extensions to the Message Passing Interface (MPI)
that generalize the MPI communicator concept to allow multiple communication endpoints per
process, dynamic creation of endpoints, and the transfer of endpoints between processes.
The generalized communicator construct can be used to express a wide range of interesting
communication structures, including collective communication operations involving multiple
threads per process, communications between dynamically created threads, and
object-oriented applications in which communications are directed to specific objects.
Furthermore, this enriched functionality can be provided in a manner that preserves
backward compatibility with MPI. We describe the proposed extensions, illustrate their use
with examples, and discuss implementation issues. |
|
|
I. Foster, D. Kohr, R. Krishnaiyer, and A. Choudhary,
"MPI as a Coordination Layer for Communicating HPF Tasks," Preprint ANL/MCS-P597-0596,
October 1996. |
Data-parallel languages such as High Performance Fortran
(HPF) present a simple execution model in which a single thread of control performs
high-level operations on distributed arrays. These languages can greatly ease the
development of parallel programs. Yet there are large classes of applications for which a
mixture of task and data parallelism is most appropriate. Such applications can be
structured as collections of data-parallel tasks that communicate by using explicit
message passing. Because the Message Passing Interface (MPI) defines standardized, familiar
mechanisms for this communication model, we propose that HPF tasks communicate by making
calls to a coordination library that provides an HPF binding for MPI. The semantics of a
communication interface for sequential languages can be ambiguous when the interface is
invoked from a parallel language; we show how these ambiguities can be resolved by
describing one possible HPF binding for MPI. We then present the design of a library that
implements this binding, discuss issues that influenced our design decisions, and evaluate
the performance of a prototype HPF/MPI library using a communications microbenchmark and
application kernel. Finally, we discuss how MPI features might be incorporated into our
design framework. |
|
|
W. D. Reynolds, Jr. and R. V. Kenyon, "The Wavelet
Transform and the Suppression Theory of Binocular Vision for Stereo Image
Compression," Preprint ANL/MCS-P598-0596,
July 1996. |
In this paper we present a method for compression of stereo
images. The proposed scheme is a frequency domain approach based on the suppression theory
of binocular vision. By using the information in the frequency domain, complex disparity
estimation techniques can be avoided. The wavelet transform is used to obtain a
multiresolution analysis of the stereo pair by which the subbands convey the necessary
frequency domain information. |
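A one-level 2-D Haar transform is the simplest instance of the multiresolution analysis described above (a generic stand-in, not the authors' filter bank): it splits each image of the stereo pair into a low-pass approximation and three detail subbands, and reconstructs losslessly.

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar transform: low-pass approximation (LL)
    plus three detail subbands (LH, HL, HH)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    return ((a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0,
            (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0)

def ihaar2d(LL, LH, HL, HH):
    """Inverse of haar2d: perfect reconstruction from the four subbands."""
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img

rng = np.random.default_rng(2)
view = rng.random((8, 8))                    # one image of a stereo pair
LL, LH, HL, HH = haar2d(view)
print(np.allclose(ihaar2d(LL, LH, HL, HH), view))   # lossless round trip
```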
|
|
M. Huang, D. Levine, L. Turner, L. Kettunen, and M. Papka,
"Virtual Reality Visualization of 3-D Electromagnetic Fields," Preprint ANL/MCS-P599-0596,
August 1996. |
One of the major problems in three-dimensional (3-D)
electromagnetic field computation is visualizing the calculated field. Virtual reality
techniques can be used as an aid to this process by providing multiple viewpoints,
allowing immersion within the field, and taking advantage of the human ability to process
3-D spatial information. In this paper we present an example of 3-D electromagnetic field
visualization in the CAVE virtual-reality environment. |
|
|
S. Wright, "Modified Cholesky Factorizations in
Interior-Point Algorithms for Linear Programming," Preprint ANL/MCS-P600-0596,
August 1996. |
We investigate a modified Cholesky algorithm similar to those
used in current interior-point codes for linear programming. Cholesky-based interior-point
codes are popular for three reasons: their implementation requires only minimal changes to
standard sparse Cholesky codes (allowing us to take full advantage of software written by
specialists in that area); they tend to be more efficient than competing approaches that
use different factorizations; and they perform robustly on most practical problems,
yielding good interior-point steps even when the coefficient matrix is ill conditioned. We
explain the surprisingly good performance of the Cholesky-based approach by using
analytical tools from matrix perturbation theory and error analysis, illustrating our
results with computational experiments. Finally, we point out the limitations of this
approach. |
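A minimal dense sketch of one device used in Cholesky-based LP codes (production codes apply it inside sparse Cholesky, and the paper analyzes why the resulting steps remain useful): when a pivot becomes tiny or negative, substitute a very large value so the factorization runs to completion and the offending column is effectively zeroed.

```python
import math

def modified_cholesky(A, delta=1e-10):
    """Toy dense modified Cholesky: replace a tiny or negative pivot by a
    huge value instead of aborting, so the factorization always completes."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= delta:
            pivot = 1e32          # "skip" the pivot rather than abort
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L

L = modified_cholesky([[4.0, 2.0], [2.0, 5.0]])
# For a well-conditioned SPD matrix the modification never triggers:
print(L)                      # ordinary factor [[2, 0], [1, 2]]
# A singular matrix would break ordinary Cholesky; here it completes:
print(modified_cholesky([[1.0, 1.0], [1.0, 1.0]])[1][1] > 1e15)
```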
|
|
I. Foster, J. Geisler, C. Kesselman, and S. Tuecke,
"Multimethod Communications for High-Performance Metacomputing Applications,"
Preprint ANL/MCS-P601-0596,
October 1996. |
Metacomputing systems use high-speed networks to connect
supercomputers, mass storage systems, scientific instruments, and display devices with the
objective of enabling parallel applications to access geographically distributed computing
resources. However, experience shows that high performance often can be achieved only if
applications can integrate diverse communication substrates, transport mechanisms, and
protocols, chosen according to where communication is directed, what is communicated, or
when communication is performed. In this article, we describe a software architecture that
addresses this requirement. This architecture allows multiple communication methods to be
supported transparently in a single application, with either automatic or user-specified
selection criteria guiding the methods used for each communication. We describe an
implementation of this architecture, based on the Nexus communication library, and use
this implementation to evaluate performance issues. The implementation supported a wide
variety of applications in the I-WAY metacomputing experiment at Supercomputing 95; we use
one of these applications to provide a quantitative demonstration of the advantages of
multimethod communication in a heterogeneous networked environment. |
|
|
J. Czyzyk and T. J. Wisniewski, "The Diet Problem: A
WWW-based Interactive Case Study in Linear Programming," Preprint ANL/MCS-P602-0796,
July 1996. |
The Diet Problem is an interactive WWW-based case
study that introduces users (particularly students and practitioners) to the mathematics
of optimization. Users select foods for their menus, edit a set of nutritional
constraints, and solve the linear program simply by clicking some buttons and making
simple entries. A detailed analysis of the diet, complete with graphs and tables, is
returned to the user. The Diet Problem utilizes a simple, yet interesting, linear
program to introduce the concepts of model planning, data gathering, optimal solutions,
and dual variables. The power of the World Wide Web makes the case study accessible to a
wide variety of users in many different countries. |
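The mathematics behind the case study fits in a few lines. Below is a toy two-food diet LP with invented numbers (not the NEOS model or its data); since an optimal LP solution lies at a vertex of the feasible region, a two-variable instance can be solved by enumerating intersections of constraint boundaries.

```python
from itertools import combinations

# Toy diet problem (all numbers invented for illustration):
#   minimize  0.5*x + 0.8*y            cost of x, y servings of two foods
#   s.t.      2*x + 1*y >= 8           protein requirement
#             1*x + 3*y >= 9           vitamin requirement
#             x >= 0, y >= 0

cons = [(2.0, 1.0, 8.0), (1.0, 3.0, 9.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
cost = (0.5, 0.8)

def feasible(x, y):
    return all(a * x + b * y >= c - 1e-9 for a, b, c in cons)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                         # parallel boundaries: no vertex
    x = (c1 * b2 - c2 * b1) / det        # Cramer's rule for the intersection
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        val = cost[0] * x + cost[1] * y
        if best is None or val < best[0]:
            best = (val, x, y)

print(best)   # cheapest menu: 3 servings of food 1 and 2 of food 2, cost 3.1
```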
|
|
D. Levine, M. Facello, P. Hallstrom, G. Reeder, B. Walenz,
and F. Stevens, "STALK: An Interactive Virtual Molecular Docking System,"
Preprint ANL/MCS-P603-0796,
August 1996. |
Several recent technologies--genetic algorithms, parallel and
distributed computing, virtual reality, and high-speed networking--provide the foundation
for a new approach to the computational study of molecular interactions. Parallel genetic
algorithms are an efficient and effective means to explore the large search spaces typical
of these problems, while virtual reality provides an immersive environment for visualizing
the interactions. In this paper we discuss the STALK system, which uses high-speed
networking to couple a parallel genetic algorithm to a virtual reality environment. This
combination allows a local or remote user to interact in real-time with the simulation
through the virtual reality environment. Molecular docking experiments using an IBM SP
parallel computer and a CAVE virtual reality environment are discussed. |
|
|
N. T. Karonis, "An Alternative Method to Remove
Duplicate Tuples Resulting from Operations in the Relational Algebra in a Cube-Connected
Multicomputer System," Preprint ANL/MCS-P604-0796,
August 1996. |
The problem of performing database operations on parallel
architectures has received much attention, both as applied and theoretical areas of
research. Much of the attention has been focused on performing these operations on
distributed-memory architectures, for example, a hypercube. Algorithms that perform, in
particular, relational database operations on a hypercube typically exploit the
hypercube's unique interconnectivity to not only process the relational operators
efficiently but also perform dynamic load balancing. Certain relational operators (e.g.,
projection and union) can produce interim relations that contain duplicate tuples. As a
result, an algorithm for a relational database system must address the issue of removing
duplicate tuples from these interim relations. The algorithms accomplish this by
compacting the relation into hypercubes of smaller and smaller dimensions. We present an
alternative method for removing duplicate tuples from a relation that is distributed over
a hypercube by using the embedded ring found in every hypercube. Through theoretical
analysis of the algorithm and empirical observation, we demonstrate that using the ring to
remove the duplicate tuples is significantly more efficient than using the hypercube. |
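The ring idea can be simulated in a few lines (a toy rendition, not the paper's algorithm verbatim): each node first removes its local duplicates, the original fragments then circulate around the ring for p-1 steps, and a node drops a tuple when it sees the same tuple held by a lower-ranked node, so exactly one copy of each survives.

```python
def ring_dedup(fragments):
    """Simulated ring of p nodes, each holding a fragment of a relation that
    may contain duplicates locally and across nodes. After p-1 circulation
    steps, each tuple survives only at the lowest-ranked node holding it."""
    p = len(fragments)
    local = [set(f) for f in fragments]          # local duplicates removed
    circulating = [set(f) for f in fragments]    # read-only copies on the ring
    for step in range(1, p):
        for rank in range(p):
            sender = (rank - step) % p           # whose fragment arrives now
            if sender < rank:                    # keep the lowest-ranked copy
                local[rank] -= circulating[sender]
    return local

frags = [["a", "b", "b"], ["b", "c"], ["a", "c", "d"]]
result = ring_dedup(frags)
print(result)   # disjoint sets whose union is {a, b, c, d}
```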
|
|
R. B. Lehoucq, "The Computation of Elementary Unitary
Matrices," Preprint ANL/MCS-P605-0796,
August 1996. |
The construction of elementary unitary matrices that
transform a complex vector to a multiple of e_1, the first column of the identity matrix,
is studied. We present four variants and their software implementation, including a
discussion on the LAPACK subroutine CLARFG. Comparisons are also given. |
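One common construction of such a reflector is sketched below (the conventions differ slightly from those of LAPACK's CLARFG, which is one of the variants the paper compares): H = I - 2vv^H is unitary and maps the complex vector x to a multiple of e_1.

```python
import numpy as np

def elementary_unitary(x):
    """Build a unitary reflector H = I - 2*v*v^H with H x = beta * e_1
    for a nonzero complex vector x (one textbook variant)."""
    x = np.asarray(x, dtype=complex)
    alpha, sigma = x[0], np.linalg.norm(x[1:])
    beta = -np.exp(1j * np.angle(alpha)) * np.hypot(abs(alpha), sigma)
    v = x.copy()
    v[0] = alpha - beta          # this sign choice avoids cancellation
    v /= np.linalg.norm(v)
    H = np.eye(len(x), dtype=complex) - 2.0 * np.outer(v, v.conj())
    return H, beta

rng = np.random.default_rng(3)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
H, beta = elementary_unitary(x)
print(np.allclose(H @ x, beta * np.eye(4)[0]))    # H maps x to a multiple of e_1
print(np.allclose(H.conj().T @ H, np.eye(4)))     # H is unitary
```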
|
|
C. H. Bischof, "Automatic Differentiation and Numerical
Software Design," Preprint ANL/MCS-P606-0896,
January 1997. |
Automatic differentiation (AD) tools can generate accurate
and efficient derivative code for computer programs of arbitrary length. In some cases,
however, the developer of the code to be differentiated may be required to provide
additional information to an AD tool to ensure the desired solution. We illustrate these
issues with nondifferentiable language intrinsics such as max() in the context of
computing the Euclidean norm and numerical integrators. In both cases, very little
additional information is required to ensure that AD computes the ``do-what-I-mean''
derivatives. In addition, the provision of such information makes it easy to derive
``derivative-enhanced'' versions of these codes. |
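A tiny sketch (dual numbers in Python, not ADIFOR) of the max() issue in the Euclidean-norm example: the overflow-safe norm scales by the largest magnitude, and the default AD rule differentiates only the branch max() selects. Even at a tie, where max() is nondifferentiable, the branch derivative here agrees with the true gradient, because the value of the norm does not depend on which branch supplies the scale.

```python
import math

class D:
    """Tiny forward-mode AD value (value, derivative) for this sketch."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        return D(self.v + o.v, self.d + o.d)
    def __mul__(self, o):
        return D(self.v * o.v, self.d * o.v + self.v * o.d)
    def __truediv__(self, o):
        return D(self.v / o.v, (self.d * o.v - self.v * o.d) / o.v ** 2)

def dabs(x):
    return D(abs(x.v), x.d if x.v >= 0 else -x.d)

def dsqrt(x):
    return D(math.sqrt(x.v), x.d / (2.0 * math.sqrt(x.v)))

def dmax(a, b):
    return a if a.v >= b.v else b   # branch selection: the default AD rule

def scaled_norm(x, y):
    s = dmax(dabs(x), dabs(y))      # overflow-safe scaling by the max
    return s * dsqrt((x / s) * (x / s) + (y / s) * (y / s))

# Gradient at a tie |x| == |y|, exactly where max() is nondifferentiable:
gx = scaled_norm(D(3.0, 1.0), D(-3.0, 0.0)).d
gy = scaled_norm(D(3.0, 0.0), D(-3.0, 1.0)).d
# Whichever branch dmax picks, the result matches the true gradient
# x/||x|| = (3, -3)/sqrt(18).
print(gx, gy)   # → 0.7071..., -0.7071...
```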
|
|
L. A. Freitag and C. Ollivier-Gooch, "A Comparison of
Tetrahedral Mesh Improvement Techniques," Preprint ANL/MCS-P607-0896,
October 1996. |
Automatic mesh generation and adaptive refinement methods
for complex three-dimensional domains have proven to be very successful tools for the
efficient solution of complex application problems. These methods can, however, produce
poorly shaped elements that cause the numerical solution to be less accurate and more
difficult to compute. Fortunately, the shape of the elements can be improved through
several mechanisms, including face-swapping techniques that change local connectivity and
optimization-based mesh smoothing methods that adjust grid point location. We consider
several criteria for each of these two methods and compare the quality of several meshes
obtained by using different combinations of swapping and smoothing. Computational
experiments show that swapping is critical to the improvement of general mesh quality and
that optimization-based smoothing is highly effective in eliminating very small and very
large angles. The highest quality meshes are obtained by using a combination of swapping
and smoothing techniques. |
|
|
C. Bischof and A. Griewank, "Tools for the Automatic
Differentiation of Computer Programs," Preprint ANL/MCS-P608-0896,
January 1997. |
Automatic differentiation (AD) is a methodology for
developing sensitivity-enhanced versions of arbitrary computer programs. In this paper, we
provide some background information on AD and basic implementation issues for the design
of general purpose tools that can deal with codes from the Fortran and C family, address
some frequently asked questions, and provide pointers for further study. |
|
|
T. Canfield, D. Diachin, L. Freitag, D. Heath, J. Herzog, W.
Michels, "Interactive Computational Models of Particle Dynamics Using Virtual
Reality," Symp. on Virtual Reality in Manufacturing Research and Education, Chicago,
1996. Also Preprint ANL/MCS-P609-0996.
|
We discuss an interactive environment for the visualization,
analysis, and modification of computational models used in industrial settings. In
particular, we focus on interactively placing massless, massed, and evaporating
particulate matter in computational fluid dynamics applications. We discuss the numerical
model used to compute the particle pathlines in the fluid flow for display and analysis.
We briefly describe the toolkits developed for vector and scalar field visualization,
interactive particulate source placement, and a three-dimensional GUI. This
system is currently used in two industrial applications, and we present our tools in the
context of these applications. We summarize the current state of the project and offer
directions for future research. |
|
|
S.-J. Wu, Y.-J. J. Wu, and J. J. Shaw, "DNA Solutions to
Searching for the Hamiltonian Cycle and the Traveling-Salesman Cycle," Preprint ANL/MCS-P610-0996,
October 1996. |
Computational problems for which no polynomial-time
algorithm is known are classified as NP-complete problems. Taking advantage of
their great capacity to conduct parallel reactions, DNA molecules and their experimental
protocols have been proposed to solve such problems which otherwise are time consuming for
electronic computers. Based on a working archetype, models are presented here to compute
two NP-complete problems: searching for the Hamiltonian cycle and the traveling-salesman
cycle. |
|
|
S. Carr and R. B. Lehoucq, "Compiler Blockability of
Dense Matrix Factorizations," Preprint ANL/MCS-P611-0996,
October 1996. |
The goal of the LAPACK project is to provide efficient and
portable software for dense numerical linear algebra computations. By recasting many of
the fundamental dense matrix computations in terms of calls to an efficient implementation
of the BLAS (Basic Linear Algebra Subprograms), the LAPACK project has, in large part,
achieved its goal. Unfortunately, the efficient implementation of the BLAS often results
in machine-specific code that is not portable across multiple architectures without a
significant loss in performance or a significant effort to re-optimize them. This paper
examines whether most of the hand optimizations performed on matrix factorization codes
are unnecessary because they can (and should) be performed by the compiler. We believe
that it is better for the programmer to express algorithms in a machine-independent form
and allow the compiler to handle the machine-dependent details. This gives the algorithms
portability across architectures and removes the error-prone, expensive, and tedious
process of hand optimization. Although there currently exists no production compiler that
can perform all the loop transformations discussed in this paper, a description of current
research in compiler technology is provided that will prove beneficial to the numerical
linear algebra community. We show that the Cholesky and LU factorizations may be optimized
automatically by a compiler to be as efficient as the same hand-optimized version found in
LAPACK. We also show that the QR factorization may be optimized by the compiler to perform
comparably with the hand-optimized LAPACK version on modest matrix sizes. Our approach
allows us to conclude that with the advent of the compiler optimizations discussed in this
paper, matrix factorizations may be efficiently implemented in a machine-independent form.
|
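The target shape of the transformations is illustrated below with a blocked right-looking Cholesky in numpy (a sketch of the standard blocked algorithm, not the LAPACK code or the compiler transformations themselves): the bulk of the work lands in a matrix-matrix update, the BLAS-3 form that tiling and loop interchange can also produce automatically from the naive triple loop.

```python
import numpy as np

def cholesky_blocked(A, nb=2):
    """Blocked right-looking Cholesky: factor a small diagonal block, solve
    for the panel below it, then apply a BLAS-3 trailing-matrix update."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])   # small diagonal block
        if e < n:
            # triangular solve for the panel: L21 = A21 * L11^{-T}
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
            # BLAS-3 trailing-matrix update: A22 -= L21 * L21^T
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)          # symmetric positive definite
L = cholesky_blocked(A)
print(np.allclose(L @ L.T, A))                    # valid factorization
print(np.allclose(L, np.linalg.cholesky(A)))      # same factor as the unblocked one
```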
|
|
R. B. Lehoucq and K. Meerbergen, "The Inexact Rational
Krylov Sequence Method," SIAM J. Matrix Anal. Appl. 20(1)
(1998), pp. 131-148. Journal title: "Using Generalized Cayley Transformations
within an Inexact Rational Krylov Sequence Method." Also Preprint ANL/MCS-P612-1096,
October 1996. |
The rational Krylov sequence (RKS) method is a generalization
of Arnoldi's method. It constructs an orthogonal reduction of a matrix pencil into an
upper Hessenberg pencil. The RKS method is useful when the matrix pencil may be
efficiently factored. However, it requires the solution of a linear system at every step.
This article considers solving the resulting linear systems in an inexact manner by using
an iterative method. We show that a Cayley transformation used within the RKS method is
more efficient and robust than the usual shift-and-invert transformation. A relationship
with the recently introduced Jacobi-Davidson method of Sleijpen and van der Vorst is also
established. |
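The spectral mappings behind the two transformations can be checked directly on a small diagonal example (a numerical illustration of the mappings only, not of the inexact solves the paper studies): the Cayley transform sends an eigenvalue lam of the pencil (A, B) to (lam - mu)/(lam - sigma), while shift-and-invert sends it to 1/(lam - sigma); both make eigenvalues near the shift sigma extremal.

```python
import numpy as np

A = np.diag(np.array([1.0, 2.0, 3.0, 10.0]))
B = np.eye(4)
sigma, mu = 2.1, 5.0
# Cayley transform and shift-and-invert operators for the pencil (A, B):
TC = np.linalg.solve(A - sigma * B, A - mu * B)
TSI = np.linalg.solve(A - sigma * B, B)
lam = np.diag(A)             # eigenvalues of the diagonal pencil

print(np.sort(np.linalg.eigvals(TC).real))
print(np.sort((lam - mu) / (lam - sigma)))    # same values: the Cayley mapping
print(np.sort(np.linalg.eigvals(TSI).real))
print(np.sort(1.0 / (lam - sigma)))           # same values: shift-and-invert
```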
|
|
I. Foster, N. T. Karonis, C. Kesselman, G. Koenig, S. Tuecke,
"A Secure Communications Infrastructure for High-Performance Distributed
Computing," Proc. Sixth IEEE Int'l Symp. on High Performance Distributed
Computing, Aug. 5-8, 1997, Portland State Univ., Portland, Oregon. Also Preprint ANL/MCS-P613-0996. |
Applications that use high-speed networks to connect
geographically distributed supercomputers, databases, and scientific instruments may
operate over open networks and access valuable resources. Hence, they can require
mechanisms for ensuring integrity and confidentiality of communications and for
authenticating both users and resources. Security solutions developed for traditional
client-server applications do not provide direct support for the program structures,
programming tools, and performance requirements encountered in these applications. We
address these requirements via a security-enhanced version of the Nexus communications
library, which we use to provide secure versions of parallel libraries and languages,
including the Message Passing Interface. These tools permit a fine degree of control over
what, where, and when security mechanisms are applied. In particular, a single application
can mix secure and nonsecure communication, allowing the programmer to make fine-grained
security/performance tradeoffs. We present performance results that quantify the
performance of our infrastructure. |
|
|
I. Foster and C. Kesselman, "Globus: A Metacomputing
Infrastructure Toolkit," Int. J. Supercomput. Appl. 11(2)
(1997), pp. 115-128. Also Preprint ANL/MCS-P614-0996. |
Emerging high-performance applications require the ability to
exploit diverse, geographically distributed resources. These applications use high-speed
networks to integrate supercomputers, large databases, archival storage devices, advanced
visualization devices, and/or scientific instruments to form networked virtual
supercomputers or metacomputers. While the physical infrastructure to build
such systems is becoming widespread, the heterogeneous and dynamic nature of the
metacomputing environment poses new challenges for developers of system software, parallel
tools, and applications. In this article, we introduce Globus, a system that we are
developing to address these challenges. The Globus system is intended to achieve a
vertically integrated treatment of application, middleware, and network. A low-level toolkit
provides basic mechanisms such as communication, authentication, network information, and
data access. These mechanisms are used to construct various higher-level metacomputing services,
such as parallel programming tools and schedulers. Our long-term goal is to build an
Adaptive Wide Area Resource Environment (AWARE), an integrated set of higher-level
services that enable applications to adapt to heterogeneous and dynamically changing
metacomputing environments. Preliminary versions of Globus components were deployed
successfully as part of the I-WAY networking experiment. |
|
|
J. Czyzyk, M. P. Mesnier, and J. J. More', "The
Network-Enabled Optimization System (NEOS) Server," Preprint ANL/MCS-P615-1096,
March 1997. |
The Network-Enabled Optimization System (NEOS) is an
environment for solving optimization problems over the Internet. Users submit optimization
problems to the NEOS Server via e-mail, the World Wide Web, or the NEOS Submission Tool.
The NEOS Server locates the appropriate optimization solver, computes all additional
information (for example, derivatives and sparsity patterns) required by the solver, links
the optimization problem with the solver, and returns a solution. This article discusses
the design and implementation of the NEOS Server. |
|
|
W. McCune and A. D. Sands, "Computer and Human
Reasoning: Single Implicative Axioms for Groups and for Abelian Groups," Preprint ANL/MCS-P617-1096,
November 1996. |
The search for single axioms for groups has long interested
mathematicians. In 1938, Tarski presented the following single equational axiom (in terms
of subtraction) for Abelian groups: x-(y-(z-(x-y))) = z. In 1952, Higman and Neumann
presented the following single equational axiom (in terms of division) for ordinary
groups: (x/((((x/x)/y)/z)/(((x/x)/x)/z))) = y. |
|
|
W. McCune, "33 Basic Test Problems: A Practical
Evaluation of Some Paramodulation Strategies," Preprint ANL/MCS-P618-1096,
November 1996. |
Many researchers who study the theoretical aspects of
inference systems believe that if inference rule A is complete and more
restrictive than inference rule B, then the use of A will lead more
quickly to proofs than will the use of B. The literature contains statements of
the sort ``our rule is complete and it heavily prunes the search space; therefore it is
efficient''. [This is a very general statement. We are not referring in particular to
the paramodulation strategies on which we focus in this paper.] These positions are highly
questionable and indicate that the authors have little or no experience with the practical
use of automated inference systems. Restrictive rules (1) can block short, easy-to-find
proofs, (2) can block proofs involving simple clauses, the type of clause on which many
practical searches focus, (3) can require weakening of redundancy control such as
subsumption and demodulation, and (4) can require the use of complex checks in deciding
whether such rules should be applied. The only way to determine the practical value of
inference rules and search strategies is to experiment on problems in which long-term
target users are interested. In this paper we present a new theorem prover for equational
logic, a set of 33 equational theorems for evaluating paramodulation strategies, a large
set of experiments with several paramodulation strategies, and two equational proofs in
Robbins algebra. |
|
|
W. McCune and L. Wos, "Otter: The CADE-13 Competition
Incarnations," Preprint ANL/MCS-P619-1096,
November 1996. |
This article discusses the two incarnations of Otter entered
in the CADE-13 Automated Theorem Proving Competition. Also presented are some historical
background, a summary of applications that have led to new results in mathematics and
logic, and a general discussion of Otter. |
|
|
H. G. Kaper and P. Takac, "Ginzburg-Landau Dynamics with
a Time-dependent Magnetic Field," Preprint ANL/MCS-P620-1196,
November 1996. |
The time-dependent Ginzburg-Landau equations of
superconductivity define a dynamical process when the applied magnetic field varies with
time. Sufficient conditions (in terms of the time rate of change of the applied field) are
given that, if satisfied, guarantee that this dynamical process is asymptotically
autonomous. As time goes to infinity, the dynamical process asymptotically approaches a
dynamical system whose attractor coincides with the omega-limit set of the dynamical
process. |
|
|
P. Eberhard and C. Bischof, "Automatic Differentiation
of Numerical Integration Algorithms," Math. Comput. 68
(1999), pp. 717-731. Also Preprint ANL/MCS-P621-1196. |
Automatic differentiation (AD) is a technique for
automatically augmenting computer programs with statements for the computation of
derivatives. This article discusses the application of automatic differentiation to
numerical integration algorithms for ordinary differential equations (ODEs), in
particular, the ramifications of the fact that AD is applied not only to the solution of
such an algorithm, but to the solution procedure itself. This subtle issue can lead to
surprising results when AD tools are applied to variable-stepsize, variable-order ODE
integrators. The computation of the final time step plays a special role in determining
the computed derivatives. We investigate these issues using various integrators and
suggest constructive approaches for obtaining the desired derivatives. |
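A toy illustration of the phenomenon (a hand-rolled dual-number sketch, not one of the AD tools discussed in the paper): forward-mode AD pushed through a fixed-step Euler integrator for y' = p·y differentiates the discrete solution y0(1+ph)^n, not the exact flow y0·e^(pT). With a variable-stepsize integrator, the step sizes themselves would depend on p and enter the computed derivative.

```python
# Minimal forward-mode AD via dual numbers, pushed through a
# fixed-step Euler integrator for y' = p*y.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def euler(p, y0, T, n):
    # n fixed Euler steps for y' = p*y; works on floats or Duals alike
    h, y = T / n, y0
    for _ in range(n):
        y = y + h * (p * y)
    return y

# Seed dp/dp = 1 to get the derivative of the numerical solution w.r.t. p.
y = euler(Dual(0.5, 1.0), 1.0, 1.0, 1000)
# y.dot equals d/dp of the *discrete* map y0*(1+p*h)^n,
# not y0*T*exp(p*T), the derivative of the exact solution.
```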
|
|
D. Ralph and S. J. Wright, "Superlinear Convergence of
an Interior-Point Method Despite Dependent Constraints," Preprint ANL/MCS-P622-1196,
December 1996. |
We show that an interior-point method for monotone
variational inequalities exhibits superlinear convergence on condition that all the
standard assumptions hold except for the well known assumption that the Jacobian of the
active constraints has full rank at the solution. We show that superlinear convergence
occurs even when the constant rank condition on the Jacobian assumed in our earlier paper
does not hold, thereby extending the main result of that paper. |
|
|
L. Wos, "Automating the Search for Elegant Proofs,"
J. of Automated Reasoning 21 (1998), pp. 135-175. Also Preprint ANL/MCS-P623-1196,
June 1997. |
The research reported in this article was spawned by a
colleague's request to find an elegant proof (of a theorem from Boolean algebra) to
replace his proof consisting of 816 deduced steps. The request was met by finding a proof
consisting of 100 deduced steps. The methodology used to obtain the far shorter proof is
presented in detail through a sequence of experiments. Although clearly not an algorithm,
the methodology is sufficiently general to enable its use for seeking elegant proofs
regardless of the domain of study. In addition to (usually) being more elegant, shorter
proofs often provide the needed path to constructing a more efficient circuit, a more
effective algorithm, and the like. The methodology relies heavily on the assistance of
McCune's automated reasoning program OTTER. Of the aspects of proof elegance, the main
focus here is on proof length, with brief attention paid to the type of term present, the
number of variables required, and the complexity of deduced steps. The methodology is
iterative, relying heavily on the use of three strategies: the resonance strategy, the hot
list strategy, and McCune's ratio strategy. These strategies, as well as other features on
which the methodology relies, do exhibit tendencies that can be exploited in the search
for shorter proofs and for other objectives. To provide some insight regarding the value
of the methodology, I discuss its successful application to other problems from Boolean
algebra and to problems from lattice theory. Research suggestions and challenges are also
offered. |
|
|
J. N. Lyness and U. Ruede, "Cubature of Integrands
Containing Derivatives," Preprint ANL/MCS-P625-1196,
November 1996. |
We present a new technique for the numerical integration over
R, a square or triangle, of an integrand of the form $(\nabla u)^T B(\nabla
v)$. This uses only function values of u, B, and v,
avoiding explicit differentiation, but is suitable only when the integrand function is
regular over R. The technique is analogous to Romberg integration, since it is
based on using a sequence of very simple discretizations $J^{(m)}, m = 1,2,3,...$,
of the required integral and applying extrapolation in m to provide closer
approximations. A general approach to the problem of constructing discretizations is
given. We provide specific cost-effective discretizations satisfying familiar, but
somewhat arbitrary guidelines. As in Romberg integration, when each component function in
the integrand is a polynomial, this technique leads to an exact result. |
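A one-dimensional analogue conveys the flavor (my sketch under simplified assumptions; the paper's discretizations over squares and triangles are more elaborate). Approximate I = ∫₀¹ u'(x) b(x) v'(x) dx from function values of u, b, and v alone, using forward differences and one Richardson extrapolation step:

```python
# 1-D analogue of the cubature idea: approximate ∫_0^1 u'(x) b(x) v'(x) dx
# from function values only, with no explicit differentiation.
def J(u, b, v, m):
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = i * h
        du = u(x + h) - u(x)          # forward differences stand in for u', v'
        dv = v(x + h) - v(x)
        total += b(x + 0.5 * h) * du * dv / h
    return total

def extrapolated(u, b, v, m):
    # J(m) has an even-power error expansion in h = 1/m, so one
    # Richardson step cancels the O(h^2) term.
    return (4.0 * J(u, b, v, 2 * m) - J(u, b, v, m)) / 3.0

# Example: u = x^2, v = x^3, b = 1, so I = ∫ 6x^3 dx = 3/2 on [0,1]
approx = extrapolated(lambda x: x**2, lambda x: 1.0, lambda x: x**3, 8)
```

For this polynomial example one can work out J(m) = 3/2 − 1/(2m²), so a single extrapolation step is already exact, mirroring the exactness property noted above.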
|
|
C. Bischof, L. Roh, and A. Mauer, "ADIC: An Extensible
Automatic Differentiation Tool for ANSI-C," Software--Practice and Experience
27(12) (Dec. 1997), pp. 1427-1456. Also Preprint ANL/MCS-P626-1196,
March 1997. |
In scientific computing, we often require the derivatives
$\partial f/\partial x$ of a function f expressed as a program
with respect to some input parameter(s), x, say. Automatic differentiation (AD)
techniques augment the program with derivative computation by applying the chain rule of
calculus to elementary operations in an automated fashion. Due to the associativity of the
chain rule, there are many ways for accumulating these derivatives. This article
introduces ADIC (Automatic Differentiation of C), a new AD tool for ANSI-C programs. ADIC
is currently the only tool for ANSI-C that employs a source-to-source program
transformation approach; that is, it takes a C code and produces a new C code that
computes the original results as well as the derivatives. We first present ADIC ``by
example'' to illustrate the functionality and ease of use of ADIC and then describe in
detail the architecture of ADIC. ADIC incorporates a modular design that provides a
foundation for both rapid prototyping of better AD algorithms and their sharing across AD
tools for different languages. A component architecture called AIF (AD Interface Form)
separates core AD concepts from their language-specific implementation and allows the
development of generic AD modules that can be directly reused in other AIF-based AD tools.
The language-specific ADIC front end and back end canonicalize C programs to make them fit
for semantic augmentation and manage, for example, the association of a program variable
with its associated derivative object. We also report on applications of ADIC to a
semiconductor device simulator, 3-D CFD grid generator, vehicle simulator, and neural
network code. |
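The flavor of a source-to-source transformation can be sketched by hand (in Python here for brevity; ADIC itself consumes and emits ANSI-C, and the augmented function below is my illustration, not actual ADIC output). Each statement of the original function is paired with a statement propagating its derivative by the chain rule:

```python
def f(x):
    # original code: f(x) = (x*x + 3*x) * x = x^3 + 3x^2
    t = x * x + 3.0 * x
    return t * x

def f_ad(x, dx=1.0):
    # augmented code: each program variable carries an associated
    # derivative object, updated statement by statement
    t = x * x + 3.0 * x
    dt = (x + x) * dx + 3.0 * dx       # chain rule for the first statement
    y = t * x
    dy = dt * x + t * dx               # product rule for the return value
    return y, dy
```

Here f'(x) = 3x² + 6x, so f_ad(2.0) returns (20.0, 24.0), agreeing with the hand derivative at x = 2.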
|
|
S.-J. Wu, Y.-J. J. Wu, and J. J. Shaw, "A Comprehensive
DNA Arithmetic Calculator," Preprint ANL/MCS-P630-1296,
December 1996. |
Developing a generally useful DNA computer has been one of
the main goals since the inception of molecular computation. Toward that goal it is
important to equip DNA with the ability to perform mathematical calculations. Here, a
simple procedure that allows DNA-based bit arithmetic calculations is presented.
Oligonucleotides encoding input bit numbers serve as primers for PCR to carry out the
arithmetic operations. Amplified DNA fragments encoding output bit numbers are
subsequently modified to assume the same format as the input. Thus, either iterative or
parallel operations are permissible. |
|
|
C. F. Ollivier-Gooch, "High-Order ENO Schemes for
Unstructured Meshes Based on Least-Squares Reconstruction," Preprint ANL/MCS-P631-1296,
February 1997. |
High-order accurate schemes for conservation laws on
unstructured meshes are not nearly as well advanced as such schemes for structured meshes.
Consequently, little or nothing is known about the possible practical advantages of
high-order discretization on unstructured meshes. This article is part of an ongoing
effort to develop high-order schemes for unstructured meshes to the point where meaningful
information can be obtained about the trade-offs involved in using spatial discretizations
of higher than second-order accuracy on unstructured meshes. |
|
|
S. A. Mitchell and S. A. Vavasis, "Quality Mesh
Generation in Higher Dimensions," Preprint ANL/MCS-P633-1296,
February 1997. |
We consider the problem of triangulating a d-dimensional
region. Our mesh generation algorithm, called QMG, is a quadtree-based algorithm that can
triangulate any polyhedral region including nonconvex regions with holes. Furthermore, our
algorithm guarantees a bounded aspect ratio triangulation provided that the input domain
itself has no sharp angles. Finally, our algorithm is guaranteed never to overrefine the
domain, in the sense that the number of simplices produced by QMG is bounded above by a
factor times the number produced by any competing algorithm, where the factor depends on
the aspect ratio bound satisfied by the competing algorithm. The QMG algorithm has been
implemented in C++ and is used as a mesh generator for the finite element method. |
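The quadtree idea behind QMG can be caricatured in two dimensions (a toy sketch with made-up criteria, not QMG's actual algorithm): cells crossed by the domain boundary are split into four children until a maximum depth is reached, so refinement concentrates near the boundary while every cell remains a perfect square (aspect ratio 1).

```python
# Toy 2-D quadtree refinement: split cells crossed by the boundary of the
# unit circle until max_depth is reached. A caricature of quadtree meshing.
import math

def crosses_circle(x, y, s, r=1.0):
    # The circle of radius r meets the closed cell iff r lies between the
    # nearest and farthest distance from the origin to the cell.
    d_min = math.hypot(min(max(0.0, x), x + s), min(max(0.0, y), y + s))
    d_max = max(math.hypot(cx, cy) for cx in (x, x + s) for cy in (y, y + s))
    return d_min <= r <= d_max

def refine(x, y, s, depth, max_depth, leaves):
    if depth < max_depth and crosses_circle(x, y, s):
        half = s / 2.0
        for dx in (0.0, half):
            for dy in (0.0, half):
                refine(x + dx, y + dy, half, depth + 1, max_depth, leaves)
    else:
        leaves.append((x, y, s))

leaves = []
refine(-2.0, -2.0, 4.0, 0, 4, leaves)
# leaves exactly partition the root square [-2,2]^2;
# the smallest cells cluster along the circle
```

The leaves tile the root square exactly, and the smallest cell size is dictated by the depth bound, a crude stand-in for the no-overrefinement guarantee described above.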