|
|
Christian H. Bischof, Ron L. Shepard, and Steven
Huss-Lederman, "Workshop Report on Large-Scale Matrix Diagonalization Methods in
Chemistry Theory Institute, held at Argonne National Laboratory, May 20-22, 1996,"
Technical Memorandum ANL/MCS-TM-219,
October 1996. |
This interdisciplinary institute brought together 41
computational chemists and numerical analysts. The goal was to understand the needs of the
computational chemistry community in problems that utilize matrix diagonalization
techniques. This was accomplished by reviewing the current state of the art and looking
toward future directions in matrix diagonalization techniques. This institute occurred
about 20 years after a related meeting of similar size (see Report on the Workshop August
9-11, 1978, University of California, at Santa Cruz, edited by Cleve Moler and I. Shavitt
and sponsored by National Resource for Computation in Chemistry). During those 20 years
the Davidson method continued to dominate the problem of finding a few extremal
eigenvalues for many computational chemistry problems. Work on non-diagonally dominant and
non-Hermitian problems as well as parallel computing has also brought new methods to bear.
The changes and similarities in problems and methods over the past two decades offered an
interesting perspective on the progress in this area. |
|
|
William Gropp and Ewing Lusk, "Creating a New MPICH
Device Using the Channel Interface," Technical Memorandum ANL/MCS-TM-213
(draft July 1996). |
The MPICH implementation of MPI uses a powerful and efficient
layered approach to simplify porting MPI to new systems. One interface that can be used is
the {\em channel} interface; this interface defines a collection of simple data-transfer
operations. This interface can adapt to additional functionality, such as asynchronous or
nonblocking transfers or remote memory operations. This paper describes the channel
interface and some of the special issues that arise, and gives instructions for creating a
new MPICH port by implementing just a few routines. |
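The layering idea is compact enough to sketch. The following toy analogue (in Python, with invented names; the real channel interface is a set of C routines) shows how a full message-passing layer can sit on top of nothing more than a few simple data-transfer primitives.

```python
from abc import ABC, abstractmethod
from collections import deque

class Channel(ABC):
    """Hypothetical analogue of a channel-style device interface: the device
    implements only a few simple data-transfer primitives, and the layers
    above build the full message-passing semantics on top of them."""
    @abstractmethod
    def send(self, dest, data): ...
    @abstractmethod
    def recv_available(self): ...
    @abstractmethod
    def recv(self): ...

class InMemoryChannel(Channel):
    # toy "device": per-rank queues standing in for the wire
    def __init__(self, queues, rank):
        self.queues, self.rank = queues, rank
    def send(self, dest, data):
        self.queues[dest].append((self.rank, data))
    def recv_available(self):
        return bool(self.queues[self.rank])
    def recv(self):
        return self.queues[self.rank].popleft()

queues = {0: deque(), 1: deque()}
rank0, rank1 = InMemoryChannel(queues, 0), InMemoryChannel(queues, 1)
rank0.send(1, b"hello")
src, payload = rank1.recv()
```

Porting then amounts to supplying a new concrete device; everything built on the abstract interface is untouched.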
|
|
David Levine, Michael Facello, Philip Hallstrom, Greg Reeder,
Brian Walenz, and Fred Stevens, "STALK Users Guide," Technical Memorandum ANL/MCS-TM-214 (July
1996). |
STALK is a system that models molecular docking between two
proteins. A problem is posed as an optimization problem where the objective is to minimize
the free energy of the molecular system by maximizing the intermolecular interaction
energy between the two molecules. The possible number of conformations between the two
molecules can be very large. A parallel genetic algorithm (GA) is used to explore the
conformation space and identify the low-energy molecular configurations. The CAVE, a
virtual reality environment, can be used to visualize and interact with the system while
it is executing. STALK consists of two programs: stalk.ga, the docking program that runs
the GA, and stalk.cave, the visualization program. The visualization component is
optional. |
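The optimization loop at the heart of such a system can be sketched in a few lines. This is a generic steady-state genetic algorithm minimizing a toy quadratic "energy"; STALK's actual objective, encoding, and parallelism are not reproduced here.

```python
import random

def energy(x):
    # Toy stand-in for the intermolecular interaction energy; the real
    # objective is a physics-based energy over docking conformations.
    return sum((xi - 1.0) ** 2 for xi in x)

def genetic_search(dim=3, pop_size=20, generations=500, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)
        # steady-state step: two parents from the fitter half produce one
        # child, which replaces the current worst member of the population
        p1, p2 = rng.sample(pop[: pop_size // 2], 2)
        child = [(a + b) / 2 + rng.gauss(0, 0.05) for a, b in zip(p1, p2)]
        pop[-1] = child
    return min(pop, key=energy)

best = genetic_search()
```

A parallel GA of the kind STALK uses distributes this population across processors; the visualization component simply observes the evolving population.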
|
|
David Levine, Michael Facello, Philip Hallstrom, Greg Reeder,
Brian Walenz, and Fred Stevens, "STALK Programmers Guide," Technical Memorandum ANL/MCS-TM-215 (July
1996). |
STALK is a system that models molecular docking between two
proteins. A problem is posed as an optimization problem where the objective is to
minimize the intermolecular interaction energy between the two molecules. The possible
number of conformations between the two molecules can be very large. A parallel genetic
algorithm (GA) is used to explore the conformation space and identify the low-energy
molecular configurations. The CAVE, a virtual reality environment, can be used to
visualize and interact with the system while it is executing. STALK consists of two
programs: stalk.ga, the docking program that runs the GA, and stalk.cave, the
visualization program. The visualization component is optional. |
|
|
Brian P. Walenz, "MolView Users Guide," Technical
Memorandum ANL/MCS-TM-216
(June 1996). |
A system for viewing molecular data in a CAVE virtual reality
environment is presented. The system, called MolView, consists of a front-end driver
program that prepares the data and a backend CAVE program that displays the data. Both are
written so that modifications and extensions are relatively easy to accomplish. |
|
|
|
|
I. Foster, ``Compositional Parallel Programming Languages,''
Preprint MCS-P354-0293
(Revised), Rev. 1, April 1996. To appear in ACM Trans. on Programming Languages and
Systems, 1996. |
In task-parallel programs, diverse activities can take place
concurrently, and communication and synchronization patterns are complex and not easily
predictable. Previous work has identified compositionality as an important design
principle for task-parallel programs. In this paper, we discuss alternative approaches to
the realization of this principle, which holds that properties of program components
should be preserved when those components are composed in parallel with other program
components. We review two programming languages, Strand and Program Composition Notation,
that support compositionality via a small number of simple concepts, namely monotone
operations on shared objects, a uniform addressing mechanism, and parallel composition.
Both languages have been used extensively for large-scale application development,
allowing us to provide an informed assessment of their strengths and weaknesses. We
observe that while compositionality simplifies development of complex applications, the
use of specialized languages hinders reuse of existing code and tools, and the
specification of domain decomposition strategies. This suggests an alternative approach
based on small extensions to existing sequential languages. We conclude the paper with a
discussion of two languages that realize this strategy. |
|
|
Daniel Ralph and Stephen Wright, "Superlinear
Convergence of an Interior-Point Method for Monotone Variational Inequalities,"
Preprint MCS-P556-0196,
January 1996. |
We describe an infeasible-interior-point algorithm for
monotone variational inequality problems and prove that it converges globally and
superlinearly under standard conditions plus a constant rank constraint qualification. The
latter condition represents a generalization of the two types of assumptions made in
existing superlinear analyses; namely, linearity of the constraints and linear
independence of the active constraint gradients. |
|
|
David Levine, "Application of a Hybrid Genetic Algorithm
to Airline Crew Scheduling," Computers &amp; Operations Research (to appear).
Also preprint MCS-P557-0196.
|
This paper discusses the development and application of a
hybrid genetic algorithm to airline crew scheduling problems. The hybrid algorithm
consists of a steady-state genetic algorithm and a local search heuristic. The hybrid
algorithm was tested on a set of forty real-world problems. It found the optimal solution
for half the problems, and good solutions for nine others. The results were compared to
those obtained with branch-and-cut and branch-and-bound algorithms. The branch-and-cut
algorithm was significantly more successful than the hybrid algorithm, and the
branch-and-bound algorithm slightly better. |
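The flavor of the hybrid can be sketched on a toy set-partitioning instance (the data below is illustrative, not airline data): a penalized fitness function of the kind GA approaches to crew scheduling typically use, plus the bit-flip local search that the hybrid would apply to offspring.

```python
# Toy set-partitioning instance: each pairing is (cost, set of flights);
# every flight must be covered exactly once.
flights = {1, 2, 3, 4}
pairings = [(3.0, {1, 2}), (2.5, {3, 4}), (4.0, {1, 3}),
            (3.5, {2, 4}), (6.0, {1, 2, 3, 4}), (1.5, {4})]

def fitness(bits):
    # pay column costs plus a penalty per flight covered != exactly once
    count = {f: 0 for f in flights}
    total = 0.0
    for keep, (c, cols) in zip(bits, pairings):
        if keep:
            total += c
            for f in cols:
                count[f] += 1
    return total + 10.0 * sum(abs(n - 1) for n in count.values())

def local_search(bits):
    # first-improvement bit-flip heuristic, the "local search" half of the
    # hybrid; the GA half would supply the starting points on real problems
    bits, best = list(bits), fitness(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            bits[i] ^= 1
            f = fitness(bits)
            if f < best:
                best, improved = f, True
            else:
                bits[i] ^= 1
    return bits, best
```

On this instance the heuristic, started from the empty selection, recovers the optimal partition {1,2} + {3,4} at cost 5.5.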
|
|
David Levine, William Gropp, Kimmo Forsman, and Lauri
Kettunen, "Parallel Computation of Three-Dimensional Nonlinear Magnetostatic
Problems," Preprint MCS-P558-0196,
February 1996. |
We describe a general-purpose parallel electromagnetic code
for computing accurate solutions to large, computationally demanding, 3D nonlinear
magnetostatic problems. The code, CORAL, is based on a volume integral equation
formulation. Using an IBM SP parallel computer and iterative solution methods, we
successfully solved the dense linear systems inherent in such formulations. A key
component of our work was the use of the PETSc library, which provides parallel
portability and access to the latest linear algebra solution technology. |
|
|
Christian H. Bischof and Gregorio Quintana-Ortí,
"Computing Rank-Revealing QR Factorizations of Dense Matrices," ACM Trans.
on Mathematical Software 24, no. 2 (June 1998), pp. 226-253.
Also Preprint MCS-P559-0196,
March 1996. |
We develop algorithms and implementations for computing
rank-revealing QR (RRQR) factorizations of dense matrices. First, we develop an efficient
block algorithm for approximating an RRQR factorization, employing a windowed version of
the commonly used Golub pivoting strategy, aided by incremental condition estimation.
Second, we develop efficiently implementable variants of guaranteed reliable RRQR
algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen and by
Pan and Tang. We suggest algorithmic improvements with respect to condition estimation,
termination criteria, and Givens updating. By combining the block algorithm with one of
the triangular postprocessing steps, we arrive at an efficient and reliable algorithm for
computing an RRQR factorization of a dense matrix. Experimental results on IBM RS/6000 and
SGI R8000 platforms show that this approach performs up to three times faster than the
less reliable QR factorization with column pivoting as it is currently implemented in
LAPACK, and comes within 15% of the performance of the LAPACK block algorithm for
computing a QR factorization without any column exchanges. Thus, we expect this routine to
be useful in many circumstances where numerical rank deficiency cannot be ruled out but
currently has been ignored because of the computational cost of dealing with it. |
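The pivoting idea underlying these algorithms can be sketched in a few lines of NumPy: classical Golub column pivoting on an unblocked Householder QR. The paper's windowed, blocked, and condition-estimated variants go well beyond this sketch.

```python
import numpy as np

def qr_column_pivoting(A, tol=1e-10):
    """Unblocked QR with Golub column pivoting.  At step k the remaining
    column of largest norm is swapped into position k and eliminated with a
    Householder reflection; the decay of |R[k, k]| then reveals numerical
    rank.  Returns R (overwriting a copy of A), the permutation, and the
    rank estimate."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    piv = np.arange(n)
    for k in range(min(m, n)):
        norms = np.linalg.norm(A[k:, k:], axis=0)
        j = k + int(np.argmax(norms))
        A[:, [k, j]] = A[:, [j, k]]          # bring the largest column in
        piv[[k, j]] = piv[[j, k]]
        x = A[k:, k]
        if np.linalg.norm(x) <= tol:
            break                            # remaining block is negligible
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        v /= np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])   # Householder update
    d = np.abs(np.diag(A))
    rank = int(np.sum(d > tol * max(d[0], 1e-300)))
    return A, piv, rank
```

Orthogonal transforms and column exchanges preserve the singular values, which is what makes the diagonal of R a usable rank indicator.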
|
|
C. H. Bischof and G. Quintana-Ortí, "Codes for
Rank-Revealing QR Factorizations of Dense Matrices," ACM Trans. on Mathematical
Software 24, no. 2 (June 1998), pp. 254-257. Also Preprint MCS-P560-0196, March
1996. |
This article describes a suite of codes as well as associated
testing and timing drivers for computing rank-revealing QR (RRQR) factorizations of dense
matrices. The main contribution is an efficient block algorithm for approximating an RRQR
factorization, employing a windowed version of the commonly used Golub pivoting strategy
and improved versions of the RRQR algorithms for triangular matrices originally suggested
by Chandrasekaran and Ipsen and by Pan and Tang, respectively. We highlight usage and
features of these codes. |
|
|
Stephen J. Wright, "Applying New Optimization Algorithms
to Model Predictive Control," Chemical Process Control-V (to appear). Also
Preprint MCS-P561-0196,
January 1996. |
The connections between optimization and control theory have
been explored by many researchers, and optimization algorithms have been applied with
success to optimal control. The rapid pace of developments in model predictive control has
given rise to a host of new problems to which optimization has yet to be applied.
Concurrently, developments in optimization, and especially in interior-point methods, have
produced a new set of algorithms that may be especially helpful in this context. In this
paper, we reexamine the relatively simple problem of control of linear processes subject
to quadratic objectives and general linear constraints. We show how new algorithms for
quadratic programming can be applied efficiently to this problem. The approach extends to
several more general problems in straightforward ways. |
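The structure can be sketched on a toy instance: a scalar plant, quadratic cost, and the dynamics kept as equality constraints, so the optimal trajectory comes out of one structured KKT solve. This illustrates only the generic QP structure; the interior-point algorithms and inequality handling discussed in the paper are not shown, and all data below is assumed for illustration.

```python
import numpy as np

# Toy MPC instance: scalar plant x+ = a x + b u, horizon N, cost
# 0.5 * sum(x_k^2 + u_k^2).  Variables z = (u_0..u_{N-1}, x_1..x_N).
a, b, x0, N = 0.9, 0.5, 2.0, 3
n = 2 * N
H = np.eye(n)                        # Hessian of 0.5 z'Hz with Q = R = I
g = np.zeros(n)
E = np.zeros((N, n))                 # dynamics as equality constraints E z = e
e = np.zeros(N)
for k in range(N):
    E[k, N + k] = 1.0                # x_{k+1}
    E[k, k] = -b                     # - b u_k
    if k == 0:
        e[k] = a * x0                # x_1 = a x_0 + b u_0
    else:
        E[k, N + k - 1] = -a         # - a x_k
# KKT conditions: H z + E' lam = -g,  E z = e
K = np.block([[H, E.T], [E, np.zeros((N, N))]])
sol = np.linalg.solve(K, np.concatenate([-g, e]))
z = sol[:n]
u, x = z[:N], z[N:]
```

Interior-point methods for the inequality-constrained version solve a sequence of perturbed systems with exactly this block structure, which is what makes them attractive for MPC.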
|
|
Mark T. Jones and Paul E. Plassmann, "Adaptive
Refinement of Unstructured Finite-Element Meshes," Finite Elements in Analysis
and Design 25 (1997), 41-60. Also Preprint MCS-P562-0296,
February 1996. |
The finite element method used in conjunction with adaptive
mesh refinement algorithms can be an efficient tool in many scientific and engineering
applications. In this paper we review algorithms for the adaptive refinement of
unstructured simplicial meshes (triangulations and tetrahedralizations). We discuss bounds
on the quality of the meshes resulting from these refinement algorithms. Unrefinement and
refinement along curved surfaces are also discussed. Finally, we give an overview of
recent developments in parallel refinement algorithms. |
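One standard refinement rule from this literature, longest-edge bisection, is easy to sketch for a single triangle. A real refinement algorithm must also propagate bisections to neighbors to keep the mesh conforming, and the quality bounds discussed in the paper concern exactly such repeated rules.

```python
def longest_edge_bisect(tri):
    """One step of longest-edge bisection for a single triangle: split it
    across the midpoint of its longest edge, producing two children whose
    areas each equal half the parent's.  tri is a list of three (x, y)
    vertices; conformity bookkeeping for a full mesh is omitted."""
    pairs = [(0, 1), (1, 2), (2, 0)]

    def d2(i, j):
        (x1, y1), (x2, y2) = tri[i], tri[j]
        return (x1 - x2) ** 2 + (y1 - y2) ** 2

    i, j = max(pairs, key=lambda p: d2(*p))    # longest edge
    k = 3 - i - j                              # vertex opposite that edge
    mid = tuple((tri[i][m] + tri[j][m]) / 2 for m in range(2))
    return [tri[i], mid, tri[k]], [mid, tri[j], tri[k]]
```

Bisecting the longest edge, rather than an arbitrary one, is what keeps the angles of repeatedly refined triangles bounded away from zero.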
|
|
Gregorio Quintana-Ortí and Enrique S. Quintana-Ortí,
"Guaranteeing Termination of Chandrasekaran and Ipsen's Algorithm for Computing
Rank-Revealing QR Factorizations," Preprint MCS-P564-0196, May
1996. |
Many problems from science and engineering require the
computation of a rank-revealing QR factorization of a matrix. We have developed a new
algorithm based on Chandrasekaran and Ipsen's algorithm, with two advantages over it:
First, our algorithm is much faster, since we modify the main loop to accelerate its
convergence, avoid useless steps, and use faster subalgorithms. Second, we apply a
technique, suggested by Pan and Tang, that ensures termination, achieves the desired
bounds, and fits into our theoretical framework. |
|
|
J. Fleckinger-Pellé, H. G. Kaper, and P. Takáč,
"Dynamics of the Ginzburg-Landau Equations of Superconductivity," Nonlinear
Analysis, Theory, Methods & Applications 32, no. 5 (1998), pp.
647-665. Also Preprint MCS-P565-0296. |
This article is concerned with the dynamical properties of
solutions of the time-dependent Ginzburg-Landau (TDGL) equations of superconductivity. It
is shown that the TDGL equations define a dynamical process when the applied magnetic
field varies with time, and a dynamical system when the applied magnetic field is
stationary. The dynamical system describes the large-time asymptotic behavior: Every
solution of the TDGL equations is attracted to a set of stationary solutions, which are
divergence free. These results are obtained in the "\phi = - \omega (\nabla \cdot
A)" gauge, which reduces to the standard "\phi = -\nabla \cdot A" gauge if
\omega = 1 and to the zero-electric potential gauge if \omega = 0; the treatment captures
both in a unified framework. This gauge forces the London gauge, \nabla \cdot A = 0, for
any stationary solution of the TDGL equations. |
|
|
D. Diachin, L. Freitag, D. Heath, J. Herzog, W. Michels, and
P. Plassmann, "Remote Engineering Tools for the Design of Pollution Control Systems
for Commercial Boilers," Preprint MCS-P566-0296, March
1996. |
We discuss a pilot project involving a collaboration between
Nalco Fuel Tech (NFT), a small company that has developed state-of-the-art emission
reduction systems for commercial boilers, and the computational science group at Argonne
National Laboratory (ANL). The key objective of this project is the development of a
real-time, interactive capability that allows the user to drive the computational model
from within the virtual environment. In this case, the required interaction involves the
placement of chemical injection systems in the boiler and a quick evaluation of their
effectiveness in reducing undesirable emissions from the combustion process. |
|
|
W. Gropp, E. Lusk, N. Doss, and A. Skjellum, ``A
High-Performance, Portable Implementation of the MPI Message Passing Interface Standard,''
Preprint MCS-P567-0296,
July 1996. |
MPI (Message Passing Interface) is a specification for a
standard library for message passing that was defined by the MPI Forum, a broadly based
group of parallel computer vendors, library writers, and applications specialists.
Multiple implementations of MPI have been developed. In this paper, we describe MPICH,
unique among existing implementations in its design goal of combining portability with
high performance. We document its portability and performance and describe the
architecture by which these features are simultaneously achieved. We also discuss the set
of tools that accompany the free distribution of MPICH, which constitute the beginnings of
a portable parallel programming environment. A project of this scope inevitably imparts
lessons about parallel computing, the specification being followed, the current hardware
and software environment for parallel computing, and project management; we describe those
we have learned. Finally, we discuss future developments for MPICH, including those
necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI
Forum. |
|
|
A. Geist, W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk,
W. Saphir, T. Skjellum, and M. Snir, ``MPI-2: Extending the Message-Passing Interface,''
Preprint MCS-P568-0296,
July 1996. |
This paper describes current activities of the MPI-2 Forum.
The MPI-2 Forum is a group of parallel computer vendors, library writers, and application
specialists working together to define a set of extensions to MPI (Message Passing
Interface). MPI was defined by the same process and now has many implementations, both
vendor-proprietary and publicly available, for a wide variety of parallel computing
environments. In this paper we present the salient aspects of the evolving MPI-2 document
as it now stands. We discuss proposed extensions and enhancements to MPI in the areas of
dynamic process management, one-sided operations, collective operations, new language
bindings, real-time computing, external interfaces, and miscellaneous topics. |
|
|
R. Thakur, W. Gropp, and E. Lusk, "An Experimental
Evaluation of the Parallel I/O Systems of the IBM SP and Intel Paragon Using a Production
Application," to appear in Proc. 3rd Int'l Conf. of the Austrian Center for
Parallel Computation with special emphasis on Parallel Databases and Parallel I/O,
Sept. 1996. Also Preprint ANL/MCS-P569-0296,
May 1996. |
We present the results of an experimental evaluation of the
parallel I/O systems of the IBM SP and Intel Paragon using a real three-dimensional
parallel application code. This application, developed by scientists at the University of
Chicago, simulates the gravitational collapse of self-gravitating gaseous clouds. It
performs parallel I/O by using library routines that we developed and optimized separately
for the SP and Paragon. The I/O routines perform two-phase I/O and use the parallel file
systems PIOFS on the SP and PFS on the Paragon. We studied the I/O performance for two
different sizes of the application. In the small case, we found that I/O was much faster
on the SP. In the large case, open, close, and read operations were only slightly faster,
and seeks were significantly faster, on the SP, whereas writes were slightly faster
on the Paragon. The communication required within our I/O routines was faster on the
Paragon in both cases. The highest read bandwidth obtained was 48 Mbytes/sec., and the
highest write bandwidth obtained was 31.6 Mbytes/sec., both on the SP. |
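The two-phase idea is simple enough to simulate without MPI: phase one redistributes data among processes so that each owns one contiguous block of the file, and phase two issues a single large write per process instead of many small strided writes. The names and cyclic layout below are illustrative, not those of the PIOFS/PFS routines.

```python
def two_phase_write(distributed, nprocs):
    """Toy simulation of two-phase I/O.  Each process holds a cyclically
    distributed slice of a global array; phase 1 exchanges data so that
    each process owns one contiguous block, and phase 2 performs one
    large contiguous write per process."""
    total = sum(len(v) for v in distributed.values())
    block = total // nprocs
    # phase 1: communication -- assemble each contiguous block from its owners
    blocks = {}
    for p in range(nprocs):
        blk = []
        for g in range(p * block, (p + 1) * block):
            owner, idx = g % nprocs, g // nprocs   # invert the cyclic layout
            blk.append(distributed[owner][idx])
        blocks[p] = blk
    # phase 2: one contiguous write per process (a list stands in for the file)
    f = [None] * total
    for p in range(nprocs):
        f[p * block : (p + 1) * block] = blocks[p]
    return f
```

The performance point of the paper is that the phase-1 exchange (fast interconnect) is cheap compared with the many small seeks and writes it eliminates in phase 2.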
|
|
C. H. Bischof and M. R. Haghighat, "Hierarchical
Approaches to Automatic Differentiation," Preprint ANL/MCS-P571-0396,
June 1996. |
A mathematical function, specified by a computer program, can
be differentiated efficiently through the exploitation of its program structure. The
important properties of a program for an efficient derivative code are the asymmetries
between the number of inputs and outputs of program components at various levels of
abstraction and the mathematical complexity of the involved operators. Automatic
generation of efficient derivative codes thus requires analysis of programs for detection
of such properties and systematic methods for their exploitation in composing the
derivative codes. We suggest a hierarchical approach based on a partitioning of the
computational or program graph as a means to deduce workable solutions to this hard
problem. Each partition corresponds to a localized scope for derivative computation, and
hierarchical partitions provide a mechanism for exploiting program structure at various
levels. As a particular example, we discuss dynamic programming approaches for finding
good one-dimensional partitions and generalizations to arbitrary directed acyclic graphs
that, by recycling substructure information, allow one to determine the optimal
elimination ordering for a graph with n nodes with complexity O(2**n),
as compared with the O(n!) complexity of a naive search. Lastly, we give a
concrete example illustrating the hierarchical approach on the driven cavity problem from
the MINPACK-2 optimization test set collection. |
C. H. Bischof and P.-T. Wu, ``Exploiting Intermediate
Sparsity in Computing Derivatives for a Leapfrog Scheme,'' Preprint ANL/MCS-P572-0396,
February 1997. |
The leapfrog scheme is a commonly used second-order method
for solving differential equations. Letting Z denote the state of the system, we compute
the state at the next time step as Z(t+1) = H(Z(t),Z(t-1),W), where t denotes a particular
time step, H is the nonlinear timestepping operator, and W are parameters that are not
time dependent. In this article, we show how the associativity of the chain rule of
differential calculus can be used to expose and exploit intermediate derivative sparsity
arising from the typical localized nature of the operator H. We construct a computational
harness that capitalizes on this structure while employing automatic differentiation tools
to automatically generate the derivative code corresponding to the evaluation of one time
step. Experimental results with a 2-D shallow water equation model on IBM RS/6000 and Sun
SPARCstations illustrate these issues. |
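The recurrence can be made concrete on a toy model (a periodic diffusion stencil, not the paper's shallow-water code; ADIFOR would generate the derivative of a real H automatically). Because the operator is local, its Jacobian is banded, so each chain-rule product costs O(n) rather than O(n^2).

```python
import numpy as np

def F(z, w):
    # toy local timestepping ingredient: F_i = w * (z_{i+1} - 2 z_i + z_{i-1}),
    # periodic; locality makes the Jacobian dF/dz (circulant) tridiagonal
    return w * (np.roll(z, -1) - 2.0 * z + np.roll(z, 1))

def leapfrog_with_sensitivity(z0, w, dt, steps):
    """March z_{t+1} = z_{t-1} + 2 dt F(z_t, w) together with s_t = dz_t/dw.
    The chain rule gives s_{t+1} = s_{t-1} + 2 dt (J_F(z_t) s_t + dF/dw(z_t));
    F is linear in z here, so applying the sparse J_F is just F(s, w)."""
    z_prev = z0.astype(float)
    s_prev = np.zeros_like(z_prev)
    z_cur = z_prev + dt * F(z_prev, w)          # bootstrap (forward Euler)
    s_cur = dt * F(z_prev, 1.0)                 # its w-derivative (s_prev = 0)
    for _ in range(steps):
        z_next = z_prev + 2.0 * dt * F(z_cur, w)
        s_next = s_prev + 2.0 * dt * (F(s_cur, w) + F(z_cur, 1.0))
        z_prev, z_cur = z_cur, z_next
        s_prev, s_cur = s_cur, s_next
    return z_cur, s_cur
```

The returned sensitivity agrees with a finite-difference check; in the paper this structure is exploited with AD-generated derivative code for a single time step rather than hand-written Jacobians.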
|
|
D. Diachin, L. Freitag, D. Heath, J. Herzog, P. Plassmann,
and W. Michels, ``Interactive Simulation and Analysis of Emission Reduction Systems in
Commercial Boilers,'' Preprint ANL/MCS-P573-0396. |
In this paper we describe an interactive virtual environment
developed to model an emission reduction system for commercial boilers. The interactive
environment is used to optimize the performance of the reduction system through the
spatial adjustment and spray reconfiguration of reagent injectors. We describe the three
principal components of the system: a computational model for the particle dynamics, a
three-dimensional display device and graphics environment, and the communication layer
that allows the interaction of the user in the visualization environment with the
computational model. Timing results for each component are given for three hardware
configurations that demonstrate the real-time performance of this tool. |
|
|
T. Gaasterland and B. Dorr, "Selecting Tense, Aspect,
and Connecting Words in Language Generation," Proc. International Joint Conf. on
Artificial Intelligence, Aug. 29-31, 1995, Montreal. Also Preprint ANL/MCS-P574-0296,
April 1996. |
Generating language that reflects the temporal organization
of represented knowledge requires a language generation model that integrates contemporary
theories of tense and aspect, temporal representations, and methods to plan text. This
paper presents a model that produces complex sentences that reflect temporal relations
present in underlying temporal concepts. The main result of this work is the successful
application of constrained linguistic theories of tense and aspect to a generator which
produces meaningful event combinations and selects appropriate connecting words that
relate them. |
|
|
T. Gaasterland and E. Selkov, "Reconstruction of
Metabolic Networks Using Incomplete Information," 3rd Int'l Conf. on Intelligent
Systems for Molecular Biology, Cambridge, England, July 17-19, 1995. Also Preprint ANL/MCS-P575-0296. |
This paper describes an approach for automatically
reconstructing a weighted partial metabolic network for a prokaryotic organism. The data
sources are automated sequence analysis, multiple databases accessed through an
object+attribute view of the data, and metabolic pathways, reaction equations, and
compounds parsed into a logical representation from the Enzyme and Metabolic Pathway
Database. Additional information can be provided interactively by the expert user to
guide the reconstruction. |
|
|
T. Gaasterland and C. Sensen, "Using Multiple Tools for
Automated Genome Interpretation in an Integrated Environment," Preprint ANL/MCS-P576-0296. |
The year 1995 heralded the sequencing of the first whole
microbial genomes. The end of 1996 will see at least four more. The entire yeast genome
will also be submitted to the public databases. At this rate, we can expect to have access
to the complete genomes of at least 25-30 organisms within five years. While most problems
of automated sequence generation and assembly have been solved, the automated analysis of
genomic sequences has scarcely been addressed. Consequently, most genome sequences
submitted to the publicly accessible databases are either poorly or not at all annotated.
Sequence interpretation "by hand" presents an overwhelming challenge to
researchers. To address this situation we are developing a prototype for a UNIX-based
integrated environment for automated genome sequence interpretation. |
|
|
T. A. DeFanti, I. Foster, M. E. Papka, R. Stevens, T.
Kuhfuss, "Overview of the I-WAY: Wide Area Visual Supercomputing," International
Journal of Supercomputing Applications 10, no. 2-3 (1996), pp. 123-131. Also Preprint ANL/MCS-P579-0396. |
This paper discusses the I-WAY project and provides an
overview of the papers in this issue of IJSA. The I-WAY is an experimental environment for
building distributed virtual reality applications and for exploring issues of distributed
wide area resource management and scheduling. The goal of the I-WAY project is to enable
researchers to use multiple inter-networked supercomputers and advanced visualization
systems to conduct very large-scale computations. By connecting a dozen ATM testbeds,
seventeen supercomputer centers, five virtual reality research sites, and over sixty
applications groups, the I-WAY project has created an extremely diverse wide area
environment for exploring advanced applications. This environment has provided a glimpse
of the future for advanced scientific and engineering computing. |
|
|
I. Foster, C. Kesselman, and S. Tuecke, "The Nexus
Task-parallel Runtime System," Proc. 1st Intl. Workshop on Parallel Processing,
Tata McGraw Hill, 1994. Also Preprint ANL/MCS-P581-0396. |
A runtime system provides a parallel language compiler with
an interface to the low-level facilities required to support interaction between
concurrently executing program components. Nexus is a portable runtime system for
task-parallel programming languages. Distinguishing features of Nexus include its support
for multiple threads of control, dynamic processor acquisition, dynamic address space
creation, a global memory model via interprocessor references, and asynchronous events. In
addition, it supports heterogeneity at multiple levels, allowing a single computation to
utilize different programming languages, executables, processors, and network protocols.
Nexus is currently being used as a compiler target for two task-parallel languages:
Fortran M and Compositional C++. In this paper, we present the Nexus design, outline
techniques used to implement Nexus on parallel computers, show how it is used in
compilers, and compare its performance with that of another runtime system. |
|
|
S. K. Park, K. K. Droegemeier, and C. H. Bischof,
"Automatic Differentiation as a Tool for Sensitivity Analysis of a Convective Storm
in a 3-D Cloud Model," Proc. 2nd International Symposium on Computational
Differentiation, Feb. 12-15, 1996, Santa Fe. Also Preprint ANL/MCS-P582-0496. |
The ADIFOR automatic differentiation tool is applied to a 3-D
storm-scale meteorological model to generate a sensitivity-enhanced code capable of
providing derivatives of all model output variables and related diagnostic (derived)
parameters as a function of specified control parameters. The tangent linear
approximation, applied here for the first time to a deep convective storm using a
full-physics compressible model, is valid up to 50 min for a 1% water vapor perturbation.
This result is very encouraging considering the highly nonlinear and discontinuous
properties of the solutions. The ADIFOR-generated code has provided valuable sensitivity
information on storm dynamics. In particular, it is very efficient and useful for
investigating how a perturbation inserted at an earlier time propagates through the model
variables at later times. However, it is computationally too expensive to apply to
variational data assimilation, especially for 3-D meteorological models, which have a
large number of input variables. |
|
|
I. Foster, J. Geisler, B. Nickless, W. Smith, S. Tuecke,
"Software Infrastructure for the I-WAY High-Performance Distributed Computing
Experiment," Proc. 5th IEEE Symp. on High Performance Distributed Computing,
pp. 562-571, IEEE Computer Society Press, 1996. Also Preprint ANL/MCS-P583-0396. |
High-speed wide area networks are expected to enable
innovative applications that integrate geographically distributed, high-performance
computing, database, graphics, and networking resources. However, there is as yet little
understanding of the higher-level services required to support these applications, or of
the techniques required to implement these services in a scalable, secure manner. We
report on a large-scale prototyping effort that has yielded some insights into these
issues. Building on the hardware base provided by the I-WAY, a national-scale Asynchronous
Transfer Mode (ATM) network, we developed an integrated management and application
programming system, called I-Soft. This system was deployed at most of the 17 I-WAY sites
and used by many of the 60 applications demonstrated on the I-WAY network. In this
article, we describe the I-Soft design and report on lessons learned from application
experiments. |
|
|
I. Foster, C. Kesselman, and S. Tuecke, "The Nexus
Approach to Integrating Multithreading and Communication," J. of Parallel and
Distributed Computing 37 (1996), pp. 70-82. Also Preprint ANL/MCS-P584-0396. |
Lightweight threads have an important role to play in
parallel systems: they can be used to exploit shared-memory parallelism, to mask
communication and I/O latencies, to implement remote memory access, and to support
task-parallel and irregular applications. In this paper, we address the question of how to
integrate threads and communication in high-performance distributed-memory systems. We
propose an approach based on global pointer and remote service request mechanisms, and
explain how these mechanisms support dynamic communication structures, asynchronous
messaging, dynamic thread creation and destruction, and a global memory model via
interprocessor references. We also explain how these mechanisms can be implemented in
various environments. Our global pointer and remote service request mechanisms have been
incorporated in a runtime system called Nexus that is used as a compiler target for
parallel languages and as a substrate for higher-level communication libraries. We report
the results of performance studies conducted using a Nexus implementation; these results
indicate that Nexus mechanisms can be implemented efficiently on commodity hardware and
software systems. |
|
|
A. K. Sinha, C. H. Bischof, and D. Shiriaev,
"Application of Automatic Differentiation to Reservoir Design Models," Journal
of Water Resources Planning and Management, ASCE, 124(3)
(1998), pp. 162-167. Also Preprint ANL/MCS-P585-0496,
August 1996, revised August 1997. |
Automatic differentiation is a technique for computing
derivatives accurately and efficiently with minimal human effort. Calculation of
derivatives of numerical models is necessary for the optimization of reservoir systems to
determine optimal sizes for reservoirs. For the computational experiments reported in the
present investigation, the ADIFOR (Automatic Differentiation in FORtran) precompiler is
employed for computing derivatives. Given the Fortran 77 source code for the function,
ADIFOR generates code that computes derivatives of the model output with respect to the
input parameters. Strategies to postoptimize the ADIFOR-generated derivative code and
their impact on the computational requirements for evaluating derivatives and model
solution are also discussed. We report on the use of automatic differentiation and divided
difference approaches for computing derivatives for a single- and multiple-reservoir yield
model. The results show that, for both the single- and multiple-reservoir models, automatic
differentiation computes derivatives exactly and more efficiently than the divided
difference implementation. Postoptimization of the generated derivative code results in
computation of derivatives in orders of magnitude less CPU time than with the
ADIFOR-generated derivative code. It is also observed that the availability of exact
derivatives significantly benefits the convergence of the optimization algorithm. Thus, the
solution of the multireservoir problem, which took 10.5 hours with divided difference
derivatives, is decreased to less than two hours with ``ADIFOR out of the box''
derivatives, and to less than an hour using the postoptimized ADIFOR derivative code.
|
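The accuracy contrast reported above can be seen in a toy setting. The sketch below is a minimal forward-mode AD implementation in Python (dual numbers rather than ADIFOR's source transformation), applied to an invented scalar response function standing in for a yield model: the AD derivative is exact to machine precision, while the one-sided divided difference carries truncation error.

```python
import math

class Dual:
    """Minimal forward-mode AD value: carries f(x) and f'(x) together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def dsqrt(x):
    return Dual(math.sqrt(x.val), x.der / (2.0 * math.sqrt(x.val)))

def response(x):
    # Hypothetical model response, nonlinear in the design variable x
    # (a stand-in for a reservoir yield model, not the paper's model).
    return dsqrt(x * x * x + 2.0 * x)

x0 = 3.0
ad = response(Dual(x0, 1.0)).der                  # derivative via forward-mode AD
h = 1e-6
dd = (response(Dual(x0 + h)).val - response(Dual(x0)).val) / h  # divided difference
print(ad, dd)   # AD is exact to machine precision; dd carries truncation error
```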
|
|
C. H. Bischof, B. Lang, and X. Sun, "A Framework for
Symmetric Band Reduction," ACM Trans. on Mathematical Software 26,
No. 4 (Dec. 2000), pp. 581-601. Also Preprint ANL/MCS-P586-0496,
October 1996. |
We develop an algorithmic framework for reducing the
bandwidth of symmetric matrices. This framework includes the reduction of full matrices to
banded or tridiagonal form and the reduction of banded matrices to narrower banded or
tridiagonal form, possibly in multiple steps. Our framework leads to algorithms that
require fewer floating point operations than do standard algorithms. In addition, it
allows for space-time tradeoffs and enables or increases the use of blocked
transformations. |
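The basic reduction the framework generalizes can be sketched in a few lines. Below is the unblocked textbook Householder tridiagonalization (the paper's contribution is the blocked, multi-step, banded generalization of this, which the sketch does not attempt): each orthogonal similarity transform zeroes a column below the subdiagonal while preserving the spectrum.

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form by Householder similarity
    transforms (unblocked textbook algorithm; the SBR framework covers blocked
    and multi-step banded variants of this reduction)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k+1:, k].copy()
        alpha = -np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        v = x
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv == 0.0:
            continue
        v /= nv
        H = np.eye(n)
        H[k+1:, k+1:] -= 2.0 * np.outer(v, v)
        A = H @ A @ H        # orthogonal similarity: eigenvalues are unchanged
    return A

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T                  # symmetric test matrix
T = tridiagonalize(A)
print(np.allclose(np.triu(T, 2), 0.0, atol=1e-10))               # tridiagonal shape
print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(T))) # same spectrum
```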
|
|
C. H. Bischof, B. Lang, and X. Sun, "The SBR Toolbox --
Software for Successive Band Reduction," ACM TOMS 26(4)
(2000), 602-616. Also Preprint ANL/MCS-P587-0496,
October 1996. |
We present a software toolbox for symmetric band reduction,
together with a set of testing and timing drivers. The toolbox contains routines for the
reduction of full symmetric matrices to banded form and the reduction of banded matrices
to narrower banded or tridiagonal form, with optional accumulation of the orthogonal
transformations, as well as repacking routines for storage rearrangements. The
functionality and the calling sequences of the routines are described, with a detailed
discussion of the ``control'' parameters that allow adaptation of the codes to particular
machine and matrix characteristics. We also briefly describe the testing and timing
drivers included in the toolbox. |
|
|
H. G. Kaper and P. Takac, "An Equivalence Relation for
the Ginzburg-Landau Equations of Superconductivity," Preprint ANL/MCS-P588-0496,
June 1996; revised 10/96. |
Gauge invariance is used to establish an equivalence relation
between solutions of the stationary and time-dependent Ginzburg-Landau equations
describing the same physical state of a superconductor. The equivalence relation shows how
equilibrium configurations are obtained as large-time asymptotic limits of solutions of
the time-dependent Ginzburg-Landau equations. |
|
|
C. Bischof and A. Carle, "Users' Experience with ADIFOR
2.0," Preprint ANL/MCS-P589-0496,
June 1996. |
In July 1995, the ADIFOR 2.0 system for automatic
differentiation of Fortran was made available to the academic and commercial communities
via the World Wide Web. By January 1996, we had received and processed over one hundred
requests for ADIFOR 2.0. In this paper, we describe some of the experiences of users of
the system that should be interesting to developers of automatic differentiation tools. |
|
|
G. W. Crabtree, G. K. Leaf, H. G. Kaper, D. W. Braun, V. M.
Vinokur, and A. E. Koshelev, "Dynamic Vortex Phases in Superconductors with
Correlated Disorder," Preprint ANL/MCS-P590-0496,
April 1996. |
The nature of driven motion of a vortex solid in the presence
of a planar pinning defect is investigated by large-scale simulations based on the
time-dependent Ginzburg-Landau equations. Three dynamic phases are identified and
characterized by their relative positional and velocity correlations. |
|
|
R. B. Lehoucq, "Restarting an Arnoldi Reduction,"
Preprint ANL/MCS-P591-0496,
May 1996. |
The Arnoldi reduction is an efficient procedure for
approximating a subset of the eigensystem of a large sparse matrix A of order n.
At each step, a partial orthogonal reduction of A into an upper Hessenberg matrix
is produced. The eigenvalues of this Hessenberg matrix are used to approximate a subset of
the eigenvalues of the large matrix A. The approximation to the eigenvalues of A
generally improves as the order of the Hessenberg matrix increases. Unfortunately, so do
the cost and storage of the reduction. A popular alternative is to define an iteration by
restarting the reduction with information from a length m Arnoldi reduction. The
hope is that this restarted reduction has improved estimates of the eigenvalues of A. This
paper considers the various approaches used to restart a reduction. Analysis and numerical
examples are presented that explain and exhibit the generally superior properties of
Sorensen's implicitly restarted Arnoldi iteration. The analysis exploits the fact that an
IRA iteration is mathematically equivalent to a curtailed QR iteration. |
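For context, the underlying Arnoldi reduction is short to write down. The sketch below (plain modified Gram-Schmidt Arnoldi, with no restarting; not Sorensen's IRA) builds the relation A V_m = V_{m+1} H and shows the extremal Ritz value converging first.

```python
import numpy as np

def arnoldi(A, v0, m):
    """Build a length-m Arnoldi reduction  A V_m = V_{m+1} H  with
    orthonormal columns V and upper Hessenberg H (modified Gram-Schmidt)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
A = np.diag(np.arange(1.0, 101.0)) + 1e-3 * rng.standard_normal((100, 100))
V, H = arnoldi(A, np.ones(100), 30)
# Ritz values (eigenvalues of the leading m-by-m Hessenberg block) approximate
# a subset of the eigenvalues of A; the extremal ones converge first.
ritz = np.sort(np.linalg.eigvals(H[:30, :30]).real)
print(ritz[-1])   # approximates the largest eigenvalue (100 for the unperturbed diagonal)
```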
|
|
Rajeev Thakur, William Gropp, and Ewing Lusk, ``An
Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces,'' Proc. 6th
Symp. on the Frontiers of Massively Parallel Computation, October 1996, pp. 180-187, IEEE.
Also Preprint ANL/MCS-P592-0596,
August 1996. |
In this paper, we propose a strategy for implementing
parallel-I/O interfaces portably and efficiently. We have defined an abstract-device
interface for parallel I/O, called ADIO. Any parallel-I/O API can be implemented on
multiple file systems by implementing the API portably on top of ADIO, and implementing
only ADIO on different file systems. This approach simplifies the task of implementing an
API and yet exploits the specific high-performance features of individual file systems. We
have used ADIO to implement the Intel PFS interface and subsets of MPI-IO and IBM PIOFS
interfaces on PFS, PIOFS, Unix, and NFS file systems. Our performance studies indicate
that the overhead of using ADIO as an implementation strategy is very low. |
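The layering idea can be illustrated with a toy abstract-device class (names invented here, not the real ADIO API): the portable API is written once against the abstract operations, and each file system supplies only the small device implementation.

```python
import os
import tempfile
from abc import ABC, abstractmethod

class AbstractDevice(ABC):
    """Hypothetical sketch in the spirit of ADIO: a small set of file
    operations that each file system implements once, so any parallel-I/O
    API written against this layer runs on every back end."""
    @abstractmethod
    def open(self, path): ...
    @abstractmethod
    def write_at(self, offset, data): ...
    @abstractmethod
    def read_at(self, offset, nbytes): ...

class UnixDevice(AbstractDevice):
    """One back end: plain POSIX files via pread/pwrite."""
    def open(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    def write_at(self, offset, data):
        return os.pwrite(self.fd, data, offset)
    def read_at(self, offset, nbytes):
        return os.pread(self.fd, nbytes, offset)

def checkpoint(dev, path, records):
    """A toy 'portable API' written once against the abstract device."""
    dev.open(path)
    for i, rec in enumerate(records):
        dev.write_at(i * 8, rec)

path = os.path.join(tempfile.gettempdir(), "adio_demo.bin")
dev = UnixDevice()
checkpoint(dev, path, [b"record00", b"record01"])
print(dev.read_at(8, 8))   # → b'record01'
```

Porting `checkpoint` to another file system means writing another small device class; the API code itself never changes, which is the low-overhead layering the performance studies measure.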
|
|
I. Foster, D. R. Kohr, Jr., R. Krishnaiyer, and A. Choudhary,
"Double Standards: Bringing Task Parallelism to HPF via the Message Passing
Interface," Preprint ANL/MCS-P593-0596,
October 1996. |
High Performance Fortran (HPF) does not allow efficient
expression of mixed task/data-parallel computations or the coupling of separately compiled
data-parallel modules. In this paper, we show how a coordination library implementing the
Message Passing Interface (MPI) can be used to represent these common parallel program
structures. This library allows data-parallel tasks to exchange distributed data
structures using calls to simple communication functions. We present microbenchmark
results that characterize the performance of this library and that quantify the impact of
optimizations that allow reuse of communication schedules in common situations. In
addition, results from two-dimensional FFT, convolution, and multiblock programs
demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF.
We conclude that this synergistic combination of two parallel programming standards
represents a useful approach to task parallelism in a data-parallel framework, increasing
the range of problems addressable in HPF without requiring complex compiler technology. |
|
|
D. Abramson, I. Foster, J. Giddy, A. Lewis, R. Sosic, R.
Sutherst, and N. White, "The Nimrod Computational Workbench: A Case Study in Desktop
Metacomputing," Preprint ANL/MCS-P594-0596,
October 1996. |
The coordinated use of geographically distributed computers,
or metacomputing, can in principle provide more accessible and cost-effective
supercomputing than conventional high-performance systems. However, we lack evidence that
metacomputing systems can be made easily usable, or that there exist large numbers of
applications able to exploit metacomputing resources. In this paper, we present work that
addresses both these concerns. The basis for this work is a system called Nimrod that
provides a desktop problem-solving environment for parametric experiments. We describe how
Nimrod has been extended to support the scheduling of computational resources located in a
wide-area environment, and report on an experiment in which Nimrod was used to schedule a
large parametric study across the Australian Internet. The experiment provided both new
scientific results and insights into Nimrod capabilities. We relate the results of this
experiment to lessons learned from the I-WAY distributed computing experiment, and draw
conclusions as to how Nimrod and I-WAY--like computing environments should be developed to
support desktop metacomputing. |
|
|
I. Foster, J. Geisler, and S. Tuecke, "MPI on the I-WAY:
A Wide-Area, Multimethod Implementation of the Message Passing Interface," Preprint ANL/MCS-P595-0596,
October 1996. |
High-speed wide-area networks enable innovative applications
that integrate geographically distributed computing, database, graphics, and networking
resources. The Message Passing Interface (MPI) can be used as a portable, high-performance
programming model for such systems. However, the wide-area environment introduces
challenging problems for the MPI implementor, because of the heterogeneity of both the
underlying physical infrastructure and the authentication and software environment at
different sites. In this article, we describe an MPI implementation that incorporates
solutions to these problems. This implementation, which was developed for the I-WAY
distributed-computing experiment, was constructed by layering MPICH on the Nexus
multithreaded runtime system. Nexus provides automatic configuration mechanisms that can
be used to select and configure authentication, process creation, and communication
mechanisms in heterogeneous systems. |
|
|
I. Foster, C. Kesselman, and M. Snir, "Generalized
Communicators in the Message Passing Interface," Preprint ANL/MCS-P596-0596,
October 1996. |
We propose extensions to the Message Passing Interface (MPI)
that generalize the MPI communicator concept to allow multiple communication endpoints per
process, dynamic creation of endpoints, and the transfer of endpoints between processes.
The generalized communicator construct can be used to express a wide range of interesting
communication structures, including collective communication operations involving multiple
threads per process, communications between dynamically created threads, and
object-oriented applications in which communications are directed to specific objects.
Furthermore, this enriched functionality can be provided in a manner that preserves
backward compatibility with MPI. We describe the proposed extensions, illustrate their use
with examples, and discuss implementation issues. |
|
|
I. Foster, D. Kohr, R. Krishnaiyer, and A. Choudhary,
"MPI as a Coordination Layer for Communicating HPF Tasks," Preprint ANL/MCS-P597-0596,
October 1996. |
Data-parallel languages such as High Performance Fortran
(HPF) present a simple execution model in which a single thread of control performs
high-level operations on distributed arrays. These languages can greatly ease the
development of parallel programs. Yet there are large classes of applications for which a
mixture of task and data parallelism is most appropriate. Such applications can be
structured as collections of data-parallel tasks that communicate by using explicit
message passing. Because the Message Passing Interface (MPI) defines standardized, familiar
mechanisms for this communication model, we propose that HPF tasks communicate by making
calls to a coordination library that provides an HPF binding for MPI. The semantics of a
communication interface for sequential languages can be ambiguous when the interface is
invoked from a parallel language; we show how these ambiguities can be resolved by
describing one possible HPF binding for MPI. We then present the design of a library that
implements this binding, discuss issues that influenced our design decisions, and evaluate
the performance of a prototype HPF/MPI library using a communications microbenchmark and
application kernel. Finally, we discuss how MPI features might be incorporated into our
design framework. |
|
|
W. D. Reynolds, Jr. and R. V. Kenyon, "The Wavelet
Transform and the Suppression Theory of Binocular Vision for Stereo Image
Compression," Preprint ANL/MCS-P598-0596,
July 1996. |
In this paper we present a method for compression of stereo
images. The proposed scheme is a frequency domain approach based on the suppression theory
of binocular vision. By using the information in the frequency domain, complex disparity
estimation techniques can be avoided. The wavelet transform is used to obtain a
multiresolution analysis of the stereo pair by which the subbands convey the necessary
frequency domain information. |
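A one-level 2-D Haar transform is the simplest instance of the multiresolution analysis described above (a generic stand-in, not the authors' filter bank): it splits each image of the stereo pair into a low-pass approximation and three detail subbands, and reconstructs losslessly.

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar transform: low-pass approximation (LL)
    plus three detail subbands (LH, HL, HH)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    return ((a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0,
            (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0)

def ihaar2d(LL, LH, HL, HH):
    """Inverse of haar2d: perfect reconstruction from the four subbands."""
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img

rng = np.random.default_rng(2)
view = rng.random((8, 8))                    # one image of a stereo pair
LL, LH, HL, HH = haar2d(view)
print(np.allclose(ihaar2d(LL, LH, HL, HH), view))   # lossless round trip
```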
|
|
M. Huang, D. Levine, L. Turner, L. Kettunen, and M. Papka,
"Virtual Reality Visualization of 3-D Electromagnetic Fields," Preprint ANL/MCS-P599-0596,
August 1996. |
One of the major problems in three-dimensional (3-D)
electromagnetic field computation is visualizing the calculated field. Virtual reality
techniques can be used as an aid to this process by providing multiple viewpoints,
allowing immersion within the field, and taking advantage of the human ability to process
3-D spatial information. In this paper we present an example of 3-D electromagnetic field
visualization in the CAVE virtual-reality environment. |
|
|
S. Wright, "Modified Cholesky Factorizations in
Interior-Point Algorithms for Linear Programming," Preprint ANL/MCS-P600-0596,
August 1996. |
We investigate a modified Cholesky algorithm similar to those
used in current interior-point codes for linear programming. Cholesky-based interior-point
codes are popular for three reasons: their implementation requires only minimal changes to
standard sparse Cholesky codes (allowing us to take full advantage of software written by
specialists in that area); they tend to be more efficient than competing approaches that
use different factorizations; and they perform robustly on most practical problems,
yielding good interior-point steps even when the coefficient matrix is ill conditioned. We
explain the surprisingly good performance of the Cholesky-based approach by using
analytical tools from matrix perturbation theory and error analysis, illustrating our
results with computational experiments. Finally, we point out the limitations of this
approach. |
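A minimal dense sketch of one device used in Cholesky-based LP codes (production codes apply it inside sparse Cholesky, and the paper analyzes why the resulting steps remain useful): when a pivot becomes tiny or negative, substitute a very large value so the factorization runs to completion and the offending column is effectively zeroed.

```python
import math

def modified_cholesky(A, delta=1e-10):
    """Toy dense modified Cholesky: replace a tiny or negative pivot by a
    huge value instead of aborting, so the factorization always completes."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= delta:
            pivot = 1e32          # "skip" the pivot rather than abort
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L

L = modified_cholesky([[4.0, 2.0], [2.0, 5.0]])
# For a well-conditioned SPD matrix the modification never triggers:
print(L)                      # ordinary factor [[2, 0], [1, 2]]
# A singular matrix would break ordinary Cholesky; here it completes:
print(modified_cholesky([[1.0, 1.0], [1.0, 1.0]])[1][1] > 1e15)
```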
|
|
I. Foster, J. Geisler, C. Kesselman, and S. Tuecke,
"Multimethod Communications for High-Performance Metacomputing Applications,"
Preprint ANL/MCS-P601-0596,
October 1996. |
Metacomputing systems use high-speed networks to connect
supercomputers, mass storage systems, scientific instruments, and display devices with the
objective of enabling parallel applications to access geographically distributed computing
resources. However, experience shows that high performance often can be achieved only if
applications can integrate diverse communication substrates, transport mechanisms, and
protocols, chosen according to where communication is directed, what is communicated, or
when communication is performed. In this article, we describe a software architecture that
addresses this requirement. This architecture allows multiple communication methods to be
supported transparently in a single application, with either automatic or user-specified
selection criteria guiding the methods used for each communication. We describe an
implementation of this architecture, based on the Nexus communication library, and use
this implementation to evaluate performance issues. The implementation supported a wide
variety of applications in the I-WAY metacomputing experiment at Supercomputing 95; we use
one of these applications to provide a quantitative demonstration of the advantages of
multimethod communication in a heterogeneous networked environment. |
|
|
J. Czyzyk and T. J. Wisniewski, "The Diet Problem: A
WWW-based Interactive Case Study in Linear Programming," Preprint ANL/MCS-P602-0796,
July 1996. |
The Diet Problem is an interactive WWW-based case
study that introduces users (particularly students and practitioners) to the mathematics
of optimization. Users select foods for their menus, edit a set of nutritional
constraints, and solve the linear program simply by clicking some buttons and making
simple entries. A detailed analysis of the diet, complete with graphs and tables, is
returned to the user. The Diet Problem utilizes a simple, yet interesting, linear
program to introduce the concepts of model planning, data gathering, optimal solutions,
and dual variables. The power of the World Wide Web makes the case study accessible to a
wide variety of users in many different countries. |
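The mathematics behind the case study fits in a few lines. Below is a toy two-food diet LP with invented numbers (not the NEOS model or its data); since an optimal LP solution lies at a vertex of the feasible region, a two-variable instance can be solved by enumerating intersections of constraint boundaries.

```python
from itertools import combinations

# Toy diet problem (all numbers invented for illustration):
#   minimize  0.5*x + 0.8*y            cost of x, y servings of two foods
#   s.t.      2*x + 1*y >= 8           protein requirement
#             1*x + 3*y >= 9           vitamin requirement
#             x >= 0, y >= 0

cons = [(2.0, 1.0, 8.0), (1.0, 3.0, 9.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
cost = (0.5, 0.8)

def feasible(x, y):
    return all(a * x + b * y >= c - 1e-9 for a, b, c in cons)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                         # parallel boundaries: no vertex
    x = (c1 * b2 - c2 * b1) / det        # Cramer's rule for the intersection
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        val = cost[0] * x + cost[1] * y
        if best is None or val < best[0]:
            best = (val, x, y)

print(best)   # cheapest menu: 3 servings of food 1 and 2 of food 2, cost 3.1
```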
|
|
D. Levine, M. Facello, P. Hallstrom, G. Reeder, B. Walenz,
and F. Stevens, "STALK: An Interactive Virtual Molecular Docking System,"
Preprint ANL/MCS-P603-0796,
August 1996. |
Several recent technologies--genetic algorithms, parallel and
distributed computing, virtual reality, and high-speed networking--provide the foundation
for a new approach to the computational study of molecular interactions. Parallel genetic
algorithms are an efficient and effective means to explore the large search spaces typical
of these problems, while virtual reality provides an immersive environment for visualizing
the interactions. In this paper we discuss the STALK system, which uses high-speed
networking to couple a parallel genetic algorithm to a virtual reality environment. This
combination allows a local or remote user to interact in real-time with the simulation
through the virtual reality environment. Molecular docking experiments using an IBM SP
parallel computer and a CAVE virtual reality environment are discussed. |
|
|
N. T. Karonis, "An Alternative Method to Remove
Duplicate Tuples Resulting from Operations in the Relational Algebra in a Cube-Connected
Multicomputer System," Preprint ANL/MCS-P604-0796,
August 1996. |
The problem of performing database operations on parallel
architectures has received much attention, both as applied and theoretical areas of
research. Much of the attention has been focused on performing these operations on
distributed-memory architectures, for example, a hypercube. Algorithms that perform, in
particular, relational database operations on a hypercube typically exploit the
hypercube's unique interconnectivity to not only process the relational operators
efficiently but also perform dynamic load balancing. Certain relational operators (e.g.,
projection and union) can produce interim relations that contain duplicate tuples. As a
result, an algorithm for a relational database system must address the issue of removing
duplicate tuples from these interim relations. The algorithms accomplish this by
compacting the relation into hypercubes of smaller and smaller dimensions. We present an
alternative method for removing duplicate tuples from a relation that is distributed over
a hypercube by using the embedded ring found in every hypercube. Through theoretical
analysis of the algorithm and empirical observation, we demonstrate that using the ring to
remove the duplicate tuples is significantly more efficient than using the hypercube. |
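The ring idea can be simulated in a few lines (a toy rendition, not the paper's algorithm verbatim): each node first removes its local duplicates, the original fragments then circulate around the ring for p-1 steps, and a node drops a tuple when it sees the same tuple held by a lower-ranked node, so exactly one copy of each survives.

```python
def ring_dedup(fragments):
    """Simulated ring of p nodes, each holding a fragment of a relation that
    may contain duplicates locally and across nodes. After p-1 circulation
    steps, each tuple survives only at the lowest-ranked node holding it."""
    p = len(fragments)
    local = [set(f) for f in fragments]          # local duplicates removed
    circulating = [set(f) for f in fragments]    # read-only copies on the ring
    for step in range(1, p):
        for rank in range(p):
            sender = (rank - step) % p           # whose fragment arrives now
            if sender < rank:                    # keep the lowest-ranked copy
                local[rank] -= circulating[sender]
    return local

frags = [["a", "b", "b"], ["b", "c"], ["a", "c", "d"]]
result = ring_dedup(frags)
print(result)   # disjoint sets whose union is {a, b, c, d}
```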
|
|
R. B. Lehoucq, "The Computation of Elementary Unitary
Matrices," Preprint ANL/MCS-P605-0796,
August 1996. |
The construction of elementary unitary matrices that
transform a complex vector to a multiple of e_1, the first column of the identity matrix,
is studied. We present four variants and their software implementation, including a
discussion on the LAPACK subroutine CLARFG. Comparisons are also given. |
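One common construction of such a reflector is sketched below (the conventions differ slightly from those of LAPACK's CLARFG, which is one of the variants the paper compares): H = I - 2vv^H is unitary and maps the complex vector x to a multiple of e_1.

```python
import numpy as np

def elementary_unitary(x):
    """Build a unitary reflector H = I - 2*v*v^H with H x = beta * e_1
    for a nonzero complex vector x (one textbook variant)."""
    x = np.asarray(x, dtype=complex)
    alpha, sigma = x[0], np.linalg.norm(x[1:])
    beta = -np.exp(1j * np.angle(alpha)) * np.hypot(abs(alpha), sigma)
    v = x.copy()
    v[0] = alpha - beta          # this sign choice avoids cancellation
    v /= np.linalg.norm(v)
    H = np.eye(len(x), dtype=complex) - 2.0 * np.outer(v, v.conj())
    return H, beta

rng = np.random.default_rng(3)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
H, beta = elementary_unitary(x)
print(np.allclose(H @ x, beta * np.eye(4)[0]))    # H maps x to a multiple of e_1
print(np.allclose(H.conj().T @ H, np.eye(4)))     # H is unitary
```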
|
|
C. H. Bischof, "Automatic Differentiation and Numerical
Software Design," Preprint ANL/MCS-P606-0896,
January 1997. |
Automatic differentiation (AD) tools can generate accurate
and efficient derivative code for computer programs of arbitrary length. In some cases,
however, the developer of the code to be differentiated may be required to provide
additional information to an AD tool to ensure the desired solution. We illustrate these
issues with nondifferentiable language intrinsics such as max() in the context of
computing the Euclidean norm and numerical integrators. In both cases, very little
additional information is required to ensure that AD computes the ``do-what-I-mean''
derivatives. In addition, the provision of such information makes it easy to derive
``derivative-enhanced'' versions of these codes. |
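A tiny sketch (dual numbers in Python, not ADIFOR) of the max() issue in the Euclidean-norm example: the overflow-safe norm scales by the largest magnitude, and the default AD rule differentiates only the branch max() selects. Even at a tie, where max() is nondifferentiable, the branch derivative here agrees with the true gradient, because the value of the norm does not depend on which branch supplies the scale.

```python
import math

class D:
    """Tiny forward-mode AD value (value, derivative) for this sketch."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        return D(self.v + o.v, self.d + o.d)
    def __mul__(self, o):
        return D(self.v * o.v, self.d * o.v + self.v * o.d)
    def __truediv__(self, o):
        return D(self.v / o.v, (self.d * o.v - self.v * o.d) / o.v ** 2)

def dabs(x):
    return D(abs(x.v), x.d if x.v >= 0 else -x.d)

def dsqrt(x):
    return D(math.sqrt(x.v), x.d / (2.0 * math.sqrt(x.v)))

def dmax(a, b):
    return a if a.v >= b.v else b   # branch selection: the default AD rule

def scaled_norm(x, y):
    s = dmax(dabs(x), dabs(y))      # overflow-safe scaling by the max
    return s * dsqrt((x / s) * (x / s) + (y / s) * (y / s))

# Gradient at a tie |x| == |y|, exactly where max() is nondifferentiable:
gx = scaled_norm(D(3.0, 1.0), D(-3.0, 0.0)).d
gy = scaled_norm(D(3.0, 0.0), D(-3.0, 1.0)).d
# Whichever branch dmax picks, the result matches the true gradient
# x/||x|| = (3, -3)/sqrt(18).
print(gx, gy)   # → 0.7071..., -0.7071...
```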
|
|
L. A. Freitag and C. Ollivier-Gooch, "A Comparison of
Tetrahedral Mesh Improvement Techniques," Preprint ANL/MCS-P607-0896,
October 1996. |
Automatic mesh generation and adaptive refinement methods
for complex three-dimensional domains have proven to be very successful tools for the
efficient solution of complex application problems. These methods can, however, produce
poorly shaped elements that cause the numerical solution to be less accurate and more
difficult to compute. Fortunately, the shape of the elements can be improved through
several mechanisms, including face-swapping techniques that change local connectivity and
optimization-based mesh smoothing methods that adjust grid point location. We consider
several criteria for each of these two methods and compare the quality of several meshes
obtained by using different combinations of swapping and smoothing. Computational
experiments show that swapping is critical to the improvement of general mesh quality and
that optimization-based smoothing is highly effective in eliminating very small and very
large angles. The highest quality meshes are obtained by using a combination of swapping
and smoothing techniques. |
|
|
C. Bischof and A. Griewank, "Tools for the Automatic
Differentiation of Computer Programs," Preprint ANL/MCS-P608-0896,
January 1997. |
Automatic differentiation (AD) is a methodology for
developing sensitivity-enhanced versions of arbitrary computer programs. In this paper, we
provide some background information on AD and basic implementation issues for the design
of general purpose tools that can deal with codes from the Fortran and C family, address
some frequently asked questions, and provide pointers for further study. |
|
|
T. Canfield, D. Diachin, L. Freitag, D. Heath, J. Herzog, W.
Michels, "Interactive Computational Models of Particle Dynamics Using Virtual
Reality," Symp. on Virtual Reality in Manufacturing Research and Education, Chicago,
1996. Also Preprint ANL/MCS-P609-0996.
|
We discuss an interactive environment for the visualization,
analysis, and modification of computational models used in industrial settings. In
particular, we focus on interactively placing massless, massed, and evaporating
particulate matter in computational fluid dynamics applications. We discuss the numerical
model used to compute the particle pathlines in the fluid flow for display and analysis.
We briefly describe the toolkits developed for vector and scalar field visualization,
interactive particulate source placement, and a three-dimensional GUI. This
system is currently used in two industrial applications, and we present our tools in the
context of these applications. We summarize the current state of the project and offer
directions for future research. |
|
|
S.-J. Wu, Y.-J. J. Wu, and J. J. Shaw, "DNA Solutions to
Searching for the Hamiltonian Cycle and the Traveling-Salesman Cycle," Preprint ANL/MCS-P610-0996,
October 1996. |
Computational problems for which no polynomial-time
algorithm is known are classified as NP-complete problems. Taking advantage of
their great capacity to conduct parallel reactions, DNA molecules and their experimental
protocols have been proposed to solve such problems which otherwise are time consuming for
electronic computers. Based on a working archetype, models are presented here to compute
two NP-complete problems: searching for the Hamiltonian cycle and the traveling-salesman
cycle. |
|
|
S. Carr and R. B. Lehoucq, "Compiler Blockability of
Dense Matrix Factorizations," Preprint ANL/MCS-P611-0996,
October 1996. |
The goal of the LAPACK project is to provide efficient and
portable software for dense numerical linear algebra computations. By recasting many of
the fundamental dense matrix computations in terms of calls to an efficient implementation
of the BLAS (Basic Linear Algebra Subprograms), the LAPACK project has, in large part,
achieved its goal. Unfortunately, the efficient implementation of the BLAS often results
in machine-specific code that is not portable across multiple architectures without a
significant loss in performance or a significant effort to re-optimize them. This paper
examines whether most of the hand optimizations performed on matrix factorization codes
are unnecessary because they can (and should) be performed by the compiler. We believe
that it is better for the programmer to express algorithms in a machine-independent form
and allow the compiler to handle the machine-dependent details. This gives the algorithms
portability across architectures and removes the error-prone, expensive, and tedious
process of hand optimization. Although there currently exists no production compiler that
can perform all the loop transformations discussed in this paper, a description of current
research in compiler technology is provided that will prove beneficial to the numerical
linear algebra community. We show that the Cholesky and LU factorizations may be optimized
automatically by a compiler to be as efficient as the same hand-optimized version found in
LAPACK. We also show that the QR factorization may be optimized by the compiler to perform
comparably with the hand-optimized LAPACK version on modest matrix sizes. Our approach
allows us to conclude that with the advent of the compiler optimizations discussed in this
paper, matrix factorizations may be efficiently implemented in a machine-independent form.
|
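The target shape of the transformations is illustrated below with a blocked right-looking Cholesky in numpy (a sketch of the standard blocked algorithm, not the LAPACK code or the compiler transformations themselves): the bulk of the work lands in a matrix-matrix update, the BLAS-3 form that tiling and loop interchange can also produce automatically from the naive triple loop.

```python
import numpy as np

def cholesky_blocked(A, nb=2):
    """Blocked right-looking Cholesky: factor a small diagonal block, solve
    for the panel below it, then apply a BLAS-3 trailing-matrix update."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])   # small diagonal block
        if e < n:
            # triangular solve for the panel: L21 = A21 * L11^{-T}
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
            # BLAS-3 trailing-matrix update: A22 -= L21 * L21^T
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)          # symmetric positive definite
L = cholesky_blocked(A)
print(np.allclose(L @ L.T, A))                    # valid factorization
print(np.allclose(L, np.linalg.cholesky(A)))      # same factor as the unblocked one
```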
|
|
R. B. Lehoucq and K. Meerbergen, "The Inexact Rational
Krylov Sequence Method," SIAM J. Matrix Anal. Appl. 20(1)
(1998), pp. 131-148. Journal title: "Using Generalized Cayley Transformations
within an Inexact Rational Krylov Sequence Method." Also Preprint ANL/MCS-P612-1096,
October 1996. |
The rational Krylov sequence (RKS) method is a generalization
of Arnoldi's method. It constructs an orthogonal reduction of a matrix pencil into an
upper Hessenberg pencil. The RKS method is useful when the matrix pencil may be
efficiently factored. However, it requires the solution of a linear system at every step.
This article considers solving the resulting linear systems in an inexact manner by using
an iterative method. We show that a Cayley transformation used within the RKS method is
more efficient and robust than the usual shift-and-invert transformation. A relationship
with the recently introduced Jacobi-Davidson method of Sleijpen and van der Vorst is also
established. |
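The spectral mappings behind the two transformations can be checked directly on a small diagonal example (a numerical illustration of the mappings only, not of the inexact solves the paper studies): the Cayley transform sends an eigenvalue lam of the pencil (A, B) to (lam - mu)/(lam - sigma), while shift-and-invert sends it to 1/(lam - sigma); both make eigenvalues near the shift sigma extremal.

```python
import numpy as np

A = np.diag(np.array([1.0, 2.0, 3.0, 10.0]))
B = np.eye(4)
sigma, mu = 2.1, 5.0
# Cayley transform and shift-and-invert operators for the pencil (A, B):
TC = np.linalg.solve(A - sigma * B, A - mu * B)
TSI = np.linalg.solve(A - sigma * B, B)
lam = np.diag(A)             # eigenvalues of the diagonal pencil

print(np.sort(np.linalg.eigvals(TC).real))
print(np.sort((lam - mu) / (lam - sigma)))    # same values: the Cayley mapping
print(np.sort(np.linalg.eigvals(TSI).real))
print(np.sort(1.0 / (lam - sigma)))           # same values: shift-and-invert
```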
|
|
I. Foster, N. T. Karonis, C. Kesselman, G. Koenig, S. Tuecke,
"A Secure Communications Infrastructure for High-Performance Distributed
Computing," Proc. Sixth IEEE Int'l Symp. on High Performance Distributed
Computing, Aug. 5-8, 1997, Portland State Univ., Portland, Oregon. Also Preprint ANL/MCS-P613-0996. |
Applications that use high-speed networks to connect
geographically distributed supercomputers, databases, and scientific instruments may
operate over open networks and access valuable resources. Hence, they can require
mechanisms for ensuring integrity and confidentiality of communications and for
authenticating both users and resources. Security solutions developed for traditional
client-server applications do not provide direct support for the program structures,
programming tools, and performance requirements encountered in these applications. We
address these requirements via a security-enhanced version of the Nexus communications
library, which we use to provide secure versions of parallel libraries and languages,
including the Message Passing Interface. These tools permit a fine degree of control over
what, where, and when security mechanisms are applied. In particular, a single application
can mix secure and nonsecure communication, allowing the programmer to make fine-grained
security/performance tradeoffs. We present performance results that quantify the
performance of our infrastructure. |
|
|
I. Foster and C. Kesselman, "Globus: A Metacomputing
Infrastructure Toolkit," Int. J. Supercomput. Appl. 11(2)
(1997), pp. 115-128. Also Preprint ANL/MCS-P614-0996. |
Emerging high-performance applications require the ability to
exploit diverse, geographically distributed resources. These applications use high-speed
networks to integrate supercomputers, large databases, archival storage devices, advanced
visualization devices, and/or scientific instruments to form networked virtual
supercomputers or metacomputers. While the physical infrastructure to build
such systems is becoming widespread, the heterogeneous and dynamic nature of the
metacomputing environment poses new challenges for developers of system software, parallel
tools, and applications. In this article, we introduce Globus, a system that we are
developing to address these challenges. The Globus system is intended to achieve a
vertically integrated treatment of application, middleware, and network. A low-level toolkit
provides basic mechanisms such as communication, authentication, network information, and
data access. These mechanisms are used to construct various higher-level metacomputing services,
such as parallel programming tools and schedulers. Our long-term goal is to build an
Adaptive Wide Area Resource Environment (AWARE), an integrated set of higher-level
services that enable applications to adapt to heterogeneous and dynamically changing
metacomputing environments. Preliminary versions of Globus components were deployed
successfully as part of the I-WAY networking experiment. |
|
|
J. Czyzyk, M. P. Mesnier, and J. J. More', "The
Network-Enabled Optimization System (NEOS) Server," Preprint ANL/MCS-P615-1096,
March 1997. |
The Network-Enabled Optimization System (NEOS) is an
environment for solving optimization problems over the Internet. Users submit optimization
problems to the NEOS Server via e-mail, the World Wide Web, or the NEOS Submission Tool.
The NEOS Server locates the appropriate optimization solver, computes all additional
information (for example, derivatives and sparsity patterns) required by the solver, links
the optimization problem with the solver, and returns a solution. This article discusses
the design and implementation of the NEOS Server. |
|
|
W. McCune and A. D. Sands, "Computer and Human
Reasoning: Single Implicative Axioms for Groups and for Abelian Groups," Preprint ANL/MCS-P617-1096,
November 1996. |
The search for single axioms for groups has long interested
mathematicians. In 1938, Tarski presented the following single equational axiom (in terms
of subtraction) for Abelian groups: x-(y-(z-(x-y))) = z. In 1952, Higman and Neumann
presented the following single equational axiom (in terms of division) for ordinary
groups: (x/((((x/x)/y)/z)/(((x/x)/x)/z))) = y. |
|
|
W. McCune, "33 Basic Test Problems: A Practical
Evaluation of Some Paramodulation Strategies," Preprint ANL/MCS-P618-1096,
November 1996. |
Many researchers who study the theoretical aspects of
inference systems believe that if inference rule A is complete and more
restrictive than inference rule B, then the use of A will lead more
quickly to proofs than will the use of B. The literature contains statements of
the sort ``our rule is complete and it heavily prunes the search space; therefore it is
efficient''. [This is a very general statement. We are not referring in particular to
the paramodulation strategies on which we focus in this paper.] These positions are highly
questionable and indicate that the authors have little or no experience with the practical
use of automated inference systems. Restrictive rules (1) can block short, easy-to-find
proofs, (2) can block proofs involving simple clauses, the type of clause on which many
practical searches focus, (3) can require weakening of redundancy control such as
subsumption and demodulation, and (4) can require the use of complex checks in deciding
whether such rules should be applied. The only way to determine the practical value of
inference rules and search strategies is to experiment on problems in which long-term
target users are interested. In this paper we present a new theorem prover for equational
logic, a set of 33 equational theorems for evaluating paramodulation strategies, a large
set of experiments with several paramodulation strategies, and two equational proofs in
Robbins algebra. |
|
|
W. McCune and L. Wos, "Otter: The CADE-13 Competition
Incarnations," Preprint ANL/MCS-P619-1096,
November 1996. |
This article discusses the two incarnations of Otter entered
in the CADE-13 Automated Theorem Proving Competition. Also presented are some historical
background, a summary of applications that have led to new results in mathematics and
logic, and a general discussion of Otter. |
|
|
H. G. Kaper and P. Takac, "Ginzburg-Landau Dynamics with
a Time-dependent Magnetic Field," Preprint ANL/MCS-P620-1196,
November 1996. |
The time-dependent Ginzburg-Landau equations of
superconductivity define a dynamical process when the applied magnetic field varies with
time. Sufficient conditions (in terms of the time rate of change of the applied field) are
given that, if satisfied, guarantee that this dynamical process is asymptotically
autonomous. As time goes to infinity, the dynamical process asymptotically approaches a
dynamical system whose attractor coincides with the omega-limit set of the dynamical
process. |
|
|
P. Eberhard and C. Bischof, "Automatic Differentiation
of Numerical Integration Algorithms," Math. Comput. 68
(1999), pp. 717-731. Also Preprint ANL/MCS-P621-1196. |
Automatic differentiation (AD) is a technique for
automatically augmenting computer programs with statements for the computation of
derivatives. This article discusses the application of automatic differentiation to
numerical integration algorithms for ordinary differential equations (ODEs), in
particular, the ramifications of the fact that AD is applied not only to the solution of
such an algorithm, but to the solution procedure itself. This subtle issue can lead to
surprising results when AD tools are applied to variable-stepsize, variable-order ODE
integrators. The computation of the final time step plays a special role in determining
the computed derivatives. We investigate these issues using various integrators and
suggest constructive approaches for obtaining the desired derivatives. |
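A toy illustration of the phenomenon (a hand-rolled dual-number sketch, not one of the AD tools discussed in the paper): forward-mode AD pushed through a fixed-step Euler integrator for y' = p·y differentiates the discrete solution y0(1+ph)^n, not the exact flow y0·e^(pT). With a variable-stepsize integrator, the step sizes themselves would depend on p and enter the computed derivative.

```python
# Minimal forward-mode AD via dual numbers, pushed through a
# fixed-step Euler integrator for y' = p*y.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def euler(p, y0, T, n):
    # n fixed Euler steps for y' = p*y; works on floats or Duals alike
    h, y = T / n, y0
    for _ in range(n):
        y = y + h * (p * y)
    return y

# Seed dp/dp = 1 to get the derivative of the numerical solution w.r.t. p.
y = euler(Dual(0.5, 1.0), 1.0, 1.0, 1000)
# y.dot equals d/dp of the *discrete* map y0*(1+p*h)^n,
# not y0*T*exp(p*T), the derivative of the exact solution.
```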
|
|
D. Ralph and S. J. Wright, "Superlinear Convergence of
an Interior-Point Method Despite Dependent Constraints," Preprint ANL/MCS-P622-1196,
December 1996. |
We show that an interior-point method for monotone
variational inequalities exhibits superlinear convergence on condition that all the
standard assumptions hold except for the well known assumption that the Jacobian of the
active constraints has full rank at the solution. We show that superlinear convergence
occurs even when the constant rank condition on the Jacobian assumed in our earlier paper
does not hold, thereby extending the main result of that paper. |
|
|
L. Wos, "Automating the Search for Elegant Proofs,"
J. of Automated Reasoning 21 (1998), pp. 135-175. Also Preprint ANL/MCS-P623-1196,
June 1997. |
The research reported in this article was spawned by a
colleague's request to find an elegant proof (of a theorem from Boolean algebra) to
replace his proof consisting of 816 deduced steps. The request was met by finding a proof
consisting of 100 deduced steps. The methodology used to obtain the far shorter proof is
presented in detail through a sequence of experiments. Although clearly not an algorithm,
the methodology is sufficiently general to enable its use for seeking elegant proofs
regardless of the domain of study. In addition to (usually) being more elegant, shorter
proofs often provide the needed path to constructing a more efficient circuit, a more
effective algorithm, and the like. The methodology relies heavily on the assistance of
McCune's automated reasoning program OTTER. Of the aspects of proof elegance, the main
focus here is on proof length, with brief attention paid to the type of term present, the
number of variables required, and the complexity of deduced steps. The methodology is
iterative, relying heavily on the use of three strategies: the resonance strategy, the hot
list strategy, and McCune's ratio strategy. These strategies, as well as other features on
which the methodology relies, do exhibit tendencies that can be exploited in the search
for shorter proofs and for other objectives. To provide some insight regarding the value
of the methodology, I discuss its successful application to other problems from Boolean
algebra and to problems from lattice theory. Research suggestions and challenges are also
offered. |
|
|
J. N. Lyness and U. Ruede, "Cubature of Integrands
Containing Derivatives," Preprint ANL/MCS-P625-1196,
November 1996. |
We present a new technique for the numerical integration over
R, a square or triangle, of an integrand of the form $(\nabla u)^T B(\nabla
v)$. This uses only function values of u, B, and v,
avoiding explicit differentiation, but is suitable only when the integrand function is
regular over R. The technique is analogous to Romberg integration, since it is
based on using a sequence of very simple discretizations $J^{(m)}, m = 1,2,3,...$,
of the required integral and applying extrapolation in m to provide closer
approximations. A general approach to the problem of constructing discretizations is
given. We provide specific cost-effective discretizations satisfying familiar, but
somewhat arbitrary guidelines. As in Romberg integration, when each component function in
the integrand is a polynomial, this technique leads to an exact result. |
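A one-dimensional analogue conveys the flavor (my sketch under simplified assumptions; the paper's discretizations over squares and triangles are more elaborate). Approximate I = ∫₀¹ u'(x) b(x) v'(x) dx from function values of u, b, and v alone, using forward differences and one Richardson extrapolation step:

```python
# 1-D analogue of the cubature idea: approximate ∫_0^1 u'(x) b(x) v'(x) dx
# from function values only, with no explicit differentiation.
def J(u, b, v, m):
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = i * h
        du = u(x + h) - u(x)          # forward differences stand in for u', v'
        dv = v(x + h) - v(x)
        total += b(x + 0.5 * h) * du * dv / h
    return total

def extrapolated(u, b, v, m):
    # J(m) has an even-power error expansion in h = 1/m, so one
    # Richardson step cancels the O(h^2) term.
    return (4.0 * J(u, b, v, 2 * m) - J(u, b, v, m)) / 3.0

# Example: u = x^2, v = x^3, b = 1, so I = ∫ 6x^3 dx = 3/2 on [0,1]
approx = extrapolated(lambda x: x**2, lambda x: 1.0, lambda x: x**3, 8)
```

For this polynomial example one can work out J(m) = 3/2 − 1/(2m²), so a single extrapolation step is already exact, mirroring the exactness property noted above.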
|
|
C. Bischof, L. Roh, and A. Mauer, "ADIC: An Extensible
Automatic Differentiation Tool for ANSI-C," Software--Practice and Experience
27(12) (Dec. 1997), pp. 1427-1456. Also Preprint ANL/MCS-P626-1196,
March 1997. |
In scientific computing, we often require the derivatives
$\partial f/\partial x$ of a function f expressed as a program
with respect to some input parameter(s), x, say. Automatic differentiation (AD)
techniques augment the program with derivative computation by applying the chain rule of
calculus to elementary operations in an automated fashion. Due to the associativity of the
chain rule, there are many ways for accumulating these derivatives. This article
introduces ADIC (Automatic Differentiation of C), a new AD tool for ANSI-C programs. ADIC
is currently the only tool for ANSI-C that employs a source-to-source program
transformation approach; that is, it takes a C code and produces a new C code that
computes the original results as well as the derivatives. We first present ADIC ``by
example'' to illustrate the functionality and ease of use of ADIC and then describe in
detail the architecture of ADIC. ADIC incorporates a modular design that provides a
foundation for both rapid prototyping of better AD algorithms and their sharing across AD
tools for different languages. A component architecture called AIF (AD Interface Form)
separates core AD concepts from their language-specific implementation and allows the
development of generic AD modules that can be directly reused in other AIF-based AD tools.
The language-specific ADIC front end and back end canonicalize C programs to make them fit
for semantic augmentation and manage, for example, the association of a program variable
with its associated derivative object. We also report on applications of ADIC to a
semiconductor device simulator, 3-D CFD grid generator, vehicle simulator, and neural
network code. |
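The flavor of a source-to-source transformation can be sketched by hand (in Python here for brevity; ADIC itself consumes and emits ANSI-C, and the augmented function below is my illustration, not actual ADIC output). Each statement of the original function is paired with a statement propagating its derivative by the chain rule:

```python
def f(x):
    # original code: f(x) = (x*x + 3*x) * x = x^3 + 3x^2
    t = x * x + 3.0 * x
    return t * x

def f_ad(x, dx=1.0):
    # augmented code: each program variable carries an associated
    # derivative object, updated statement by statement
    t = x * x + 3.0 * x
    dt = (x + x) * dx + 3.0 * dx       # chain rule for the first statement
    y = t * x
    dy = dt * x + t * dx               # product rule for the return value
    return y, dy
```

Here f'(x) = 3x² + 6x, so f_ad(2.0) returns (20.0, 24.0), agreeing with the hand derivative at x = 2.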
|
|
S.-J. Wu, Y.-J. J. Wu, and J. J. Shaw, "A Comprehensive
DNA Arithmetic Calculator," Preprint ANL/MCS-P630-1296,
December 1996. |
Developing a generally useful DNA computer has been one of
the main goals since the inception of molecular computation. Toward that goal it is
important to equip DNA with the ability to perform mathematical calculations. Here, a
simple procedure that allows DNA-based bit arithmetic calculations is presented.
Oligonucleotides encoding input bit numbers serve as primers for PCR to carry out the
arithmetic operations. Amplified DNA fragments encoding output bit numbers are
subsequently modified to assume the same format as the input. Thus, either iterative or
parallel operations are permissible. |
|
|
C. F. Ollivier-Gooch, "High-Order ENO Schemes for
Unstructured Meshes Based on Least-Squares Reconstruction," Preprint ANL/MCS-P631-1296,
February 1997. |
High-order accurate schemes for conservation laws on
unstructured meshes are not nearly as well advanced as such schemes for structured meshes.
Consequently, little or nothing is known about the possible practical advantages of
high-order discretization on unstructured meshes. This article is part of an ongoing
effort to develop high-order schemes for unstructured meshes to the point where meaningful
information can be obtained about the trade-offs involved in using spatial discretizations
of higher than second-order accuracy on unstructured meshes. |
|
|
S. A. Mitchell and S. A. Vavasis, "Quality Mesh
Generation in Higher Dimensions," Preprint ANL/MCS-P633-1296,
February 1997. |
We consider the problem of triangulating a d-dimensional
region. Our mesh generation algorithm, called QMG, is a quadtree-based algorithm that can
triangulate any polyhedral region including nonconvex regions with holes. Furthermore, our
algorithm guarantees a bounded aspect ratio triangulation provided that the input domain
itself has no sharp angles. Finally, our algorithm is guaranteed never to overrefine the
domain, in the sense that the number of simplices produced by QMG is bounded above by a
factor times the number produced by any competing algorithm, where the factor depends on
the aspect ratio bound satisfied by the competing algorithm. The QMG algorithm has been
implemented in C++ and is used as a mesh generator for the finite element method. |
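The quadtree idea behind QMG can be caricatured in two dimensions (a toy sketch with made-up criteria, not QMG's actual algorithm): cells crossed by the domain boundary are split into four children until a maximum depth is reached, so refinement concentrates near the boundary while every cell remains a perfect square (aspect ratio 1).

```python
# Toy 2-D quadtree refinement: split cells crossed by the boundary of the
# unit circle until max_depth is reached. A caricature of quadtree meshing.
import math

def crosses_circle(x, y, s, r=1.0):
    # The circle of radius r meets the closed cell iff r lies between the
    # nearest and farthest distance from the origin to the cell.
    d_min = math.hypot(min(max(0.0, x), x + s), min(max(0.0, y), y + s))
    d_max = max(math.hypot(cx, cy) for cx in (x, x + s) for cy in (y, y + s))
    return d_min <= r <= d_max

def refine(x, y, s, depth, max_depth, leaves):
    if depth < max_depth and crosses_circle(x, y, s):
        half = s / 2.0
        for dx in (0.0, half):
            for dy in (0.0, half):
                refine(x + dx, y + dy, half, depth + 1, max_depth, leaves)
    else:
        leaves.append((x, y, s))

leaves = []
refine(-2.0, -2.0, 4.0, 0, 4, leaves)
# leaves exactly partition the root square [-2,2]^2;
# the smallest cells cluster along the circle
```

The leaves tile the root square exactly, and the smallest cell size is dictated by the depth bound, a crude stand-in for the no-overrefinement guarantee described above.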