Scalable Hierarchical Particle Algorithms for Galaxy Formation and Accretion Astrophysics, Final Report


Introduction

A full understanding of galaxy evolution cannot be obtained with purely analytic techniques. The best tool available to investigate the consequences of particular cosmological theories is the parallel supercomputer. Recent advances in both hardware and software allow accurate numerical simulations of complex, nonlinear astrophysical phenomena to be carried out. Our simulations of large scale structure over the past several years have allowed us to perform controlled numerical experiments, evolving the universe from the Big Bang to the present. These N-body simulations represent the state of the art in numerical cosmology and parallel N-body algorithms, consuming over 5 petaflops (5 x 10^15 floating-point operations) to date.

Without such computer experiments the study of cosmological structure formation, the birth of galaxies and their interactions, quasars, supernovae, or the consequences of accretion of comets and asteroids onto planets within the solar system would be virtually impossible. These phenomena are all characterized by multiple spatial and temporal scales which must be accounted for in an effort to gain insight into the nature of the astrophysical processes which shape their evolution. In all of these situations gravity and hydrodynamics play important roles. In our project to tackle these problems, we have developed a software infrastructure based on parallel tree data structures which is capable of solving a large class of problems efficiently on parallel supercomputers.

While these simulations have answered several questions, there is still an almost embarrassing richness of unexplored alternatives. Fortunately, all of the alternatives will soon confront even more precise observational data, and their success will be evaluated by means of larger and more accurate simulations.

Tree-based codes can solve a very general class of problems, one that can be expected to grow in importance as spatial adaptivity becomes necessary for the simulation of ever more difficult problems. Problems of current interest in a wide variety of areas rely heavily on methods very similar to those we are using. We are directly familiar with applications in computational biology (protein folding, thermodynamics in aqueous solution), electromagnetic scattering, fluid mechanics (vortex method, panel method), molecular dynamics, materials science (dislocation dynamics, boundary element methods) and plasma physics, but there are certainly more.

Essential to the transition to parallel computing is the existence of adaptable codes which will parallelize a variety of applications so that users only need to specify the "physics" of a problem and not the details of data structures, load balancing, interprocessor communication, etc. Our work has demonstrated real progress toward that goal.

Approach

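Two situations arise again and again in a variety of particle algorithms: finding the neighbors of each particle within a short-range cutoff, and summing long-range interactions over all pairs of particles.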

For example, the problem of finding neighbors within the cutoff radius of a Lennard-Jones potential in a molecular dynamics simulation is qualitatively the same as finding neighbors in an SPH simulation. Similarly, the Biot-Savart summations that appear in vortex dynamics simulations are essentially the same as the Newtonian interactions that occur in astrophysics. One simply has a "vector mass" to contend with that adds somewhat to the complexity, but little to the essential underlying algorithm. Treecodes offer efficient and parallel solutions to both these situations which transcend the individual problem domains.
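To make the analogy concrete, here is a minimal sketch in plain Python. It is not taken from the project's code: the function names, the choice of units (G = 1), and the brute-force O(N^2) loops are all assumptions made for illustration. It shows one pairwise summation routine driven by two kernels, a Newtonian one with a scalar mass and a Biot-Savart one with a vector strength (the "vector mass"), plus a cutoff neighbor search of the kind shared by Lennard-Jones and SPH codes. A treecode would replace the double loop, not the kernels.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def gravity_kernel(dx, r3, mass):
    """Newtonian acceleration from a scalar mass (G = 1 assumed); dx = source - target."""
    return tuple(mass * d / r3 for d in dx)

def biot_savart_kernel(dx, r3, strength):
    """Induced velocity alpha x (x_target - x_source) / (4 pi r^3) from a vector strength."""
    # (x_target - x_source) = -dx, and alpha x (-dx) = dx x alpha.
    return tuple(c / (4.0 * math.pi * r3) for c in cross(dx, strength))

def direct_sum(positions, weights, kernel, eps=0.0):
    """Brute-force sum over all pairs; only the kernel differs between problems."""
    results = []
    for i, xi in enumerate(positions):
        total = [0.0, 0.0, 0.0]
        for j, xj in enumerate(positions):
            if i == j:
                continue
            dx = tuple(xj[k] - xi[k] for k in range(3))
            r3 = (sum(d * d for d in dx) + eps * eps) ** 1.5
            for k, value in enumerate(kernel(dx, r3, weights[j])):
                total[k] += value
        results.append(tuple(total))
    return results

def neighbors_within(positions, i, cutoff):
    """Indices of all particles within `cutoff` of particle i (the short-range situation)."""
    return [j for j, xj in enumerate(positions)
            if j != i and math.dist(positions[i], xj) <= cutoff]

# Two point masses on the x axis.
accel = direct_sum([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 3.0], gravity_kernel)

# One vortex element at the origin with strength along z, probed at x = 1.
veloc = direct_sum([(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
                   [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)], biot_savart_kernel)
```

The only change between the two long-range problems is the type of the weight and the kernel body, which is why a single tree traversal can serve both.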

Scientific accomplishments

Our simulations of the Cold Dark Matter model (CDM) have shown that the galaxy-galaxy velocity dispersion at small scales is consistent with results obtained from the CfA sky survey. This is significant, since velocity dispersion limits had been previously used to rule out the model. Similarly, we have shown that the redshift-space power spectrum (also used to rule out CDM) does not provide an unambiguous constraint at small scales, and is consistent with the power spectrum measured in the IRAS galaxy catalog.

High resolution simulations of Abell cluster formation were carried out and analyzed. They show that a dense nugget from the core of a large majority of galactic-sized halos survives the encounter with the massive cluster core, even though an outer shell of each of these halos is stripped and accreted onto the common cluster envelope. The resulting cluster is far from virialized, as the density of halos is lower in the core of the cluster. The usual observational methods of computing cluster mass lead to significant underestimates.

We have used the SPH code running on the Intel Paragon to study the impact of comet Shoemaker-Levy 9 with Jupiter. This fully 3-dimensional calculation followed a spherical bolide 1 km in diameter through the approximately 3-second period in which the comet fragment deposits its kinetic energy into the Jovian atmosphere.

We have successfully merged the Kerr metric (which describes the gravitational field of a spinning black hole) into our SPH code and have performed benchmark hydrodynamic simulations of the tidal disruption of stars in such a curved spacetime geometry. This is an essential step toward the simulation of an accretion disk.

Computational Accomplishments

Software that is portable between different disciplines is an elusive but highly desirable commodity. It is virtually guaranteed that a project focused exclusively on a particular problem will not produce software that can easily be used outside that discipline. Appropriate abstractions do not emerge without careful analysis and design. We have specifically designed the software described here so that it can be used in a variety of areas. We use a single implementation of the underlying data structures to support all of these "applications." These tasks are diverse enough to require a careful design of interfaces and libraries in such a way that the "physics" is cleanly separated from the "data structures." We believe that the effort of designing a clean, highly modular implementation has not only saved us time (since we have been re-using our own software in separate sub-problems) but is also allowing us to leverage our work to speed the development of high-quality parallel software in a variety of unrelated fields.
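One way to picture this separation is the serial sketch below. It is an illustration under stated assumptions, not the project's interface: the names `Cell`, `walk`, `make_gravity`, and `make_neighbor_search`, the monopole-only approximation, the opening angle `theta`, and the softening length `eps` (which must be positive here) are all hypothetical. The tree and its traversal know nothing about forces; each application supplies a `decide` callback (open a cell, approximate it, or skip it) and an `interact` callback.

```python
import math
import random

class Cell:
    """Cubic oct-tree cell: a leaf keeps its bodies, an internal cell keeps children."""
    def __init__(self, center, half, bodies, depth=0):
        self.center, self.half, self.bodies = center, half, bodies
        self.mass = sum(m for _, m in bodies)
        self.com = tuple(sum(p[k] * m for p, m in bodies) / self.mass for k in range(3))
        self.children = []
        if len(bodies) > 1 and depth < 32:  # depth cap guards against coincident bodies
            octants = {}
            for p, m in bodies:
                key = tuple(p[k] >= center[k] for k in range(3))
                octants.setdefault(key, []).append((p, m))
            for key, sub in octants.items():
                c = tuple(center[k] + (half if key[k] else -half) / 2 for k in range(3))
                self.children.append(Cell(c, half / 2, sub, depth + 1))

def walk(cell, target, decide, interact, result):
    """Data-structure layer: traverses cells and delegates every physical choice."""
    if not cell.children:
        for p, m in cell.bodies:
            interact(result, target, p, m)
        return
    action = decide(cell, target)
    if action == "approximate":
        interact(result, target, cell.com, cell.mass)
    elif action == "open":
        for child in cell.children:
            walk(child, target, decide, interact, result)
    # Any other action ("skip") prunes the cell.

def make_gravity(theta, eps):
    """Physics layer 1: softened Newtonian gravity with a size/distance opening test."""
    def decide(cell, target):
        far = 2 * cell.half < theta * math.dist(cell.com, target)
        return "approximate" if far else "open"
    def interact(acc, target, source, mass):
        dx = [source[k] - target[k] for k in range(3)]
        r3 = (sum(d * d for d in dx) + eps * eps) ** 1.5
        for k in range(3):
            acc[k] += mass * dx[k] / r3  # the target itself contributes zero
    return decide, interact

def make_neighbor_search(radius):
    """Physics layer 2: collect all bodies within `radius` of the target."""
    def decide(cell, target):
        gap2 = sum(max(abs(target[k] - cell.center[k]) - cell.half, 0.0) ** 2
                   for k in range(3))
        return "skip" if gap2 > radius * radius else "open"
    def interact(found, target, source, mass):
        if source is not target and math.dist(source, target) <= radius:
            found.append(source)
    return decide, interact

random.seed(1)
bodies = [(tuple(random.random() for _ in range(3)), 1.0) for _ in range(200)]
root = Cell((0.5, 0.5, 0.5), 0.5, bodies)
target = bodies[0][0]

acc = [0.0, 0.0, 0.0]
walk(root, target, *make_gravity(theta=0.5, eps=1e-3), acc)

near = []
walk(root, target, *make_neighbor_search(radius=0.2), near)
```

Both applications reuse `Cell` and `walk` unchanged; only the two small callbacks differ. A production code would add higher multipole moments, parallel decomposition, and load balancing behind the same kind of boundary.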

Research at Syracuse University coupled the work at LANL and Caltech to investigations of possible language extensions to High Performance Fortran and C++, and extension of the hashed oct-tree method to grid generation and related problems.
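The report does not spell out how the hashed oct-tree is keyed, so the following is only a sketch of one common construction, assuming coordinates scaled to [0, 1), bit-interleaved (Morton) keys with a leading placeholder bit, and a Python dict standing in for the hash table. The names `morton_key`, `parent`, `level`, and `build_hashed_tree`, and the x, y, z bit order, are hypothetical.

```python
def morton_key(pos, bits=10):
    """Interleave `bits` bits of each coordinate in [0, 1) into one integer key."""
    top = (1 << bits) - 1
    ix = [min(int(c * (1 << bits)), top) for c in pos]
    key = 1  # leading placeholder bit, so the key's length encodes its tree level
    for b in range(bits - 1, -1, -1):
        for k in range(3):
            key = (key << 1) | ((ix[k] >> b) & 1)
    return key

def parent(key):
    """Key of the enclosing cell: drop the last three bits (one octant choice)."""
    return key >> 3

def level(key):
    """Depth of a cell in the tree; the root (key 1) is level 0."""
    return (key.bit_length() - 1) // 3

def build_hashed_tree(points, bits=10):
    """Map every occupied cell key, from leaf up to the root, to its body count."""
    table = {}
    for p in points:
        key = morton_key(p, bits)
        while True:
            table[key] = table.get(key, 0) + 1
            if key == 1:
                break
            key = parent(key)
    return table

table = build_hashed_tree([(0.1, 0.1, 0.1), (0.9, 0.9, 0.9)], bits=2)
```

Because a cell is addressed by an integer key rather than a pointer, any cell can be looked up directly by its key, and parent and child keys follow from shifts. The same keys also give a one-dimensional ordering of cells, which is the kind of property that makes such a structure reusable for problems like grid generation.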

Treecode performance

Site      Machine          Nodes    Gflops
LANL      TMC CM-5           512     14.06
Caltech   Intel Paragon      512     13.70
NRL       TMC CM-5E          256     11.57
Caltech   Intel Delta        512     10.02
NAS       IBM SP-2           128      9.52
JPL       Cray T3D           256      7.94
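Since the node counts differ between machines, it can help to normalize the table by node count. The short sketch below does only that arithmetic on the figures listed above; the variable names are arbitrary, and the result is a sustained rate per node, not an efficiency relative to each machine's peak (peak rates are not given in this report).

```python
# (site, machine, nodes, Gflops) exactly as listed in the table above.
runs = [
    ("LANL", "TMC CM-5", 512, 14.06),
    ("Caltech", "Intel Paragon", 512, 13.70),
    ("NRL", "TMC CM-5E", 256, 11.57),
    ("Caltech", "Intel Delta", 512, 10.02),
    ("NAS", "IBM SP-2", 128, 9.52),
    ("JPL", "Cray T3D", 256, 7.94),
]

# Sustained Mflops per node = 1000 * Gflops / nodes.
per_node = {machine: 1000.0 * gflops / nodes for _, machine, nodes, gflops in runs}

for machine, mflops in sorted(per_node.items(), key=lambda item: -item[1]):
    print(f"{machine:<15s} {mflops:7.2f} Mflops/node")
```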


Bibliography of Our Research Supported by the ESS program

Michael S. Warren, mswarren@lanl.gov