PlayStation3 Gravity Grid
This section is dedicated to the ongoing research projects of our group that involve supercomputing in Physics.
The Sony PlayStation 3 has a number of unique features that make it particularly well suited to scientific computation. First, the PS3 is an open platform, which essentially means that one can run different system software on it, for example PowerPC Linux. Next, it has a revolutionary processor, the Cell processor, which was developed by Sony, IBM and Toshiba. This processor has a main CPU (called the PPU) and several special compute engines (called SPUs), six of which are available on the PS3 for raw computation. Moreover, each SPU performs vector operations, which means that it can operate on multiple data elements in a single step. Finally, its incredibly low cost makes it very attractive as a scientific computing node, i.e. as part of a cluster. In fact, it is highly plausible that the raw computing power per dollar that the PS3 offers is significantly higher than that of anything else on the market today!
Thanks to a very generous partial donation by Sony, we have a sixteen-PS3 cluster in our department, which we call the PS3 Gravity Grid. Check out some pictures of the cluster here: 1) the PS3s arrive; 2) the rack arrives; 3) front view of the cluster; 4) side view of the cluster.
We are using "stock" PS3s for this cluster, with no hardware modifications. They are networked together using an inexpensive Netgear gigabit switch. For Linux installation, there are several guides available on the internet. For YDL Linux, consider using the guide by Terrasoft Solutions. For Fedora 8, I found this guide particularly useful. To deploy a parallel job on this cluster, we use a code that implements a standard domain decomposition approach based on message passing (MPI). More details on our code are available below. For compiling, we use GCC and also IBM's XL compilers for the Cell, which are available as part of IBM's Cell SDK. The MPI distribution that we are using is the recently released Open MPI distribution for PowerPC Linux.
--------------------
∞ I am incapable of conceiving infinity, and yet I do not accept finity. - Simone de Beauvoir -
* Binary Black Hole Coalescence using Perturbation Theory (GK)
This project broadly deals with estimating properties of the gravitational waves produced by the merger of two black holes. Gravitational waves are "ripples" in space-time that travel at the speed of light. They were theoretically predicted by Einstein's general relativity, but have never been directly observed. Currently, an extensive search for these waves is being performed by the newly constructed NSF LIGO laboratory and various other such observatories in Europe and Asia. The ESA and NASA also have a mission planned for the near future, the LISA mission, that will attempt to detect these waves. To learn more about these waves and the recent attempts to observe them, please visit the LISA mission website.
The evolution code for the extreme-mass-ratio limit of this problem (referred to as EMRI) is essentially an inhomogeneous wave-equation solver that includes a very complicated source term. The source term describes how the smaller black hole (or star) affects the space-time of the larger one. Because of its computational complexity, the source term is often the most numerically intensive part of the whole evolution. On the PS3's Cell processor, it is precisely this part of the computation that is farmed out to the six SPUs. This approach essentially eliminates the time spent on the source computation and yields a speedup of over a factor of five compared with a PPU-only computation. It should be noted that this computation uses double-precision floating-point operations. In single precision, the speedup is significantly higher.
Overall, a single PS3 performs better than the highest-end desktops available and is comparable to as many as 25 nodes of an IBM Blue Gene supercomputer. And there is still tremendous scope left for extracting more performance through further optimization. More on that soon.
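To make the structure described above concrete, here is a minimal sketch of an inhomogeneous wave-equation solver in which the source term is evaluated as a separate step at every time step. It is a toy 1D illustration and not our production code: the equation u_tt = u_xx + S(x, t), the Gaussian `source_term`, the grid size and the leapfrog scheme are all assumptions chosen for brevity. The point is only that the source evaluation is independent at each grid point, which is what makes it a natural candidate for offloading to separate compute engines.

```python
import math

def source_term(x, t):
    """Hypothetical stand-in for the expensive source term:
    a narrow Gaussian pulse centred on a slowly moving point."""
    centre = 0.5 + 0.1 * math.sin(t)
    return math.exp(-((x - centre) / 0.05) ** 2)

def evolve(n_points=101, n_steps=200, courant=0.5, source=source_term):
    """Leapfrog evolution of u_tt = u_xx + S(x, t) on [0, 1], u = 0 at both ends."""
    dx = 1.0 / (n_points - 1)
    dt = courant * dx
    xs = [i * dx for i in range(n_points)]
    u_prev = [0.0] * n_points  # field at the previous time step
    u_curr = [0.0] * n_points  # field at the current time step
    for step in range(n_steps):
        t = step * dt
        # Step 1: evaluate the source at every grid point. Each evaluation is
        # independent of the others, so this loop is the offloadable part.
        s = [source(x, t) for x in xs]
        # Step 2: the comparatively cheap three-point stencil update.
        u_next = [0.0] * n_points
        for i in range(1, n_points - 1):
            lap = (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]) / dx ** 2
            u_next[i] = 2.0 * u_curr[i] - u_prev[i] + dt ** 2 * (lap + s[i])
        u_prev, u_curr = u_curr, u_next
    return xs, u_curr

if __name__ == "__main__":
    xs, u = evolve()
    print("largest field amplitude:", max(abs(v) for v in u))
```

In this sketch both steps run on the same processor; the split into "Step 1" and "Step 2" simply marks where the work would be divided between the compute engines and the main CPU.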
Furthermore, we distribute the entire computational domain across the sixteen PS3s using MPI (message passing) parallelization. This enables the entire cluster to run together in parallel, working on the computation in an efficient way. Each PS3 works on its part of the domain and communicates the appropriate data to the others, as needed.
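The domain decomposition idea can be sketched in a few lines. The example below is an illustration under stated assumptions and does not use MPI: the sixteen "ranks" are simulated as entries in a Python list, the function names are hypothetical, and a simple three-point average stands in for the real update. In an actual MPI code, the ghost-value exchange shown here would be a send and a receive between neighbouring nodes.

```python
def split_domain(n_points, n_ranks):
    """Return one (start, stop) index range per rank, as evenly sized as possible."""
    base, extra = divmod(n_points, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def smooth_serial(u):
    """Reference: three-point average on interior points, end points held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out

def smooth_decomposed(u, n_ranks=16):
    """Same update, but each simulated rank only touches its own slice
    plus one ghost value received from each neighbour."""
    ranges = split_domain(len(u), n_ranks)
    local = [u[a:b] for a, b in ranges]  # the slice owned by each rank
    # Ghost exchange: every rank needs the edge value of each neighbour.
    left_ghost = [None] + [local[r - 1][-1] for r in range(1, n_ranks)]
    right_ghost = [local[r + 1][0] for r in range(n_ranks - 1)] + [None]
    result = []
    for r, chunk in enumerate(local):
        padded = [left_ghost[r]] + chunk + [right_ghost[r]]
        new = list(chunk)
        for j in range(len(chunk)):
            lo, hi = padded[j], padded[j + 2]
            if lo is None or hi is None:
                continue  # a global boundary point: leave it fixed
            new[j] = (lo + chunk[j] + hi) / 3.0
        result.extend(new)
    return result

if __name__ == "__main__":
    data = [float(i * i % 7) for i in range(100)]
    print(smooth_decomposed(data, 16) == smooth_serial(data))
```

The sketch assumes there are at least as many grid points as ranks, so that no rank owns an empty slice.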
* HPL - Standard supercomputer cluster benchmark (GK)
This project is about performing a standard LINPACK cluster benchmark on our sixteen-PS3 cluster. This is the benchmark used by the top500.org site, which lists the most powerful supercomputers in the world. We worked with IBM to port their Cell blade benchmark code to our PS3 cluster. The results? The PS3 Gravity Grid delivers a total performance of 40 GFLOPS (40 billion calculations per second). It should be noted that this benchmark was run in double precision and that, because of the limited RAM on each PS3, we were only able to fit a matrix of size 10K on the entire cluster. The larger the problem size, the better the PS3/Cell's efficiency, so these testing conditions were unfortunately far from optimal. We would expect much better performance from the cluster if significantly more RAM were available! Even at 40 GFLOPS, our PS3 cluster is very competitive (in terms of performance per dollar) with the low-cost compute clusters out there. And if one could do some of the computation in single precision, the performance would jump severalfold, likely approaching one-half of a TFLOPS (half a trillion calculations per second). The benchmark code with Cell-specific patches is available here: HPL.
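As a rough back-of-the-envelope check on these numbers, the short script below estimates the memory footprint of the benchmark matrix and the run time implied by 40 GFLOPS. It rests on two assumptions: that "a matrix of size 10K" means a dense 10,000 x 10,000 matrix of 8-byte doubles, and that the operation count is approximated by the leading-order term (2/3)N^3 for solving a dense linear system. The function names are ours, for illustration only.

```python
N = 10_000             # assumed matrix order ("size 10K")
NODES = 16             # PS3s in the cluster
BYTES_PER_DOUBLE = 8   # double precision
RATE = 40e9            # measured total rate, in floating-point operations per second

def matrix_bytes(n):
    """Storage for a dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE

def leading_order_flops(n):
    """Leading-order operation count for a dense solve: (2/3) n^3."""
    return (2.0 / 3.0) * n ** 3

def estimated_seconds(n, rate):
    """Run time implied by the operation count at a given sustained rate."""
    return leading_order_flops(n) / rate

if __name__ == "__main__":
    total = matrix_bytes(N)
    print("matrix storage (MB):", total / 1e6)
    print("per-node share (MB):", total / NODES / 1e6)
    print("per-node rate (GFLOPS):", RATE / NODES / 1e9)
    print("estimated run time (s):", round(estimated_seconds(N, RATE), 1))
```

Under these assumptions the matrix occupies 800 MB in total, or 50 MB per PS3, and the whole factorization takes on the order of 17 seconds, which gives a sense of how small the problem is compared with what larger-memory nodes could hold.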