IBM Research – Zurich has developed over the years a deep and broad competence in the field of parallel and distributed computing. Our internal work on scientific and technical computing serves as a test bed and driver for innovative parallel algorithms and implementations on diverse computer hardware, ranging from general-purpose high-end supercomputers (p-Series clusters) and Linux clusters (x-Series) to novel, massively parallel computer architectures such as Blue Gene (L, P and Q) and heterogeneous architectures.

Our key competencies are:

- distributed-memory parallelization using MPI,
- shared-memory parallelization using OpenMP and direct threading,
- mixed MPI/SMP schemes,
- parallel algorithm design and programming.

## CPMD code parallelization and tuning

The CPMD code has been implemented and tuned for every generation of IBM supercomputers, which has made it a reference code in the world of high-performance simulations.

We have recently demonstrated that the extreme threading capability of the Blue Gene/Q supercomputer, combined with an efficient parallelization of CPMD, can make density functional theory, including hybrid exchange functionals, a routine tool in molecular modeling.

We demonstrated scalability up to 1,048,576 threads, with a parallel efficiency of 99% for the most computationally intensive part, the Hartree–Fock exact exchange, and of 83% for the overall computational flow. These runs used exactly the same models as the scientific investigation and sustained a performance of ∼0.5 Pflops.
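
The efficiency figures quoted above follow the standard strong-scaling definition, efficiency = measured speedup / ideal speedup. A minimal sketch (the timings and thread counts below are made up for illustration, not the published CPMD measurements):

```python
def parallel_efficiency(time_ref, threads_ref, time_par, threads_par):
    # strong scaling: efficiency = measured speedup / ideal speedup
    speedup = time_ref / time_par
    ideal = threads_par / threads_ref
    return speedup / ideal

# illustrative numbers only: a 16x increase in threads yielding a 15.84x speedup
eff = parallel_efficiency(1.0, 65536, 1.0 / 15.84, 1048576)
print(round(eff, 2))  # 0.99
```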

For further details, see CPMD performance and scale out.

## Linear scaling semi-empirical (QM) molecular dynamics (SEMD)

The application of quantum Hamiltonians to biological systems is limited by the cost of performing long calculations on large systems (>30,000 atoms). Whereas classical force fields and QM/MM methods handle conformational changes and localized reactions well, scalable algorithms are needed to apply quantum Hamiltonians (semi-empirical or first-principles) to biological systems exhibiting large-scale ion motion and large-scale electron transfer.

We have developed a new computer program, called SEMD, that performs a molecular dynamics step in just a few seconds, regardless of system size, provided that computational resources grow proportionally. The core of the software is a novel sparse matrix–matrix multiplication kernel, together with an enhanced version of it. The same kernel is also used for knowledge-graph operations.
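
To illustrate the idea behind such a kernel (this is a toy sketch, not the SEMD implementation), a sparse matrix–matrix product only ever touches nonzero entries; here the matrices are stored as `{row: {column: value}}` dictionaries:

```python
def spgemm(a, b):
    """Multiply two sparse matrices, visiting only nonzero entries.

    Both inputs and the result use a {row: {col: value}} layout.
    """
    c = {}
    for i, row_a in a.items():
        row_c = {}
        for k, a_ik in row_a.items():
            # only rows of b matching a nonzero column of a contribute
            for j, b_kj in b.get(k, {}).items():
                row_c[j] = row_c.get(j, 0.0) + a_ik * b_kj
        if row_c:
            c[i] = row_c
    return c

a = {0: {0: 2.0, 2: 1.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 2: {0: 5.0}}
print(spgemm(a, b))  # {0: {1: 8.0, 0: 5.0}}
```

Because work is proportional to the number of nonzero products rather than to the matrix dimension cubed, the cost stays bounded per atom as the system grows, which is the property that enables linear-scaling molecular dynamics.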

## Graph analysis

Numerical linear algebra is one of the most basic and fundamental kernels in large-scale computational science applications. Indeed, whether we are simulating the Earth's climate, analyzing the strength of our bones or quantifying the uncertainty in data at scale, the computer models for these problems boil down to efficient numerical linear algebra kernels.

We have worked extensively on developing new, low-cost and highly scalable solvers for large dense and sparse linear systems, as well as highly robust eigensolvers for computing many eigenvalues deep in the interior of the spectrum. In addition, we have developed hybrids of stochastic and deterministic algorithms that hold great promise for reducing costs by several orders of magnitude.
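
A classic example of the stochastic ingredient in such hybrid algorithms is Hutchinson's trace estimator, which approximates the trace of a matrix using only matrix–vector products with random ±1 probe vectors. The sketch below illustrates the technique in general; we make no claim that this specific estimator is the algorithm developed at IBM.

```python
import random

def matvec(a, v):
    # dense matrix-vector product (lists of lists, for brevity)
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def hutchinson_trace(a, samples, rng):
    """Estimate trace(a) as the average of z^T (a z) over random +/-1 vectors z."""
    n = len(a)
    total = 0.0
    for _ in range(samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(a, z)
        total += sum(z_i * az_i for z_i, az_i in zip(z, az))
    return total / samples

rng = random.Random(0)
a = [[4.0, 1.0], [1.0, 3.0]]  # exact trace = 7
est = hutchinson_trace(a, 2000, rng)
```

The estimator never forms the matrix explicitly, so it applies equally well when only a fast matrix–vector product is available, which is exactly the regime of large sparse problems.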

## Data uncertainty quantification

We live in the age of data. A constant deluge of new information is changing the way we do business, science and engineering. However, the uncertainty in data must first be quantified before any reliable analysis can be performed.

We have developed techniques that reduce the cost of analyzing inverse covariance matrices from cubic to quadratic in the problem size. Our methodologies scale all the way from personal desktops to massively parallel supercomputers: gigabytes of data can be analyzed in seconds on a laptop, and hundreds of terabytes in a few minutes on HPC platforms.
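
One standard way to see how cubic cost can be avoided (shown here as a generic illustration, not the specific technique developed at IBM) is to extract only the needed entries of the inverse covariance via iterative solves rather than forming the full inverse: each diagonal entry of A⁻¹ is obtained from one conjugate-gradient solve, which costs O(n²) per iteration instead of the O(n³) of a full factorization.

```python
def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def cg_solve(a, b, tol=1e-12, max_iter=200):
    # conjugate gradient for symmetric positive definite systems:
    # O(n^2) work per iteration, no O(n^3) factorization
    n = len(b)
    x = [0.0] * n
    r = list(b)
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        ap = matvec(a, p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def inverse_diagonal(a):
    # diagonal entry i of a^{-1} is e_i^T a^{-1} e_i: one solve per entry
    n = len(a)
    diag = []
    for i in range(n):
        e = [0.0] * n
        e[i] = 1.0
        diag.append(cg_solve(a, e)[i])
    return diag

cov = [[4.0, 1.0], [1.0, 3.0]]
print(inverse_diagonal(cov))  # exact values are 3/11 and 4/11
```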