CPMD code parallelization and tuning
The CPMD code has been implemented and tuned for the entire generation of IBM supercomputers, which has made it a reference in the world of high-performance simulations. Our recent dual-level (distributed-memory/shared-memory) implementation of CPMD sustains 1 Teraflop on 32 p690 systems (1024 processors) clustered via Colony switches, with 45% parallel efficiency when scaling from 1 to 1024 processors. These are still the best results for similarly complex codes.
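
To put these figures in perspective, the short sketch below works through the arithmetic they imply. It assumes the usual definition of parallel efficiency (speedup divided by processor count) and reads "1 Teraflop" as 10^12 floating-point operations per second; the function names are illustrative only and are not part of CPMD.

```python
# Back-of-the-envelope arithmetic for the quoted figures.
# Assumptions (not stated in the text): efficiency = speedup / processors,
# and 1 Teraflop = 1.0e12 floating-point operations per second.

def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup over a single processor implied by a parallel efficiency."""
    return efficiency * processors


def per_processor_rate(sustained_flops: float, processors: int) -> float:
    """Average sustained rate per processor, in flop/s."""
    return sustained_flops / processors


PROCESSORS = 1024          # 32 p690 systems, as quoted
EFFICIENCY = 0.45          # 45% parallel efficiency, 1 to 1024 processors
SUSTAINED_FLOPS = 1.0e12   # 1 Teraflop sustained (assumed to mean 1e12 flop/s)

speedup = speedup_from_efficiency(EFFICIENCY, PROCESSORS)
rate = per_processor_rate(SUSTAINED_FLOPS, PROCESSORS)

print(f"Implied speedup on {PROCESSORS} processors: {speedup:.1f}x")
print(f"Average sustained rate per processor: {rate / 1e9:.3f} Gflop/s")
```

Under these assumptions, the quoted efficiency corresponds to a speedup of roughly 460 over a single processor, and the sustained rate averages a little under 1 Gflop/s per processor.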