Abstract
We demonstrate teraflop performance of the fully featured ab initio molecular dynamics code CPMD on an IBM pSeries 690 cluster. To map the algorithms optimally onto the available hardware, we combine coarse-grained, distributed-memory parallelism using the MPI library with fine-grained, shared-memory parallelism using OpenMP directives. The top performance achieved is approximately 20% of the peak performance, with an estimated parallel efficiency of approximately 45% on 1024 processors for a system of 1000 atoms. The main factor limiting parallel efficiency was found to be the latency of the interconnect.
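The abstract reports two different metrics: the fraction of peak, which compares a sustained floating-point rate with the machine's theoretical maximum, and the parallel efficiency, which measures how well run time scales as processors are added. The sketch below computes both. It assumes a strong-scaling definition of efficiency relative to a reference run on p_ref processors, which is one common convention and not necessarily how the estimate above was obtained. The function names and all numbers are hypothetical. They were chosen only so the results land near the figures quoted in the abstract, and they are not measurements from this work.

```python
def fraction_of_peak(sustained_flops: float, peak_flops: float) -> float:
    """Sustained floating-point rate as a fraction of the theoretical peak."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return sustained_flops / peak_flops


def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Strong-scaling efficiency of a p-processor run relative to a p_ref-processor run.

    E = (p_ref * t_ref) / (p * t_p). With p_ref = 1 this reduces to the
    textbook definition T1 / (p * Tp).
    """
    if min(t_ref, t_p) <= 0 or min(p_ref, p) <= 0:
        raise ValueError("times and processor counts must be positive")
    return (p_ref * t_ref) / (p * t_p)


if __name__ == "__main__":
    # Hypothetical machine: 1024 processors, 5 GFLOP/s peak each (assumption).
    peak = 1024 * 5.0e9          # 5.12 TFLOP/s theoretical peak
    sustained = 1.024e12         # hypothetical sustained rate, 1.024 TFLOP/s
    print(f"fraction of peak: {fraction_of_peak(sustained, peak):.0%}")

    # Hypothetical timings: 100 s per step on 64 processors, 14 s on 1024.
    eff = parallel_efficiency(t_ref=100.0, p_ref=64, t_p=14.0, p=1024)
    print(f"parallel efficiency: {eff:.0%}")
```

Using a reference run on more than one processor reflects the fact that a 1000-atom system may not fit on a single processor, so T1 is often unavailable. The resulting efficiency is then relative to p_ref rather than absolute.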