On ERI Sorting for SIMD Execution of Large-Scale Hartree-Fock SCF


Ramdas, T; Egan, G; Abramson, D; Baldridge, K K (2008). On ERI Sorting for SIMD Execution of Large-Scale Hartree-Fock SCF. Computer Physics Communications, 178(11):817-834.

Abstract

Given the resurgent attractiveness of single-instruction-multiple-data (SIMD) processing, it is important for high-performance computing applications
to be SIMD-capable. The Hartree–Fock SCF (HF-SCF) application, in its canonical form, cannot fully exploit SIMD processing. Prior
attempts to implement Electron Repulsion Integral (ERI) sorting functionality to essentially “SIMD-ify” the HF-SCF application have met frustration
because of the low throughput of the sorting functionality. With greater awareness of computer architecture, we discuss how the sorting
functionality may be practically implemented to provide high performance. Overall system performance analysis, including memory locality
analysis, is also conducted, and further emphasises that a system with ERI sorting is capable of very high throughput. We discuss two alternative
implementation options, with one immediately accessible software-based option discussed in detail. The impact of workload characteristics on
expected performance is also discussed, and it is found that in general as basis set size increases the potential performance of the system also
increases. Consideration is given to conventional CPUs, GPUs, FPGAs, and the Cell Broadband Engine architecture.
© 2008 Elsevier B.V. All rights reserved.

Citations

4 citations in Web of Science®
5 citations in Scopus®

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Department of Chemistry
Dewey Decimal Classification: 540 Chemistry
Language: English
Date: 1 June 2008
Deposited On: 07 Jan 2009 12:33
Last Modified: 05 Apr 2016 12:46
Publisher: Elsevier
ISSN: 0010-4655
Publisher DOI: 10.1016/j.cpc.2008.01.045
Permanent URL: http://doi.org/10.5167/uzh-9189
