Rethinking large-scale economic modeling for efficiency: optimizations for GPU and Xeon Phi clusters


Kübler, Felix; Mikushin, Dmitry; Scheidegger, Simon; Schenk, Olaf (2018). Rethinking large-scale economic modeling for efficiency: optimizations for GPU and Xeon Phi clusters. In: IPDPS 2018, Vancouver, BC, Canada, 21 May 2018 - 25 May 2018.

Abstract

We propose a massively parallelized and optimized framework to solve high-dimensional dynamic stochastic economic models on modern GPU- and MIC-based clusters. First, we introduce a novel approach for adaptive sparse grid index compression alongside a surplus matrix reordering, which significantly reduces the global memory throughput of the compute kernels and maps randomly accessed data onto cache or fast shared memory. Second, we fully vectorize the compute kernels for AVX, AVX2 and AVX-512 CPUs. Third, we develop a hybrid, cluster-oriented, work-preempting scheduler based on TBB, which evenly distributes the time iteration workload across the available CPU cores and accelerators. Numerical experiments on the Cray XC40 KNL “Grand Tave” and Cray XC50 “Piz Daint” systems at the Swiss National Supercomputing Centre (CSCS) show that our framework scales well to at least 4,096 compute nodes, resulting in an overall speedup of more than four orders of magnitude compared to a single, optimized CPU thread. As an economic application, we compute global solutions to an annually calibrated stochastic public finance model with sixteen discrete stochastic states with unprecedented performance.

Index Terms: High-Performance Computing, Macroeconomics, Public Finance, Adaptive Sparse Grids, Heterogeneous Systems, CUDA, GPU, MIC
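The abstract names adaptive sparse grid index compression but does not give the paper's storage format. The sketch below is a minimal illustration and rests on several assumptions. It assumes a piecewise-linear hierarchical basis whose level-1 function is the constant 1, which is a common convention in adaptive sparse grid codes. Under that assumption, one plausible compression drops every level-1 factor from each grid point's (dimension, level, index) list and stores the remaining entries in a CSR-like layout. The surplus matrix holds one row of hierarchical surpluses per grid point. All function names and the basis convention are hypothetical and are not the authors' implementation.

```python
import numpy as np


def basis_1d(level, index, x):
    """Hypothetical 1-D piecewise-linear hierarchical basis on [0, 1].

    Assumed convention: level 1 is the constant function 1; for level >= 2 the
    hat function is centred at index / 2**(level - 1) with half-width
    1 / 2**(level - 1).
    """
    if level == 1:
        return 1.0
    return max(1.0 - abs(2.0 ** (level - 1) * x - index), 0.0)


def evaluate_dense(levels, indices, surpluses, x):
    """u(x) = sum_j surpluses[j, :] * prod_d basis_1d(levels[j, d], indices[j, d], x[d]).

    levels, indices: (n_points, dim) integer arrays (uncompressed storage).
    surpluses: (n_points, n_outputs) surplus matrix, one row per grid point.
    """
    result = np.zeros(surpluses.shape[1])
    for j in range(levels.shape[0]):
        weight = 1.0
        for d in range(levels.shape[1]):
            weight *= basis_1d(levels[j, d], indices[j, d], x[d])
            if weight == 0.0:
                break  # x lies outside this basis function's support
        if weight != 0.0:
            result += weight * surpluses[j]
    return result


def compress(levels, indices):
    """Keep only (dimension, level, index) entries with level > 1.

    Level-1 factors are identically 1 under the assumed basis, so dropping them
    does not change the product. offsets[j]:offsets[j + 1] slices point j.
    """
    dims, lvls, idxs, offsets = [], [], [], [0]
    for j in range(levels.shape[0]):
        for d in range(levels.shape[1]):
            if levels[j, d] > 1:
                dims.append(d)
                lvls.append(levels[j, d])
                idxs.append(indices[j, d])
        offsets.append(len(dims))
    return (np.array(dims, dtype=np.int64), np.array(lvls, dtype=np.int64),
            np.array(idxs, dtype=np.int64), np.array(offsets, dtype=np.int64))


def evaluate_compressed(compressed, surpluses, x):
    """Same interpolant as evaluate_dense, looping only over stored entries."""
    dims, lvls, idxs, offsets = compressed
    result = np.zeros(surpluses.shape[1])
    for j in range(len(offsets) - 1):
        weight = 1.0
        for k in range(offsets[j], offsets[j + 1]):
            weight *= basis_1d(lvls[k], idxs[k], x[dims[k]])
            if weight == 0.0:
                break
        if weight != 0.0:
            result += weight * surpluses[j]
    return result


# Toy 3-D grid with 4 points and 2 outputs (e.g. two policy functions).
levels = np.array([[1, 1, 1], [2, 1, 1], [2, 1, 1], [1, 3, 1]])
indices = np.array([[1, 1, 1], [0, 1, 1], [2, 1, 1], [1, 1, 1]])
surpluses = np.array([[1.0, 2.0], [4.0, -1.0], [3.0, 5.0], [10.0, 0.5]])

compressed = compress(levels, indices)
x = np.array([0.25, 0.3, 0.9])
dense_value = evaluate_dense(levels, indices, surpluses, x)
compressed_value = evaluate_compressed(compressed, surpluses, x)
print(dense_value, compressed_value)  # both approximately [11.0, 1.9]
```

In this toy grid, only 3 of the 12 (dimension, level, index) slots survive compression. This shrinks both the index storage and the inner loop of the evaluation kernel. Whether the real framework uses this exact layout, or how it combines it with the surplus matrix reordering, is not stated in the abstract.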

Statistics

Citations

1 citation in Web of Science®
1 citation in Scopus®

Downloads

35 downloads since deposit on 09 Mar 2018
35 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Banking and Finance
Dewey Decimal Classification: 330 Economics
Language: English
Event End Date: 25 May 2018
Deposited On: 09 Mar 2018 09:03
Last Modified: 17 Sep 2019 19:12
Publisher: Institute of Electrical and Electronics Engineers
Series Name: Proceedings - IEEE International Parallel and Distributed Processing Symposium
ISSN: 1530-2075
Additional Information: © 2018 IEEE.
OA Status: Green
Publisher DOI: https://doi.org/10.1109/IPDPS.2018.00070
Related URLs: http://www.ipdps.org/ (Publisher)
https://ieeexplore.ieee.org/abstract/document/8425214/authors#authors (Publisher)
Other Identification Number: merlin-id:16026

Download

Download PDF  'Rethinking large-scale economic modeling for efficiency: optimizations for GPU and Xeon Phi clusters'.
Content: Accepted Version
Filetype: PDF
Size: 697kB