Abstract
This article gives a narrative overview of what constitutes climatological data and of their typical features, with a focus on aspects relevant to statistical modeling. We restrict the discussion to univariate spatial fields and concentrate on maximum likelihood estimation. To address the problem of enormous datasets, we study three common approximation schemes (tapering, direct misspecification, and composite likelihood) for both Gaussian and non-Gaussian distributions. We focus particularly on the so-called ‘sinh-arcsinh distribution’, obtained through a specific transformation of the Gaussian distribution. Because its marginal distributions are flexible (possibly skewed and/or heavy-tailed), it has a wide range of applications. An appealing property of the underlying transformation is that it has an explicit inverse, which makes likelihood-based methods straightforward. We describe a simulation study illustrating the effects of the different approximation schemes. To the best of our knowledge, tapering, direct misspecification, and composite likelihood have not been compared directly before, and we show that direct misspecification is inferior. On some metrics, composite likelihood has a slight advantage over tapering. We use the estimation approaches to model a high-resolution global climate change field. All simulation code is available as a Docker container, so the results are fully reproducible. Additionally, the present article describes where and how to obtain various climate datasets.
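To illustrate why an explicit inverse makes likelihood-based methods straightforward, here is a minimal sketch of an exact log-likelihood for a sinh-arcsinh field. It assumes one common parametrization, X = sinh((asinh(Z) + ε)/δ) with Z standard normal, plus a location μ and scale σ, where ε governs skewness and δ tail weight; the article may use a different form. The latent Gaussian field uses an exponential correlation exp(-d/range_), a hypothetical choice made only for the example, and all function names are hypothetical. Because the transformation acts componentwise, the joint density is the Gaussian density of the back-transformed values times the product of the per-site Jacobians; no numerical root-finding is needed.

```python
import math


def sas_forward(z, eps, delta):
    """Map a Gaussian value z to the sinh-arcsinh scale (assumed parametrization)."""
    return math.sinh((math.asinh(z) + eps) / delta)


def sas_inverse(x, eps, delta):
    """Explicit inverse of sas_forward: recover the Gaussian value from x."""
    return math.sinh(delta * math.asinh(x) - eps)


def sas_log_jacobian(x, eps, delta):
    """log |dz/dx| for z = sas_inverse(x, eps, delta)."""
    return (math.log(delta)
            + math.log(math.cosh(delta * math.asinh(x) - eps))
            - 0.5 * math.log1p(x * x))


def cholesky(a):
    """Lower-triangular Cholesky factor of a small dense SPD matrix (list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_logpdf(z, low):
    """Log-density of N(0, low @ low^T) at z, using forward substitution."""
    n = len(z)
    w = [0.0] * n
    for i in range(n):
        w[i] = (z[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i]
    return (-0.5 * n * math.log(2.0 * math.pi)
            - sum(math.log(low[i][i]) for i in range(n))
            - 0.5 * sum(v * v for v in w))


def sas_field_loglik(y, coords, mu, sigma, eps, delta, range_):
    """Exact log-likelihood of Y_i = mu + sigma * sas_forward(Z_i), where Z is a
    Gaussian field with (hypothetical) exponential correlation exp(-d / range_)."""
    n = len(y)
    corr = [[math.exp(-math.dist(coords[i], coords[j]) / range_) for j in range(n)]
            for i in range(n)]
    low = cholesky(corr)  # dense O(n^3) step: the cost that approximations target
    xs = [(yi - mu) / sigma for yi in y]
    zs = [sas_inverse(x, eps, delta) for x in xs]  # explicit inverse, no root-finding
    log_jac = sum(sas_log_jacobian(x, eps, delta) for x in xs) - n * math.log(sigma)
    return gaussian_logpdf(zs, low) + log_jac


if __name__ == "__main__":
    # Toy usage with made-up locations and observations.
    coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    y = [0.3, -1.2, 2.5]
    print(sas_field_loglik(y, coords, mu=0.0, sigma=1.0, eps=0.5, delta=0.8, range_=1.5))
```

With ε = 0 and δ = 1 the transformation reduces to the identity and the function returns the ordinary Gaussian log-likelihood. The dense Cholesky factorization is exactly the step that becomes infeasible for enormous datasets and that schemes such as tapering or composite likelihood are designed to avoid.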