Abstract
Cross-sectional covariate-related reference ranges are widely used in clinical medicine to put individual observations in the context of population values. Usually, such reference ranges are created from data sets of independent observations. If multiple measurements per individual are available, then ignoring the within-person correlation between repeats will lead to overestimation of centile precision. Furthermore, if abnormal measurements have triggered more frequent assessment, the data set will be biased thus producing biased centiles. Where multiple measures per individual exist, the methods commonly used are either randomly or systematically to select one observation per individual or to model individual trajectories and combine these. The first of these approaches may result in discarding a large proportion of the available data and may itself cause bias and the latter requires the form of the changes within individuals to be characterized. We have developed an approach to the modeling of the median, spread, and skew across individuals using maximum likelihood, which can incorporate correlations between dependent observations. Heavily biased data sets are simulated to illustrate how the methodology can eliminate the biases inherent in the data collection process and produce valid centiles plus estimates of the within-person correlations. The "select one per individual" approach is shown to be liable to bias and to produce less precise centiles. We recommend that the maximum likelihood method incorporating correlations be used with existing data sets. Furthermore, this is a potentially more efficient approach to be considered when planning the future collection of data solely for the purposes of creating cross-sectional covariate-related reference ranges.