Recent coordinated efforts, in which numerous general circulation climate models have been run for a common set of experiments, have produced large datasets of projections of future climate for various scenarios. Those multimodel ensembles sample initial conditions, parameters, and structural uncertainties in the model design, and they have prompted a variety of approaches to quantifying uncertainty in future climate change. International climate change assessments also rely heavily on these models. These assessments often provide equal-weighted averages as best-guess results, assuming that individual model biases will at least partly cancel and that a model average prediction is more likely to be correct than a prediction from a single model based on the result that a multimodel average of present-day climate generally outperforms any individual model. This study outlines the motivation for using multimodel ensembles and discusses various challenges in interpreting them. Among these challenges are that the number of models in these ensembles is usually small, their distribution in the model or parameter space is unclear, and that extreme behavior is often not sampled. Model skill in simulating present-day climate conditions is shown to relate only weakly to the magnitude of predicted change. It is thus unclear by how much the confidence in future projections should increase based on improvements in simulating present-day conditions, a reduction of intermodel spread, or a larger number of models. Averaging model output may further lead to a loss of signal—for example, for precipitation change where the predicted changes are spatially heterogeneous, such that the true expected change is very likely to be larger than suggested by a model average. Last, there is little agreement on metrics to separate “good” and “bad” models, and there is concern that model development, evaluation, and posterior weighting or ranking are all using the same datasets. While the multimodel average appears to still be useful in some situations, these results show that more quantitative methods to evaluate model performance are critical to maximize the value of climate change projections from global models.