Held, Leonhard (2013). *Reverse-Bayes analysis of two common misinterpretations of significance tests.* Clinical Trials, 10(2):236-242.

## Abstract

BACKGROUND: Misunderstanding of significance tests and P values is widespread in clinical research and elsewhere.

PURPOSE: To assess the implications of two common mistakes in the interpretation of statistical significance tests. The first is the misinterpretation of the type I error rate as the expected proportion of false-positive results among all those called significant, also known as the false-positive report probability (FPRP). The second is the misinterpretation of a P value as the (posterior) probability of the null hypothesis.

METHODS: A reverse-Bayes approach is used to calculate a lower bound on the proportion of truly effective treatments that would ensure that the FPRP is equal to or below the type I error rate. A reverse-Bayes approach using minimum Bayes factors (BFs) yields upper bounds on the prior probability of the null hypothesis that would justify interpreting the P value as the posterior probability of the null hypothesis.
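The first reverse-Bayes calculation can be sketched in a few lines of Python. This is a minimal sketch, not the paper's code: it assumes the usual definition FPRP = α(1 − π) / (α(1 − π) + (1 − β)π), where α is the one-sided type I error rate, β the type II error rate and π the proportion of truly effective treatments. The function names are hypothetical. Solving FPRP ≤ α for π gives the lower bound π ≥ (1 − α) / (2 − α − β).

```python
def fprp(alpha: float, beta: float, pi: float) -> float:
    """False-positive report probability: expected share of false positives
    among results declared significant.

    alpha: one-sided type I error rate
    beta:  type II error rate (power = 1 - beta)
    pi:    proportion of truly effective treatments
    """
    false_pos = alpha * (1.0 - pi)   # ineffective treatments declared significant
    true_pos = (1.0 - beta) * pi     # effective treatments declared significant
    return false_pos / (false_pos + true_pos)


def min_effective_proportion(alpha: float, beta: float) -> float:
    """Smallest pi with FPRP <= alpha (reverse-Bayes lower bound).

    FPRP <= alpha  <=>  (1 - pi)(1 - alpha) <= (1 - beta) pi
                   <=>  pi >= (1 - alpha) / (2 - alpha - beta)
    """
    return (1.0 - alpha) / (2.0 - alpha - beta)


alpha, beta = 0.025, 0.20  # assumed "typical" values, not taken from the paper
pi_min = min_effective_proportion(alpha, beta)
print(f"lower bound on pi: {pi_min:.3f}")
print(f"FPRP at that bound: {fprp(alpha, beta, pi_min):.4f}")
print(f"FPRP if only 10% of treatments work: {fprp(alpha, beta, 0.10):.3f}")
```

With these assumed inputs the bound is 0.975 / 1.775 ≈ 0.549, and the FPRP at that proportion equals α exactly. If only 10% of treatments work, the FPRP is about 0.22, far above α. This agrees with the "more than 50%" in the RESULTS paragraph, although the paper's exact inputs may differ.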

RESULTS: In a typical clinical trial setting, more than 50% of the treatments need to be truly effective to justify equating the type I error rate with the FPRP. To interpret the P value as the posterior probability of the null hypothesis, the difference between the corresponding prior probability and the P value cannot exceed 12.4 percentage points.
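The second result can be explored numerically. Posterior odds of the null hypothesis are at least the prior odds multiplied by the minimum BF. Posterior odds of p / (1 − p) are therefore attainable only if the prior odds are at most p / ((1 − p) · minBF), and this gives the largest prior probability of the null hypothesis compatible with reading p as its posterior probability. The sketch below is hedged on several assumptions. It uses two calibrations from the literature: exp(−z²/2), with z the standard normal quantile of a one-sided P value, and −e·p·ln(p), valid for p < 1/e. It also fixes a P-value grid and uses hypothetical function names. None of these is confirmed as the paper's exact scenario.

```python
import math
from statistics import NormalDist


def min_bf_z(p: float) -> float:
    """Assumed calibration: exp(-z^2 / 2), z = standard normal quantile of 1 - p."""
    z = NormalDist().inv_cdf(1.0 - p)
    return math.exp(-0.5 * z * z)


def min_bf_sbb(p: float) -> float:
    """Assumed alternative: -e * p * ln(p), only valid for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the -e p ln p bound needs 0 < p < 1/e")
    return -math.e * p * math.log(p)


def max_prior_null(p: float, min_bf: float) -> float:
    """Largest prior P(H0) whose posterior P(H0) can still be as low as p.

    Posterior odds >= prior odds * min_bf, so posterior odds p / (1 - p) are
    attainable only if prior odds <= p / ((1 - p) * min_bf).
    """
    prior_odds = p / ((1.0 - p) * min_bf)
    return prior_odds / (1.0 + prior_odds)


def max_gap(calibration, grid):
    """Return (largest max_prior_null(p) - p, the p where it occurs) over grid."""
    return max((max_prior_null(p, calibration(p)) - p, p) for p in grid)


# Hypothetical grid of P values, kept below 1/e so both calibrations apply.
grid = [i / 10000 for i in range(1, 3600)]
for name, calibration in [("exp(-z^2/2)", min_bf_z), ("-e p ln p", min_bf_sbb)]:
    gap, p_at = max_gap(calibration, grid)
    print(f"{name:>12}: largest gap {gap:.3f} at p = {p_at:.4f}")
```

Under the exp(−z²/2) calibration the largest gap is about 0.124, near p ≈ 0.025, which matches the 12.4 percentage points stated in RESULTS. The −e·p·ln(p) calibration gives a smaller maximum gap of roughly 0.07. The match suggests, but does not confirm, how the figure arises.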

LIMITATIONS: The first analysis requires that the (one-sided) type I error rate is smaller than the type II error rate. The second result is valid under different scenarios describing how to transform P values to minimum BFs.
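The first limitation can be read off the lower bound π ≥ (1 − α) / (2 − α − β). That bound follows from the standard FPRP formula, which is an assumption of this sketch. It exceeds 1/2 exactly when 2(1 − α) > 2 − α − β, that is, when α < β. The check below uses hypothetical (α, β) pairs chosen only for illustration.

```python
def min_effective_proportion(alpha: float, beta: float) -> float:
    """Lower bound on the proportion of truly effective treatments for FPRP <= alpha."""
    return (1.0 - alpha) / (2.0 - alpha - beta)


# Hypothetical (alpha, beta) pairs; alpha == beta is avoided because it puts the
# bound exactly at 0.5, where floating-point rounding could flip the comparison.
for alpha, beta in [(0.025, 0.20), (0.05, 0.10), (0.10, 0.05), (0.20, 0.10)]:
    pi_min = min_effective_proportion(alpha, beta)
    print(f"alpha={alpha:.3f}, beta={beta:.2f}: bound={pi_min:.3f}, "
          f"alpha < beta: {alpha < beta}, bound > 0.5: {pi_min > 0.5}")
```

In every row the two boolean columns agree. This is one way to see why a "more than 50%" conclusion depends on the one-sided type I error rate being smaller than the type II error rate.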

CONCLUSIONS: The two misinterpretations imply strong and often unrealistic assumptions about the prior proportion or probability of truly effective treatments.

## Additional indexing

| Field | Value |
|---|---|
| Item Type | Journal Article, refereed, original work |
| Communities & Collections | 04 Faculty of Medicine > Epidemiology, Biostatistics and Prevention Institute (EBPI) |
| Dewey Decimal Classification | 610 Medicine & health |
| Language | English |
| Date | 2013 |
| Deposited On | 12 Dec 2013 15:42 |
| Last Modified | 05 Apr 2016 17:14 |
| Publisher | SAGE Publications |
| ISSN | 1740-7745 |
| Publisher DOI | https://doi.org/10.1177/1740774512468807 |
| PubMed ID | 23329516 |

## Download

Full text is not available from this repository. It can be accessed at the publisher via the Publisher DOI listed above.
