Assessing statistical significance in multivariable genome wide association analysis


Buzdugan, Laura; Kalisch, Markus; Navarro, Arcadi; Schunk, Daniel; Fehr, Ernst; Bühlmann, Peter (2016). Assessing statistical significance in multivariable genome wide association analysis. Bioinformatics, 32(13):1990-2000.

Abstract

Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data is often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS.
Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields p-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family-wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available from all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the "spuriously correlated" SNP merely happens to be correlated with the "truly causal" SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the WTCCC (The Wellcome Trust Case Control Consortium, 2007). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies.
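
The contrast the abstract draws between marginal (one SNP at a time) and joint testing can be illustrated with a small simulation. This is a toy sketch only, not the authors' procedure: it uses ordinary least squares on two simulated SNPs rather than a high-dimensional generalized linear model, and it does not control the FWER. The names `causal`, `proxy`, and `t_statistics`, the effect size 0.5, the allele frequency 0.3, and the 80% copy rate are all assumptions made for illustration.

```python
import numpy as np

def t_statistics(X, y):
    """OLS t-statistics for each column of X (an intercept is added)."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    # Drop the intercept and return one t-statistic per predictor.
    return (beta / np.sqrt(np.diag(cov)))[1:]

rng = np.random.default_rng(0)
n = 5000

# Hypothetical causal SNP coded as minor-allele counts 0/1/2.
causal = rng.binomial(2, 0.3, size=n).astype(float)

# Hypothetical proxy SNP: copies the causal genotype 80% of the time,
# otherwise an independent draw, so the two are strongly correlated.
copy_mask = rng.random(n) < 0.8
proxy = np.where(copy_mask, causal, rng.binomial(2, 0.3, size=n)).astype(float)

# The phenotype depends on the causal SNP only.
y = 0.5 * causal + rng.normal(size=n)

# Marginal test: the proxy alone looks strongly associated.
t_marginal_proxy = t_statistics(proxy[:, None], y)[0]

# Joint test: conditioning on the causal SNP removes the proxy's signal.
t_joint = t_statistics(np.column_stack([causal, proxy]), y)

print("marginal t (proxy):", round(t_marginal_proxy, 2))
print("joint t (causal, proxy):", np.round(t_joint, 2))
```

In the marginal fit the proxy SNP inherits the causal SNP's signal, while in the joint fit its t-statistic should fall to the range expected under no effect. That is the sense in which testing a SNP "while controlling for all other SNPs" rules out this kind of spurious association.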

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Economics
Dewey Decimal Classification: 330 Economics
Language: English
Date: 2016
Deposited On: 16 Mar 2016 13:38
Last Modified: 08 Dec 2017 19:15
Publisher: Oxford University Press
ISSN: 1367-4803
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/bioinformatics/btw128
Official URL: http://bioinformatics.oxfordjournals.org/content/early/2016/03/07/bioinformatics.btw128.abstract

Download

Download PDF  'Assessing statistical significance in multivariable genome wide association analysis'.
Content: Accepted Version
Filetype: PDF
Size: 363kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)
Download PDF  'Assessing statistical significance in multivariable genome wide association analysis'.
Content: Accepted Version
Language: English
Filetype: PDF (Supplementary material)
Size: 274kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)