RICOPILI: Rapid Imputation for COnsortias PIpeLIne

Lam M, Awasthi S, Watson HJ, Goldstein J, Panagiotaropoulou G, Trubetskoy V, Karlsson R, Frei O, Fan CC, De Witte W, Mota NR, Mullins N, Brügger K, Lee H, Wray N, Skarabis N, Huang H, Neale B, Daly M, Mattheisen M, Walters R, Ripke S. RICOPILI: Rapid Imputation for COnsortias PIpeLIne. Bioinformatics. 2019 Aug 8. pii: btz633. Epub ahead of print.

Fast association tests for genes with FAST

Pritam Chanda, Hailiang Huang, Dan E Arking, and Joel S Bader. 2013. “Fast association tests for genes with FAST.” PLoS One, 8, 7, Pp. e68585.

Gene-based tests of association can increase the power of a genome-wide association study by aggregating multiple independent effects across a gene or locus into a single stronger signal. Recent gene-based tests have distinct approaches to selecting which variants to aggregate within a locus, modeling the effects of linkage disequilibrium, representing fractional allele counts from imputation, and managing permutation tests for p-values. Implementing these tests in a single, efficient framework has great practical value. Fast ASsociation Tests (FAST) addresses this need by implementing leading gene-based association tests together with conventional SNP-based univariate tests and providing a consolidated, easily interpreted report. FAST scales readily to genome-wide SNP data with millions of SNPs and tens of thousands of individuals, provides implementations that are orders of magnitude faster than original literature reports, and provides a unified framework for performing several gene-based association tests concurrently and efficiently on the same data. AVAILABILITY: https://bitbucket.org/baderlab/fast/downloads/FAST.tar.gz, with documentation at https://bitbucket.org/baderlab/fast/wiki/Home.

BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters

Hailiang Huang, Sandeep Tata, and Robert J Prill. 2013. “BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters.” Bioinformatics, 29, 1, Pp. 135-6.

SUMMARY: Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype-phenotype datasets. AVAILABILITY AND IMPLEMENTATION: http://github.com/ibm-bioinformatics/bluesnp

Gene-based tests of association

Hailiang Huang, Pritam Chanda, Alvaro Alonso, Joel S Bader, and Dan E Arking. 2011. “Gene-based tests of association.” PLoS Genet, 7, 7, Pp. e1002177.

Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%-50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible for pathway-based meta-analysis.
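
As a rough illustration of how a permutation test can yield a gene-level p-value that accounts for the several SNPs tested within one gene, the sketch below shuffles the phenotype and compares the best single-SNP statistic in the gene against its permutation distribution. This is not the GWiS method and does not reproduce its greedy Bayesian model selection; it is a simple max-statistic test. The function names (correlation_sq, gene_statistic, permutation_p_value) and the n * r^2 statistic are assumptions chosen for illustration only.

```python
import random


def correlation_sq(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant genotype or phenotype carries no association signal.
        return 0.0
    return sxy * sxy / (sxx * syy)


def gene_statistic(genotypes, phenotype):
    """Illustrative gene-level statistic: max over SNPs of n * r^2.

    `genotypes` is a list of SNPs, each a list of allele counts
    (one entry per individual, fractional dosages allowed).
    """
    n = len(phenotype)
    return max(n * correlation_sq(snp, phenotype) for snp in genotypes)


def permutation_p_value(genotypes, phenotype, n_perm=1000, seed=0):
    """Empirical p-value for the gene statistic under phenotype permutation.

    Shuffling the phenotype leaves the correlation structure among SNPs
    (linkage disequilibrium) untouched, so taking the maximum over SNPs in
    each permutation adjusts for the SNPs tested within the gene.
    """
    rng = random.Random(seed)
    observed = gene_statistic(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gene_statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Calling permutation_p_value(genotypes, phenotype) with one list of allele counts per SNP returns a value in (0, 1]. The smallest attainable value is 1 / (n_perm + 1), so the number of permutations bounds the resolution of the estimate.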