CLoMAT
CLoMAT stands for "Conditional Logistic Model Association Tests".
This R package implements three rare-variant association tests for matched case-control
data under the conditional logistic regression (CLR) framework, namely CLR-Burden,
CLR-SKAT, and CLR-MiST, as well as a fast heuristic matching algorithm.
CLoMAT provides a general solution to control for population stratification by matching cases and controls
based on their ancestry background. It is especially useful for increasing the power of genetic association
studies that can draw on a large number of common controls.
The CLoMAT R package and manual can be downloaded from GitHub.
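To show what a CLR-based burden test computes, here is a minimal sketch of a score test under the conditional logistic model. It makes simplifying assumptions that are not necessarily what the CLoMAT package does: each matched set contains exactly one case, and each subject's rare variants have already been collapsed into a single burden score (for example, the number of rare minor alleles carried in a gene). The function name `clr_burden_score_test` is hypothetical and not part of the CLoMAT API.

```python
import math


def clr_burden_score_test(matched_sets):
    """Score test of H0: beta = 0 in a conditional logistic burden model.

    matched_sets: list of (case_score, [control_scores, ...]) tuples, one per
    matched set, where each score is a subject's burden. Assumes one case per set.
    Returns (score, information, chi2, p_value), all evaluated at beta = 0.
    """
    score = 0.0
    information = 0.0
    for case_x, control_xs in matched_sets:
        xs = [case_x] + list(control_xs)
        n = len(xs)
        mean_x = sum(xs) / n
        # At beta = 0 each member of a set is equally likely to be the case,
        # so the set contributes (case burden - mean burden of the set) ...
        score += case_x - mean_x
        # ... and the within-set variance of the burden to the information.
        information += sum((x - mean_x) ** 2 for x in xs) / n
    if information == 0:
        raise ValueError("burden scores do not vary within any matched set")
    chi2 = score ** 2 / information
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return score, information, chi2, p_value


# Three hypothetical matched sets: (case burden, [control burdens]).
sets = [(2, [0, 1]), (1, [0, 0, 0]), (0, [1, 0])]
print(clr_burden_score_test(sets))
```

Because every set contributes only a comparison between its case and that case's own matched controls, any set-specific baseline risk (for example, one tied to ancestry) drops out of the conditional likelihood. This is why matching on ancestry controls for stratification in the CLR framework.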
Citation for CLoMAT:
S Cheng*, J Lyu*, X Shi, K Wang, Z Wang, M Deng, B Sun, C Wang (2022).
Rare variant association tests for ancestry-matched case-control data based on conditional logistic regression.
Briefings in Bioinformatics, 23(2): bbab572.
GMMAT
GMMAT stands for "Generalized linear Mixed Model Association Test". This is an R package to perform association tests based
on generalized linear mixed models (i.e., modelling outcomes with exponential family distributions). The package implements
a series of algorithms that improve computational speed, making genome-wide scans efficient in large-scale
genetic studies (e.g., case-control disease studies). GMMAT can control for family relatedness, population structure,
and complex study designs in genome-wide association studies.
Dr. Han Chen is the lead developer of this R package.
The GMMAT R package and manual can be downloaded here.
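One standard way to keep a genome-wide scan cheap, and the approach behind GMMAT's score test, is to fit the null model once and then test each variant with a score statistic that needs no refitting. The sketch below shows that idea in its simplest form: an intercept-only logistic null model with no covariates and no random effect (variance component tau = 0), so the fitted probability is just the case fraction. In a logistic mixed model, the fitted probabilities and the variance of the score would instead come from the null mixed model, which accounts for relatedness through a kinship matrix. The name `logistic_score_test` is hypothetical.

```python
import math


def logistic_score_test(y, g):
    """Score test for one variant under an intercept-only logistic null model.

    y: 0/1 disease status; g: genotype dosages (0-2) for the same subjects.
    Simplification: no covariates and no random effect (tau = 0), so the
    null model's fitted probability is just mean(y).
    """
    n = len(y)
    mu = sum(y) / n  # fitted case probability under the null model
    g_bar = sum(g) / n
    # Score: genotype-weighted residuals from the null model.
    u = sum(gi * (yi - mu) for gi, yi in zip(g, y))
    # Variance of the score, adjusted for estimating the intercept.
    v = mu * (1 - mu) * sum((gi - g_bar) ** 2 for gi in g)
    chi2 = u * u / v
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 df, upper tail
    return chi2, p_value


# The null model is fitted once; each variant then needs only this cheap step.
status = [1, 1, 0, 0]
for dosages in ([2, 1, 1, 0], [0, 1, 2, 1]):
    print(logistic_score_test(status, dosages))
```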
Citation for GMMAT:
H Chen*, C Wang*, MP Conomos, AM Stilp, Z Li, T Sofer, AA Szpiro, W Chen, JM Brehm, JC Celedon,
S Redline, GJ Papanicolaou, TA Thornton, CC Laurie, K Rice, X Lin (2016).
Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models.
American Journal of Human Genetics, 98: 653-666.
LASER
LASER stands for "Locating Ancestry from SEquence Reads". This package includes two C++ programs,
laser and trace, for estimating individual ancestry in a reference ancestry space using either shotgun sequence reads
(laser) or genotype data (trace).
Both programs were implemented under a unified framework based on principal components analysis (PCA) and projection Procrustes analysis.
Given a shared reference panel, laser and trace can place sequenced and genotyped samples into the same ancestry space.
LASER can also perform standard PCA on genotype data to explore population structure and to create the reference ancestry space.
Different options to compute PC scores and PC loadings have been implemented in the LASER program (version 2.01 or later).
The LASER program and a detailed manual can be downloaded here.
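The Procrustes step can be illustrated in two dimensions. In the sketch below, coordinates of the same reference individuals from two PCA runs are index-matched, and the scale, rotation (with an optional reflection, since PC signs are arbitrary), and translation that best align them in the least-squares sense are fitted in closed form; the fitted map can then be applied to a study sample. This is a simplified stand-in, not LASER's implementation: LASER works with more than two PCs and combines the Procrustes step with projection, and the names `procrustes_2d` and `apply_fit` are hypothetical.

```python
import math


def _centered(points):
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return (cx, cy), [(x - cx, y - cy) for x, y in points]


def _cross_terms(a, b):
    # c = sum of dot products, s = sum of 2-D cross products (a x b).
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    s = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    return c, s


def _rotate_scale(fit, point):
    x, y = point
    if fit["reflect"]:
        y = -y
    ct, st = math.cos(fit["theta"]), math.sin(fit["theta"])
    return (fit["rho"] * (ct * x - st * y), fit["rho"] * (st * x + ct * y))


def procrustes_2d(source, target):
    """Least-squares fit of target ~ rho * Q * source + t in two dimensions.

    Q is a rotation by theta, optionally preceded by flipping the second axis.
    source and target are index-matched lists of (x, y) points.
    """
    src_mean, a = _centered(source)
    tgt_mean, b = _centered(target)
    best = None
    for reflect in (False, True):
        a_use = [(x, -y) for x, y in a] if reflect else a
        c, s = _cross_terms(a_use, b)
        strength = math.hypot(c, s)  # max over theta of sum b . (Q a)
        if best is None or strength > best[0]:
            best = (strength, reflect, math.atan2(s, c))
    strength, reflect, theta = best
    rho = strength / sum(x * x + y * y for x, y in a)
    fit = {"rho": rho, "theta": theta, "reflect": reflect, "t": (0.0, 0.0)}
    # Translation maps the transformed source centroid onto the target centroid.
    mx, my = _rotate_scale(fit, src_mean)
    fit["t"] = (tgt_mean[0] - mx, tgt_mean[1] - my)
    return fit


def apply_fit(fit, point):
    """Map a point from the source coordinate system into the target one."""
    x, y = _rotate_scale(fit, point)
    return (x + fit["t"][0], y + fit["t"][1])


# Hypothetical reference coordinates in a study-specific PCA (local) and in the
# reference PCA (global); here global is an exact scaled, rotated, shifted copy.
angle = math.pi / 6
ref_local = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
ref_global = [(2 * (math.cos(angle) * x - math.sin(angle) * y) + 1,
               2 * (math.sin(angle) * x + math.cos(angle) * y) - 3)
              for x, y in ref_local]
fit = procrustes_2d(ref_local, ref_global)
print(fit, apply_fit(fit, (0.5, 0.5)))  # place a study sample in global space
```

Fitting the map on reference individuals that appear in both coordinate systems, then applying it to the study sample, is roughly how samples analysed separately can end up in one common ancestry space.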
Citation for LASER:
C Wang*, X Zhan*, J Bragg-Gresham, HM Kang, D Stambolian, E Chew, K Branham, J Heckenlively,
The FUSION Study, RS Fulton, RK Wilson, ER Mardis, X Lin, A Swaroop, S Zöllner, GR Abecasis (2014).
Ancestry estimation and control of population stratification for sequence-based association studies.
Nature Genetics, 46: 409-415.
C Wang, X Zhan, L Liang, GR Abecasis, X Lin (2015).
Improved ancestry estimation for both genotyping and sequencing data using projection Procrustes analysis and genotype imputation.
American Journal of Human Genetics, 96: 926-937.
LASER Server
This is a web server that provides a unified framework to estimate ancestry using either genotyping or sequencing data.
The server is based on the LASER algorithm (Wang et al. 2014 Nature Genetics, Wang et al. 2015 AJHG).
We provide a series of built-in ancestry reference panels on the server so that users do not need to prepare their own panels.
By using the same ancestry reference panel on the server, researchers can directly compare ancestry estimates across different studies.
We also provide interactive graphical visualization to facilitate quick exploration of the ancestry background of samples.
Please try our LASER Server and have fun!
Citation for LASER Server:
D Taliun, S Chothani, S Schonherr, L Forer, M Boehnke, GR Abecasis, C Wang (2017).
LASER server: ancestry tracing with genotypes or sequence reads.
Bioinformatics, 33: 2056-2058.
MicroDrop
MicroDrop is a C++ program for estimating and correcting for allelic dropout in microsatellite data when replicated genotypes are
not available. Based on an allele frequency model, the program implements an expectation-maximization algorithm to search for
maximum-likelihood estimates of the allele frequencies, sample-specific and locus-specific dropout rates, and an inbreeding coefficient.
With the estimated parameter values, an empirical Bayesian strategy is used to prepare multiple imputed data sets to circumvent allelic
dropout in downstream data analyses.
The MicroDrop program and a detailed manual can be downloaded here.
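The expectation-maximization idea can be seen in a much smaller toy model. The sketch below assumes a single locus, Hardy-Weinberg proportions, no inbreeding, and one dropout rate `d` with which a true heterozygote a/b is observed as a homozygote, showing a or b with equal probability. MicroDrop itself estimates sample-specific and locus-specific dropout rates together with an inbreeding coefficient, so this is not its model; `dropout_em` and `expected_counts` are hypothetical names.

```python
def dropout_em(counts, n_iter=1000, d_start=0.1):
    """EM estimates of allele frequencies and one dropout rate at one locus.

    Toy model (an assumption, not MicroDrop's full model): Hardy-Weinberg
    genotypes, no inbreeding, and a true heterozygote a/b is observed as a
    homozygote with probability d, showing a or b with equal chance.
    counts maps an observed genotype (a, b), with a <= b, to its count.
    """
    alleles = sorted({allele for genotype in counts for allele in genotype})
    n = sum(counts.values())
    freqs = {a: 1.0 / len(alleles) for a in alleles}
    d = d_start
    for _ in range(n_iter):
        allele_counts = {a: 0.0 for a in alleles}
        hets_seen = 0.0
        hets_dropped = 0.0
        for (a, b), m in counts.items():
            if a != b:
                # An observed heterozygote shows no dropout.
                allele_counts[a] += m
                allele_counts[b] += m
                hets_seen += m
                continue
            # E-step: an observed a/a is a true homozygote or a heterozygote
            # a/c whose c allele dropped out; weight each by its probability.
            w_hom = freqs[a] ** 2
            w_het = {c: freqs[a] * freqs[c] * d for c in alleles if c != a}
            total = w_hom + sum(w_het.values())
            allele_counts[a] += 2 * m * w_hom / total
            for c, w in w_het.items():
                expected = m * w / total
                allele_counts[a] += expected
                allele_counts[c] += expected
                hets_dropped += expected
        # M-step: frequencies from expected allele counts, d from the expected
        # share of heterozygotes that lost an allele.
        freqs = {a: allele_counts[a] / (2 * n) for a in alleles}
        d = hets_dropped / (hets_seen + hets_dropped)
    return freqs, d


def expected_counts(freqs, d, n):
    """Expected observed-genotype counts for n individuals under the toy model."""
    counts = {}
    for a in freqs:
        for b in freqs:
            if a < b:
                counts[(a, b)] = n * 2 * freqs[a] * freqs[b] * (1 - d)
            elif a == b:
                counts[(a, a)] = n * (freqs[a] ** 2 + freqs[a] * (1 - freqs[a]) * d)
    return counts


true_freqs = {150: 0.5, 154: 0.3, 158: 0.2}  # hypothetical allele sizes (bp)
observed = expected_counts(true_freqs, d=0.2, n=1000)
freqs, d = dropout_em(observed)
print(freqs, d)
```

Under these toy assumptions a single locus cannot distinguish dropout from inbreeding: the observed-genotype probabilities have exactly the form of an inbreeding model with F = d, so separating the two needs information from multiple loci and samples. The normalized E-step weights are also the posterior probabilities that an empirical Bayes imputation of an observed homozygote would sample from.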
Citation for MicroDrop:
C Wang, KB Schroeder, NA Rosenberg (2012).
A maximum-likelihood method to correct for allelic dropout in microsatellite data with no replicate genotypes.
Genetics, 192: 651-669.
SEEKIN
SEEKIN stands for "SEquence-based Estimation of KINship". This is a C++ program to estimate pairwise kinship coefficients for both
homogeneous samples and heterogeneous samples with population structure and admixture. The method was initially developed to analyze
sparse sequencing data, such as off-target data from targeted sequencing experiments, in which genotypes are uncertain.
It can also be applied to high-quality genotype data. The program is computationally efficient, supports multithreading,
and takes standard VCF files as input.
The SEEKIN software package is available on GitHub.
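For intuition about what a kinship estimator computes, here is the standard method-of-moments (genetic-relationship-style) estimator for a homogeneous sample with exactly known genotypes. It is a textbook baseline, not SEEKIN's estimator: it does not attempt the adjustments SEEKIN makes for uncertain genotypes from sparse sequencing data or for population structure and admixture (with structure, a single set of population allele frequencies biases such estimates). The name `kinship` is hypothetical.

```python
def kinship(g_i, g_j, freqs):
    """Method-of-moments kinship estimate for one pair of individuals.

    g_i, g_j: alternate-allele dosages (0-2) at the same M variants.
    freqs: population alternate-allele frequencies, assumed common to both
    individuals (a homogeneous sample), each strictly between 0 and 1.
    """
    if not freqs or not (len(g_i) == len(g_j) == len(freqs)):
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for gi, gj, p in zip(g_i, g_j, freqs):
        if not 0 < p < 1:
            raise ValueError("allele frequencies must lie in (0, 1)")
        # Under Hardy-Weinberg equilibrium with exactly known genotypes,
        # each term has expectation equal to the kinship coefficient.
        total += (gi - 2 * p) * (gj - 2 * p) / (4 * p * (1 - p))
    return total / len(freqs)


freqs = [0.5, 0.5, 0.2]
print(kinship([2, 1, 0], [2, 1, 0], freqs))  # an individual with itself
print(kinship([2, 1, 0], [0, 1, 1], freqs))  # a hypothetical pair
```

The expected value is 1/2 for a non-inbred individual with itself, 1/4 for parent-offspring or full-sibling pairs, and 0 for unrelated pairs; individual estimates scatter around these values and can be negative.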
Citation for SEEKIN:
J Dou*, B Sun*, X Sim, JD Hughes, DF Reilly, ES Tai, J Liu, C Wang (2017).
Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data.
PLOS Genetics, 13: e1007021.
WEScall
WEScall is a genotype calling pipeline for both whole-exome sequencing (WES) and whole-genome sequencing (WGS) data. It was designed to
utilize linkage disequilibrium (LD) information within the study sample and from an external WGS reference panel (such as the 1000 Genomes Project)
to improve genotype calling accuracy. For WES, the pipeline makes use of the shallow off-target sequencing data, allowing for
relatively accurate genotyping across non-coding regions and thus improving downstream association analysis and polygenic risk prediction.
For more details, please see the reference listed below.
The WEScall software pipeline is available on GitHub.
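Why shallow off-target data benefit from outside information can be seen in a single-site genotype posterior. The sketch combines read counts through a simple binomial error model with a Hardy-Weinberg prior; with only one or two reads the call depends heavily on the prior, which is where LD information from the study sample and a reference panel, as used by WEScall, can help. This is not WEScall's method: it ignores LD entirely, and `genotype_posterior` and its error model are hypothetical.

```python
def genotype_posterior(n_ref, n_alt, alt_freq, error=0.01):
    """Posterior probabilities of genotypes 0, 1, 2 (alt-allele count) at one site.

    Assumptions for illustration: reads are independent, a read shows the alt
    allele with probability (g/2)(1 - error) + (1 - g/2) * error, and the prior
    is Hardy-Weinberg with the given alt-allele frequency (no LD information).
    """
    p = alt_freq
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    weights = []
    for g in range(3):
        q = (g / 2) * (1 - error) + (1 - g / 2) * error  # P(read shows alt | g)
        likelihood = q ** n_alt * (1 - q) ** n_ref
        weights.append(prior[g] * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


# One off-target read versus deep coverage at a site with alt frequency 0.3.
print(genotype_posterior(0, 1, 0.3))
print(genotype_posterior(0, 20, 0.3))
```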
Citation for WEScall:
J Dou*, D Wu*, L Ding, K Wang, M Jiang, X Chai, DF Reilly, ES Tai, J Liu, X Sim, S Cheng, C Wang (2021).
Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis, and polygenic risk prediction.
Briefings in Bioinformatics, 22(3): bbaa084.