This function simulates GWAS marginal regression coefficients and approximate standard errors, with genotype \(\mathbf{X}\) and phenotype \(\mathbf{y}\) simulated under the standard linear model \(\mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{e}\). It supports two modes: (1) simulation from the individual-level genotype \(\mathbf{X}\); (2) simulation from the moment matrix XtX: \(\mathbf{X}^T \mathbf{X}\).

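In detail, we want to obtain \(\widehat{b}_j=(\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}(\mathbf{X}_j^{\top}\mathbf{y})\) and \(\widehat{\text{s.e.}}(\widehat{b}_j) = \sqrt{\widehat{\sigma_j^2}(\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}}\), where \(\widehat{\sigma_j^2}\) is the estimated residual variance of the marginal regression on SNP \(j\).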

The variance explained by each SNP is typically assumed to be small, so \(\widehat{\sigma_j^2}\) can be approximated by \(\widehat{\sigma_j^2}=\text{Var}[\mathbf{y}]\). We further normalize \(\mathbf{y}\) throughout such that \(\text{Var}[\mathbf{y}]=1\).
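
As a short worked step: \(\widehat{\sigma_j^2}(\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}\) is the sampling variance of \(\widehat{b}_j\), so the standard error on the coefficient scale is its square root. With \(\text{Var}[\mathbf{y}]=1\), and assuming additionally (this is not stated above) that each genotype column is standardized so that \(\mathbf{X}_j^{\top}\mathbf{X}_j \approx n_{\text{indiv}}\), this gives $$\sqrt{\widehat{\sigma_j^2}(\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}} \approx (\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1/2} \approx \frac{1}{\sqrt{n_{\text{indiv}}}}$$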

  • With the individual-level data \(\mathbf{X}\), we can calculate \(\widehat{b}_j\) and \(\widehat{\text{s.e.}}[\widehat{b}_j]\) directly from \((\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}\) and \((\mathbf{X}_j^{\top}\mathbf{y})\).

  • With the summary statistics data \(\mathbf{X}^T \mathbf{X}\), we have $$\widehat{b}_j= (\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}(\mathbf{X}_j^{\top}\mathbf{y})= (\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}[\mathbf{X}_j^{\top} (\mathbf{X}\mathbf{b}+\mathbf{e})] $$ Therefore, we can simulate \(\widehat{\mathbf{b}}\) by first sampling \(\mathbf{X}^\top \mathbf{y}\) from a multivariate normal with mean \(\mathbf{X}^\top \mathbf{X} \mathbf{b}\) and covariance \(\mathbf{X}^\top \mathbf{X} \sigma_e^2\), and then multiplying the jth entry by \((\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}\).
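
The two modes above can be sketched in base R as follows. This is only an illustration of the formulas, not the package's implementation. The dimensions, the independent standardized genotypes, the choice \(\sigma_e^2 = 1 - \text{hsq}\), and all variable names are assumptions made for the example.

```r
set.seed(1)

# Hypothetical problem size and heritability (illustrative values only).
n_indiv <- 500
n_snp   <- 5
hsq     <- 0.3

# Hypothetical standardized genotypes; real genotypes would be correlated (LD).
X <- scale(matrix(rnorm(n_indiv * n_snp), n_indiv, n_snp))

# Draw effects, then rescale so that Var[Xb] = hsq.
b <- rnorm(n_snp)
b <- b * sqrt(hsq / as.numeric(var(X %*% b)))

# Assumption: noise variance 1 - hsq, so that Var[y] is approximately 1.
sigma_e2 <- 1 - hsq

XtX <- crossprod(X)  # moment matrix X^T X
d   <- diag(XtX)     # X_j^T X_j for each SNP j

# Mode 1: individual-level data. Simulate y, then regress on each SNP.
e <- rnorm(n_indiv, sd = sqrt(sigma_e2))
y <- X %*% b + e
beta_hat_indiv <- as.numeric(crossprod(X, y)) / d

# Mode 2: summary-level data. Draw X^T y ~ N(XtX b, XtX * sigma_e2)
# using the Cholesky factor R, where t(R) %*% R equals XtX.
R   <- chol(XtX)
Xty <- XtX %*% b + sqrt(sigma_e2) * crossprod(R, rnorm(n_snp))
beta_hat_sumstat <- as.numeric(Xty) / d

# Approximate standard error under Var[y] = 1.
se_hat <- 1 / sqrt(d)
```

The two estimates differ between runs because they use independent noise draws, but both are centered on \((\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}\mathbf{X}_j^{\top}\mathbf{X}\mathbf{b}\). Mode 2 never touches \(\mathbf{X}\) after `XtX` is formed, which is the point of the summary-level route.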

Usage

simulate_gwas(hsq, beta, XtX = NULL, n_indiv = NULL, X = NULL)

Arguments

hsq

Heritability explained by the genotype \(\frac{\text{Var}[X\beta]}{\text{Var}[y]}\)

beta

n_snp by n_sim simulated effect sizes

n_indiv

Number of individuals n_indiv for the LD matrix

X

An n_indiv by n_snp genotype matrix

ld

An n_snp by n_snp linkage disequilibrium matrix

Value

A list with the following elements:

beta_hat

Simulated marginal effects; beta_hat[, i] corresponds to the ith simulation

e

Simulated environmental noise