Simulate GWAS from either individual level genotype or LD matrix
Source: R/simulate.R
simulate_gwas.Rd
This function simulates GWAS marginal regression coefficients
and approximate standard errors, where the genotype \(\mathbf{X}\)
and phenotype \(\mathbf{y}\) follow the standard linear model
\(\mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{e}\). It supports
two modes: (1) simulation from the individual-level genotype \(\mathbf{X}\), and (2)
simulation from the moment matrix XtX, i.e. \(\mathbf{X}^{\top} \mathbf{X}\).
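In detail, for each SNP \(j\) we want to obtain the marginal estimate \(\widehat{b}_j=(\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}(\mathbf{X}_j^{\top}\mathbf{y})\) and its standard error \(\widehat{\text{s.e.}}(\widehat{b}_j) = \sqrt{\widehat{\sigma_j^2}(\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}}\), where \(\mathbf{X}_j\) is the \(j\)th column of \(\mathbf{X}\).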
The variance explained by any single SNP is typically assumed to be small, so the residual variance \(\widehat{\sigma_j^2}\) can be approximated by \(\text{Var}[\mathbf{y}]\). We further normalize \(\mathbf{y}\) so that \(\text{Var}[\mathbf{y}]=1\).
With the individual-level data \(\mathbf{X}\), we can compute \(\widehat{b}_j\) and \(\widehat{\text{s.e.}}(\widehat{b}_j)\) directly from \((\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}\) and \((\mathbf{X}_j^{\top}\mathbf{y})\).
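The individual-level calculation can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: the helper name `marginal_stats` is hypothetical, and centering the genotype columns is an assumption added here so that no intercept term is needed.

```r
# Hypothetical helper for illustration; not part of the package API.
marginal_stats <- function(X, y) {
  # Standardize the phenotype so that Var[y] = 1.
  y <- as.numeric(scale(y))
  # Assumption: center genotype columns so no intercept is needed.
  X <- scale(X, center = TRUE, scale = FALSE)
  xtx_diag <- colSums(X^2)             # X_j' X_j for each SNP j
  xty <- as.numeric(crossprod(X, y))   # X_j' y for each SNP j
  bhat <- xty / xtx_diag               # marginal regression coefficients
  se <- sqrt(1 / xtx_diag)             # uses sigma_j^2 ~= Var[y] = 1
  data.frame(bhat = bhat, se = se)
}

set.seed(1)
n <- 500; p <- 10
X <- matrix(rbinom(n * p, 2, 0.3), n, p)   # toy genotypes coded 0/1/2
b <- c(0.2, rep(0, p - 1))                 # one causal SNP
y <- as.numeric(X %*% b + rnorm(n))
stats <- marginal_stats(X, y)
```

Because only the diagonal of \(\mathbf{X}^{\top}\mathbf{X}\) and the vector \(\mathbf{X}^{\top}\mathbf{y}\) are used, the sketch never forms the full \(p \times p\) moment matrix.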
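With only the summary-level moment matrix \(\mathbf{X}^{\top} \mathbf{X}\), we have $$\widehat{b}_j= (\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}(\mathbf{X}_j^{\top}\mathbf{y})= (\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}[\mathbf{X}_j^{\top} (\mathbf{X}\mathbf{b}+\mathbf{e})] $$ Stacking over SNPs, \(\mathbf{X}^{\top}\mathbf{y} = \mathbf{X}^{\top}\mathbf{X}\mathbf{b} + \mathbf{X}^{\top}\mathbf{e}\), which for normally distributed \(\mathbf{e}\) with variance \(\sigma_e^2\) is multivariate normal with mean \(\mathbf{X}^{\top} \mathbf{X} \mathbf{b}\) and covariance \(\mathbf{X}^{\top} \mathbf{X} \sigma_e^2\). Therefore, we can simulate \(\widehat{\mathbf{b}}\) by first sampling from this multivariate normal distribution and then multiplying the \(j\)th entry of the draw by \((\mathbf{X}_j^{\top}\mathbf{X}_j)^{-1}\).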
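The summary-level mode can be sketched as follows. The helper name `simulate_bhat_from_xtx` and its `sigma_e2` argument are hypothetical, and the sketch assumes the moment matrix is positive definite so that a Cholesky factor exists; the package itself may sample differently.

```r
# Hypothetical helper for illustration; not the package's implementation.
simulate_bhat_from_xtx <- function(XtX, b, sigma_e2 = 1) {
  p <- nrow(XtX)
  stopifnot(ncol(XtX) == p, length(b) == p)
  # Upper-triangular U with t(U) %*% U == XtX (assumes XtX is positive definite).
  U <- chol(XtX)
  # Draw z ~ N(XtX b, sigma_e2 * XtX); z plays the role of X'y.
  z <- as.numeric(XtX %*% b) +
    sqrt(sigma_e2) * as.numeric(crossprod(U, rnorm(p)))
  # Divide entry j by X_j' X_j to obtain the marginal estimate for SNP j.
  z / diag(XtX)
}

set.seed(1)
n <- 500; p <- 10
# Toy centered genotypes, used here only to build a moment matrix.
X <- scale(matrix(rbinom(n * p, 2, 0.3), n, p), center = TRUE, scale = FALSE)
XtX <- crossprod(X)
b <- c(0.2, rep(0, p - 1))
bhat <- simulate_bhat_from_xtx(XtX, b, sigma_e2 = 1)
```

Multiplying standard normal draws by the transposed Cholesky factor gives them covariance \(\mathbf{X}^{\top}\mathbf{X}\), so no individual-level data are needed once the moment matrix is available.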