Question: selecting SNPs and samples in GENESIS
Stephanie M. Gogarten (University of Washington) wrote, 13 months ago:

This question was sent by email:

We are performing a GWAS on whole-genome data and using the GENESIS package for the association analysis.

Just a few small questions:

1. How many SNPs should be used for the KING relationship matrix?
2. What is the best way to select the number of PCs to use as covariates?
3. How many SNPs should be used to estimate PCs with the PC-AiR and PC-Relate methods?
4. Does the association model handle NA values in the phenotype data, or do the samples need to be removed before performing the association test?
Answer: selecting SNPs and samples in GENESIS
Stephanie M. Gogarten replied:

The devel version of GENESIS includes a vignette that may be helpful: http://bioconductor.org/packages/devel/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html

1) How many SNPs should be used for the KING relationship matrix?

We recommend LD pruning to select SNPs; the SNPRelate function snpgdsLDpruning can be used for this. We usually set a minor allele frequency threshold in the pruning function (its maf argument) to eliminate rare variants. After pruning, we usually end up with 200,000 to 300,000 SNPs.

2) What is the best way to select the number of PCs to use as covariates?

You want to select PCs that are informative for distinguishing populations. A good way to do this is to make a parallel coordinates plot, color-coded by population or self-identified race, as illustrated in the vignette. Look for the last PC that separates groups of colors rather than looking like noise.
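The visual check described here is the recommended approach. As a hypothetical numeric complement (not something the answer describes), one can score how much of each PC's variance is explained by the group labels, using the one-way ANOVA eta² (between-group sum of squares divided by total sum of squares), and report the last PC whose score exceeds a cutoff. The function names and the 0.1 cutoff are arbitrary assumptions; the sketch uses Python with NumPy on a toy PC matrix.

```python
import numpy as np


def eta_squared_per_pc(pcs, groups):
    """Share of each PC's variance explained by group labels (one-way ANOVA eta^2).

    pcs: (n_samples, n_pcs) array of PC scores, one column per PC.
    groups: length-n_samples labels such as population or self-identified race.
    """
    pcs = np.asarray(pcs, dtype=float)
    groups = np.asarray(groups)
    overall_mean = pcs.mean(axis=0)
    ss_total = ((pcs - overall_mean) ** 2).sum(axis=0)
    ss_between = np.zeros(pcs.shape[1])
    for label in np.unique(groups):
        members = pcs[groups == label]
        ss_between += len(members) * (members.mean(axis=0) - overall_mean) ** 2
    return ss_between / ss_total


def last_separating_pc(pcs, groups, min_eta2=0.1):
    """1-based index of the last PC with eta^2 above min_eta2, or 0 if none."""
    eta2 = eta_squared_per_pc(pcs, groups)
    above = np.flatnonzero(eta2 > min_eta2)
    return int(above[-1]) + 1 if above.size else 0


# Toy scores: PC1 and PC2 separate groups A and B, PC3 is noise.
pcs = np.array([[-1, 1, 1], [-1, 2, -1], [-1, 0, 0],
                [1, -1, 1], [1, -2, -1], [1, 0, 0]])
groups = ["A", "A", "A", "B", "B", "B"]
print(eta_squared_per_pc(pcs, groups))  # approximately [1.0, 0.6, 0.0]
print(last_separating_pc(pcs, groups))  # 2
```

A single summary number can miss structure that is obvious in a plot (for example, a PC that separates a small subgroup), so this is best used alongside the parallel coordinates plot, not instead of it.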

3) How many SNPs should be used to estimate PCs with the PC-AiR and PC-Relate methods?

The LD pruning recommendations apply here as well. We often do another round of LD pruning using only unrelated samples (selected with the pcairPartition function).
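The point of the second round is that allele frequencies and LD are estimated from the unrelated set only. A minimal Python/NumPy sketch, assuming a 0/1/2 genotype matrix whose rows follow a list of sample IDs, and a plain list of unrelated IDs standing in for the unrelated set that pcairPartition would return in R. The function prune_in_unrelated is hypothetical, and the sliding window is omitted for brevity, so every candidate is compared with every retained SNP (only sensible for a small region).

```python
import numpy as np


def prune_in_unrelated(genotypes, sample_ids, unrelated_ids, maf_min=0.05, r2_max=0.1):
    """Greedy LD pruning computed on the unrelated samples only (no window)."""
    unrelated = set(unrelated_ids)
    rows = [i for i, s in enumerate(sample_ids) if s in unrelated]
    sub = np.asarray(genotypes, dtype=float)[rows, :]

    # MAF and variance are recomputed within the unrelated subset.
    alt_freq = sub.mean(axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    candidates = np.flatnonzero((maf >= maf_min) & (sub.std(axis=0) > 0))

    kept = []
    for j in candidates:
        # Keep SNP j only if its r^2 with every retained SNP is at most r2_max.
        if all(np.corrcoef(sub[:, j], sub[:, k])[0, 1] ** 2 <= r2_max for k in kept):
            kept.append(int(j))
    return kept


ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
G = np.array([[0, 0, 0, 1],
              [1, 1, 0, 0],
              [2, 2, 0, 1],
              [1, 1, 0, 2],
              [0, 2, 1, 0],
              [0, 2, 2, 0]])
unrelated = ["s1", "s2", "s3", "s4"]  # hypothetical unrelated set

print(prune_in_unrelated(G, ids, unrelated))  # [0, 3]
print(prune_in_unrelated(G, ids, ids))        # [0, 1]
```

In this toy matrix the retained set changes when the two extra samples are included, which illustrates why the pruned SNP set is worth recomputing on the unrelated subset.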

4) Does the association model handle NA values in the phenotype data, or do the samples need to be removed before performing the association test?

fitNullModel will remove any samples with NA in the phenotype data before fitting the null model. However, I recommend explicitly selecting non-missing samples with the sample.id argument: it makes it much easier to keep track of exactly how many samples are used in your analysis and reduces the possibility of errors.
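The bookkeeping recommended here can be illustrated with a short Python sketch; in R, the resulting vector of IDs is what would be passed to fitNullModel's sample.id argument. The helper complete_case_ids and the dictionary layout of the phenotype table are hypothetical. It keeps samples with no missing value (None or NaN, standing in for R's NA) among the variables you list, for example the outcome plus any covariates you plan to include, and reports how many were retained.

```python
import math


def complete_case_ids(phenotypes, required):
    """Return IDs of samples with non-missing values for every required variable.

    phenotypes: dict mapping sample ID -> dict of variable name -> value.
    A value of None, float('nan'), or an absent key counts as missing.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    keep = [sid for sid, record in phenotypes.items()
            if all(not is_missing(record.get(var)) for var in required)]
    # Report the count so the analysis sample size is explicit.
    print(f"{len(keep)} of {len(phenotypes)} samples have complete data for {required}")
    return keep


pheno = {
    "s1": {"height": 170.0, "age": 40},
    "s2": {"height": float("nan"), "age": 35},
    "s3": {"height": 160.0, "age": None},
    "s4": {"height": 155.0, "age": 50},
}
ids = complete_case_ids(pheno, ["height", "age"])  # prints "2 of 4 samples ..."
print(ids)  # ['s1', 's4']
```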