Selecting SNPs and samples in GENESIS
@stephanie-m-gogarten-5121
University of Washington

This question was sent by email:

We are performing a GWAS on whole-genome data and using the GENESIS package for the association analysis.

Just a few small questions:

  1. How many SNPs should be used for the KING relationship matrix?
  2. What is the best way to select the number of PCs to be used as covariates?
  3. How many SNPs should be used to estimate PCs using the PC-AiR and PC-Relate methods?
  4. Does the association model handle NA values in the phenotype data, or do those samples need to be removed before performing the association test?
@stephanie-m-gogarten-5121
University of Washington

The devel version of GENESIS includes a vignette that may be helpful: http://bioconductor.org/packages/devel/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html

1) How many SNPs should be used for the KING relationship matrix?

We recommend LD pruning to select SNPs. The SNPRelate function snpgdsLDpruning can be used for this. We usually set a minor allele frequency threshold in the pruning function to eliminate rare variants. After pruning, we usually end up with 200,000 to 300,000 SNPs.
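To make the idea concrete, here is a minimal, language-agnostic sketch in Python of greedy sliding-window LD pruning with a MAF filter applied first. It is not the SNPRelate implementation: it uses the squared Pearson correlation of 0/1/2 dosages rather than SNPRelate's LD methods, and the function names and the `maf_min`, `r2_max` and `window_bp` values are illustrative assumptions, not recommended settings.

```python
def maf(genos):
    """Minor allele frequency from 0/1/2 dosages (None = missing call)."""
    called = [g for g in genos if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def r_squared(x, y):
    """Squared Pearson correlation of dosages over samples called at both SNPs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def ld_prune(snps, maf_min=0.05, r2_max=0.1, window_bp=500_000):
    """Greedy pruning: keep a SNP only if it is not in LD with an already-kept
    SNP on the same chromosome within window_bp.
    snps: list of (snp_id, chrom, pos, genotypes) tuples (hypothetical layout)."""
    kept = []
    for snp_id, chrom, pos, genos in sorted(snps, key=lambda s: (s[1], s[2])):
        if maf(genos) < maf_min:
            continue  # drop rare/monomorphic variants before any LD test
        in_ld = any(
            c == chrom and abs(p - pos) <= window_bp and r_squared(g, genos) > r2_max
            for _, c, p, g in kept
        )
        if not in_ld:
            kept.append((snp_id, chrom, pos, genos))
    return [k[0] for k in kept]

snps = [
    ("rs1", 1, 1000, [0, 1, 2, 1, 0, 2]),
    ("rs2", 1, 2000, [0, 1, 2, 1, 0, 2]),  # identical to rs1, so r^2 = 1
    ("rs3", 1, 3000, [0, 0, 0, 0, 0, 0]),  # monomorphic, MAF = 0
    ("rs4", 1, 4000, [1, 2, 1, 0, 1, 1]),  # uncorrelated with rs1
]
print(ld_prune(snps))  # ['rs1', 'rs4']
```

Applying the MAF filter before the LD test mirrors the order described above: rare variants are removed first, so they never "claim" a window and cause a common SNP to be pruned.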

2) What is the best way to select the number of PCs to be used as covariates?

You want to select PCs that are informative for distinguishing populations. A good way to do this is to make a parallel coordinates plot color-coded by population or self-identified race, as illustrated in the vignette. Look for the last PC that separates groups of colors instead of looking like noise.
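The parallel coordinates plot is a visual check. If you also want a rough numerical companion to it, one option (a swapped-in heuristic, not something GENESIS provides) is to compute, for each PC, the fraction of its variance explained by the population labels (eta squared) and report the last PC above a cutoff. The function names and the `min_eta2` cutoff below are hypothetical; the plot should still be the primary guide.

```python
from collections import defaultdict

def eta_squared(values, labels):
    """Between-group sum of squares / total sum of squares for one PC."""
    n = len(values)
    grand = sum(values) / n
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0

def last_informative_pc(pcs, labels, min_eta2=0.5):
    """1-based index of the last PC whose eta^2 reaches min_eta2 (0 if none).
    pcs: list of PC columns, each a list of per-sample scores."""
    last = 0
    for k, values in enumerate(pcs, start=1):
        if eta_squared(values, labels) >= min_eta2:
            last = k
    return last

labels = ["A", "A", "A", "B", "B", "B"]
pcs = [
    [-1.0, -1.1, -0.9, 1.0, 1.1, 0.9],  # PC1: clearly separates A from B
    [0.5, -0.5, 0.0, 0.5, -0.5, 0.0],   # PC2: same spread in both groups (noise)
]
print(last_informative_pc(pcs, labels))  # 1
```

Like the visual check, this depends on having meaningful group labels; a PC that separates subgroups not captured by the labels would be missed.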

3) How many SNPs should be used to estimate PCs using the PC-AiR and PC-Relate methods?

The recommendations for LD pruning apply here also. We often do another round of LD pruning using only unrelated samples (selected with the pcairPartition function).
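As a rough illustration of the "unrelated samples" step, here is a simplified Python sketch that greedily removes the sample with the most remaining relatives until no related pairs are left. It is an assumption-laden stand-in, not the pcairPartition algorithm (which also takes ancestry divergence into account); the function name and the `kin_thresh` value (about the lower KING bound for third-degree relatives) are illustrative. The resulting unrelated IDs would then be the sample set for the second pruning round (e.g. via the `sample.id` argument of snpgdsLDpruning).

```python
def unrelated_subset(samples, kinship, kin_thresh=0.0442):
    """Greedy split into (unrelated, related) sample lists.
    kinship: dict mapping (id_a, id_b) pairs to kinship estimates;
    all ids are assumed to appear in `samples`."""
    relatives = {s: set() for s in samples}
    for (a, b), phi in kinship.items():
        if phi > kin_thresh:
            relatives[a].add(b)
            relatives[b].add(a)
    remaining = set(samples)
    while True:
        # count relatives still in the candidate unrelated set
        counts = {s: len(relatives[s] & remaining) for s in remaining}
        # max() returns the first maximum, so ties go to the smallest id
        worst = max(sorted(remaining), key=lambda s: counts[s], default=None)
        if worst is None or counts[worst] == 0:
            break  # no related pairs remain
        remaining.discard(worst)
    return sorted(remaining), sorted(set(samples) - remaining)

samples = ["s1", "s2", "s3", "s4", "s5"]
kinship = {
    ("s1", "s2"): 0.25,  # first-degree pair
    ("s2", "s3"): 0.25,  # first-degree pair
    ("s1", "s3"): 0.0,
    ("s4", "s5"): 0.01,  # below threshold: treated as unrelated
}
unrelated, related = unrelated_subset(samples, kinship)
print(unrelated, related)  # ['s1', 's3', 's4', 's5'] ['s2']
```

Removing the most-connected sample first tends to keep the unrelated set large: dropping s2 alone breaks both related pairs.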

4) Does the association model handle NA values in the phenotype data, or do those samples need to be removed before performing the association test?

fitNullModel will remove any samples with NA in the phenotype data prior to fitting the null model. However, I recommend explicitly selecting non-missing samples with the sample.id argument, because it makes it much easier to keep track of exactly how many samples are being used in your analysis and reduces the possibility of errors.
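The bookkeeping step can be sketched in Python as building the list of complete-case sample IDs yourself and checking its length before passing it on (in R, that list is what would go to the sample.id argument). The row layout and function names here are hypothetical, and checking covariate columns together with the outcome is an assumption of this sketch.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def complete_case_ids(pheno, columns):
    """IDs of samples with a non-missing value in every listed column.
    A column absent from a row (row.get returns None) also counts as missing."""
    return [row["sample_id"] for row in pheno
            if not any(is_missing(row.get(col)) for col in columns)]

pheno = [
    {"sample_id": "s1", "height": 170.2, "age": 45, "sex": "F"},
    {"sample_id": "s2", "height": None, "age": 51, "sex": "M"},
    {"sample_id": "s3", "height": 181.0, "age": float("nan"), "sex": "M"},
    {"sample_id": "s4", "height": 165.5, "age": 38, "sex": "F"},
]
keep = complete_case_ids(pheno, ["height", "age", "sex"])
print(len(keep), keep)  # 2 ['s1', 's4']
```

Printing the count makes the number of samples entering the model explicit, which is the main point of selecting them yourself.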
