Question

Running DESeq2 on top variable genes?

hs.lansdell (@hslansdell-14246)

Since row variance and DESeq2 both act as ranking mechanisms for genes, does it make sense to take the top 1000 or 5000 genes with the highest variance across samples in an RNA-seq dataset and run the DESeq2 pipeline on that subset to look for genes that differ between groups (with a simple design, ~condition)?

Thanks!

Tags: deseq2, variance, genes

Written by hs.lansdell; last updated by Michael Love.

Answer · 2017-11-21

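This will cause problems with DESeq2's dispersion prior, which should be estimated from the counts of all the genes, or at least from genes that were not selected for having a high or low sample variance (across all samples). Briefly, subsetting by the sample mean is not a problem, but subsetting by the sample variance is: selecting high-variance genes preferentially keeps genes whose dispersion estimates happen to be large, so the dispersion trend and prior end up being fit to a biased set of genes.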
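To make the selection effect concrete, here is a small simulation sketch in Python. It is not DESeq2 and not how DESeq2 estimates dispersions; every setting is an illustrative assumption: all genes share one true dispersion (0.1), there is no condition effect, gene means are log-uniform between 10 and 200, there are six samples, "row variance" is computed on log(count + 1), and a crude method-of-moments estimate stands in for DESeq2's gene-wise dispersion estimates. It compares the median ratio of estimated to true dispersion for all genes, the top 20% by mean, and the top 20% by variance.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate exponential arrivals in [0, lam]."""
    k, t = 0, rng.expovariate(1.0)
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def neg_binomial(rng, mu, alpha):
    """Gamma-Poisson draw with mean mu and variance mu + alpha * mu**2."""
    lam = rng.gammavariate(1.0 / alpha, mu * alpha)  # shape 1/alpha, scale mu*alpha
    return poisson(rng, lam)


def simulate(n_genes=1500, n_samples=6, alpha=0.1, seed=1):
    """Null genes (no condition effect) with log-uniform means and a common dispersion."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        mu = math.exp(rng.uniform(math.log(10), math.log(200)))
        genes.append([neg_binomial(rng, mu, alpha) for _ in range(n_samples)])
    return genes


def mom_dispersion(counts):
    """Crude method-of-moments dispersion estimate: (variance - mean) / mean**2."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    return (v - m) / (m * m)


def log_variance(counts):
    """Per-gene variance across samples on the log(count + 1) scale (assumed filter)."""
    return statistics.variance([math.log(c + 1) for c in counts])


def median_ratio(genes, alpha):
    """Median of estimated / true dispersion over a set of genes."""
    return statistics.median(mom_dispersion(g) / alpha for g in genes)


ALPHA = 0.1
genes = simulate(alpha=ALPHA)
top_n = len(genes) // 5  # keep 20% of the genes

by_mean = sorted(genes, key=statistics.mean, reverse=True)[:top_n]
by_var = sorted(genes, key=log_variance, reverse=True)[:top_n]

med_all = median_ratio(genes, ALPHA)
med_mean = median_ratio(by_mean, ALPHA)
med_var = median_ratio(by_var, ALPHA)

print(f"all genes:         {med_all:.2f}")
print(f"top 20% by mean:   {med_mean:.2f}")
print(f"top 20% by logvar: {med_var:.2f}")
```

Under these assumptions the variance-selected subset shows clearly inflated dispersion estimates even though every gene has the same true dispersion, while the mean-selected subset stays close to the full set. A trend or prior fit only to the variance-selected genes would therefore be shifted upward relative to what the full data support.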