Question

Selecting genes based on expressional variance

Jon Bråte ▴ 250

@jon-brate-6263

Norway

Hi,

I am trying to follow the procedure outlined in Liao, Q. et al., NAR 2011 (Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network), and I want to select the genes with "expressional variance ranked in the top 75 percentile of each data set". I have a matrix of count data that has been variance-stabilized with DESeq2 (recommended as input for the WGCNA package), but I am unsure how to select the genes with the highest variance. My matrix consists of datasets representing different biological conditions, most of them in triplicate.

Thanks,

Jon

geneexpression wgcna rnaseq • 4.1k views

updated 9.8 years ago by Laurent Gatto 1.6k • written 9.8 years ago by Jon Bråte ▴ 250

score 3 · Answer 1 · 2014-09-25

Here is a suggestion that uses rowVars() from the genefilter package:

> set.seed(1)
> m <- matrix(rnorm(1000, 10), ncol = 10)
> dim(m)
[1] 100  10
> library("genefilter")
> rv <- rowVars(m)
> summary(rv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2754  0.7198  0.9987  1.0590  1.3720  2.7310 
> (q75 <- quantile(rowVars(m), .75))
     75% 
1.372284 
> m2 <- m[rv > q75, ]
> dim(m2)
[1] 25 10
> summary(rowVars(m2))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.375   1.477   1.564   1.708   1.848   2.731

DESeq2 might have some functionality for similar or more appropriate filtering, but I'll leave that to the experts.
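
Note that the phrase "top 75 percentile" can be read in two ways: keep only the genes above the 75th percentile of variance (the top 25%, as in the transcript above), or keep the upper 75% of genes and drop the lowest-variance quarter. The sketch below is one possible way to cover both readings with base R only. The helper name select_by_variance and the gene IDs are hypothetical, and the simulated matrix merely stands in for a variance-stabilized expression matrix with genes in rows and samples in columns.

```r
# Simulated stand-in for a variance-stabilized matrix (genes in rows,
# samples in columns). Assumption: in practice this would be your own
# matrix, e.g. extracted with assay() from a DESeq2 transformed object.
set.seed(1)
m <- matrix(rnorm(1000, mean = 10), ncol = 10)
rownames(m) <- paste0("gene", seq_len(nrow(m)))  # hypothetical gene IDs

# Hypothetical helper: keep the fraction `frac` of genes with the
# highest variance across samples.
select_by_variance <- function(mat, frac) {
  v <- apply(mat, 1, var)                  # per-gene variance (base R)
  cutoff <- quantile(v, probs = 1 - frac)  # variance threshold
  mat[v > cutoff, , drop = FALSE]          # drop = FALSE keeps a matrix
}

top25 <- select_by_variance(m, 0.25)  # genes above the 75th percentile
top75 <- select_by_variance(m, 0.75)  # genes above the 25th percentile

dim(top25)
dim(top75)
```

Which reading matches the paper is worth checking against its methods section. For the "each data set" part, the same helper could be applied separately to the columns belonging to each data set.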