Selecting genes based on expressional variance
Jon Bråte ▴ 250
@jon-brate-6263
Norway

Hi,

I am trying to follow the procedure outlined in Liao, Q. et al., NAR 2011 ("Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network") and want to select the genes with "expressional variance ranked in the top 75 percentile of each data set". I have a matrix of count data that has been variance-stabilized with DESeq2 (recommended as input for the WGCNA package), but I am unsure how to select the genes with the highest variance. My matrix consists of datasets representing different biological conditions, most of them in triplicate.

Thanks,

Jon

geneexpression wgcna rnaseq • 4.0k views
@laurent-gatto-5645
Belgium

Here is a suggestion that uses rowVars from the genefilter package:

> set.seed(1)
> m <- matrix(rnorm(1000, 10), ncol = 10)
> dim(m)
[1] 100  10
> library("genefilter")
> rv <- rowVars(m)
> summary(rv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2754  0.7198  0.9987  1.0590  1.3720  2.7310 
> (q75 <- quantile(rowVars(m), .75))
     75% 
1.372284 
> m2 <- m[rv > q75, ]
> dim(m2)
[1] 25 10
> summary(rowVars(m2))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.375   1.477   1.564   1.708   1.848   2.731 

DESeq2 might have some functionality to do similar or more appropriate filtering, but I'll leave it to the experts.
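The same filter can be sketched in base R, without genefilter. This is only an illustration under a few assumptions: `m` is the simulated matrix from above standing in for a variance-stabilized matrix with genes in rows (with a DESeq2 transformed object, the matrix would typically be extracted with `assay()` first), and the gene names are made up. Because "top 75 percentile" can be read two ways, both cut-offs are shown: above the 75th percentile (the top 25% most variable genes, as in the session above) and above the 25th percentile (the top 75%).

```r
# Simulated stand-in for a variance-stabilized expression matrix (genes x samples)
set.seed(1)
m <- matrix(rnorm(1000, 10), ncol = 10)
rownames(m) <- paste0("gene", seq_len(nrow(m)))  # hypothetical gene IDs

# Per-gene variance across all samples (base R equivalent of rowVars)
rv <- apply(m, 1, var)

# Reading 1: genes whose variance exceeds the 75th percentile (top 25%)
top25 <- m[rv > quantile(rv, 0.75), , drop = FALSE]

# Reading 2: genes whose variance exceeds the 25th percentile (top 75%)
top75 <- m[rv > quantile(rv, 0.25), , drop = FALSE]

dim(top25)
dim(top75)
```

Which reading matches the paper is a judgment call; only the probability passed to `quantile()` changes. `drop = FALSE` keeps the result a matrix even if a single gene passes the filter.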


Thanks, works perfectly!
