Selecting genes based on expressional variance
Jon Bråte
@jon-brate-6263
Norway

Hi,

I am trying to follow the procedure outlined in Liao, Q. et al. NAR 2011 (Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network) and want to select the genes with "expressional variance ranked in the top 75 percentile of each data set". I have a matrix of count data that has been variance-stabilized with DESeq2 (the recommended input for the WGCNA package), but I am unsure how to select the genes with the highest variance. The matrix contains datasets representing different biological conditions, most of them in triplicate.

Thanks,

Jon

geneexpression wgcna rnaseq
@laurent-gatto-5645
Belgium

Here is a suggestion that uses rowVars() from the genefilter package:

> set.seed(1)
> m <- matrix(rnorm(1000, 10), ncol = 10)
> dim(m)
[1] 100  10
> library("genefilter")
> rv <- rowVars(m)            # per-gene (row) variance
> summary(rv)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.2754  0.7198  0.9987  1.0590  1.3720  2.7310
> (q75 <- quantile(rv, .75))  # 75th percentile of the row variances
75%
1.372284
> m2 <- m[rv > q75, ]         # keep the 25% most variable genes
> dim(m2)
[1] 25 10
> summary(rowVars(m2))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.375   1.477   1.564   1.708   1.848   2.731

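If genefilter is not installed, the same filter can be sketched in base R with apply(m, 1, var), which, like genefilter's rowVars(), uses the n - 1 denominator. The helper name top_variance_rows is hypothetical, not a function from genefilter or DESeq2. The sketch also shows the two possible readings of "top 75 percentile": keeping the genes above the 75th percentile (the top 25%, as above) or keeping the top 75% by discarding genes below the 25th percentile. Which one Liao et al. intended is an assumption to check against the paper.

```r
# Hypothetical helper (not part of genefilter or DESeq2): keep the rows whose
# variance lies strictly above the `prob` quantile of all row variances.
top_variance_rows <- function(m, prob = 0.75) {
  rv <- apply(m, 1, var)        # per-row variance, n - 1 denominator
  cutoff <- quantile(rv, prob)  # variance threshold
  m[rv > cutoff, , drop = FALSE]
}

set.seed(1)
m <- matrix(rnorm(1000, 10), ncol = 10)

top25 <- top_variance_rows(m, 0.75)  # top 25% most variable rows
top75 <- top_variance_rows(m, 0.25)  # top 75%: drop the least variable 25%
dim(top25)  # 25 10
dim(top75)  # 75 10
```

With a DESeq2 variance-stabilized object, the matrix to pass in would be the one returned by assay() on that object.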

DESeq2 might have some functionality to do similar or more appropriate filtering, but I'll leave it to the experts.
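The question quotes a selection made "of each data set". If the matrix holds several data sets as separate blocks of columns, each with enough samples for a within-data-set variance to be meaningful, the filter could be applied block by block. The following is a minimal sketch under those assumptions: the dataset grouping vector is hypothetical, and combining the per-data-set selections by union is only one choice (an intersection would be stricter). Note that a variance computed from three replicates of a single condition mostly reflects replicate noise, so for triplicate conditions the variance across all samples, as computed above, may be the more relevant quantity.

```r
set.seed(1)
m <- matrix(rnorm(1200, 10), ncol = 12,
            dimnames = list(paste0("gene", 1:100), NULL))

# Hypothetical layout: two data sets of six samples each
dataset <- rep(c("A", "B"), each = 6)

# For each data set, flag rows whose variance is above that data set's
# 75th percentile; sapply() returns a genes x data sets logical matrix
keep_per_dataset <- sapply(split(seq_len(ncol(m)), dataset), function(cols) {
  rv <- apply(m[, cols, drop = FALSE], 1, var)
  rv > quantile(rv, 0.75)
})

# Combine by union: keep a gene if it passes in at least one data set
keep_any <- rowSums(keep_per_dataset) > 0
m_union <- m[keep_any, , drop = FALSE]
colSums(keep_per_dataset)  # 25 genes flagged in each of A and B
```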

Thanks, works perfectly!