
Question: WGCNA following salmon/DESeq2
12 weeks ago, maya.kappil wrote:


If I wanted to conduct WGCNA analysis following a salmon/DESeq2 workflow, would it be appropriate to use the matrix generated after applying the vst function on the dds object? Something akin to the following script:

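library(DESeq2)

# txi: tximport output from salmon; coldata: sample table with batch, Sex, and BW
dds <- DESeqDataSetFromTximport(txi, coldata, design = ~ batch + Sex + BW)

# perform some prefiltering: at least 1 count in at least 30 samples
keep <- rowSums(counts(dds) >= 1) >= 30
dds <- dds[keep, ]

dds <- DESeq(dds)

# transform while accounting for design
vsd <- vst(dds, blind = FALSE)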


Tags: deseq2, wgcna, salmon
Answer: WGCNA following salmon/DESeq2
12 weeks ago, Michael Love wrote:

Yes, that would be the appropriate way to provide scaled, transformed data to a downstream method. I prefer blind=FALSE, as you have here, because it reduces the amount of shrinkage. With blind=FALSE the design is not used when applying the transformation itself, only when estimating the (global) trend of within-group dispersion.
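
As a minimal sketch of the hand-off to WGCNA (not code from this thread): WGCNA functions expect samples in rows and genes in columns, whereas `assay(vsd)` returns genes in rows and samples in columns, so the matrix is usually transposed first. The toy matrix `vst_mat` and the name `datExpr` below are assumptions for illustration; `vst_mat` stands in for `assay(vsd)` so the snippet runs without DESeq2 installed.

```r
# Hypothetical stand-in for assay(vsd): 5 genes (rows) x 4 samples (columns).
# In the real workflow this would be: vst_mat <- assay(vsd)
vst_mat <- matrix(
  seq(6, 10, length.out = 20),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)

# WGCNA convention: samples in rows, genes in columns
datExpr <- t(vst_mat)

dim(datExpr)       # samples x genes
rownames(datExpr)  # sample names
```

With the real object, the equivalent single line would be `datExpr <- t(assay(vsd))`.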


Thanks for the quick response!  

written 12 weeks ago by maya.kappil
Answer: WGCNA following salmon/DESeq2
12 weeks ago, Peter Langfelder wrote:

I'll second Michael's opinion, and that's also pretty much what I do, except that I filter genes using a somewhat different condition. I require that a gene has relatively high expression (e.g., 0.5 to 1 count per million reads, which translates to counts in the low tens for a typical data set with 30-50M reads per sample) in at least 1/4 of the samples (or whatever fraction corresponds to the smallest experimental group of the design). The rationale is that typical correlation analysis in WGCNA assumes (approximately) continuous data; using correlation on genes whose counts stay below, say, 5-10 and are mostly zero can really lead to spurious results.
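
The filter described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the exact code used by anyone in this thread: the toy matrix `counts_mat`, the library sizes of 50M reads, the 1 cpm cutoff, and `min_samples = 3` are all hypothetical values chosen so the snippet runs on its own. In a real analysis `counts_mat` would be `counts(dds)` and `min_samples` the size of the smallest experimental group.

```r
# Hypothetical raw counts: 4 genes (rows) x 6 samples (columns)
counts_mat <- matrix(
  c(120, 95, 130, 110, 105, 99,   # geneA: well expressed in every sample
      0,  1,   0,   2,   0,  0,   # geneB: mostly zero
     60,  0,  75,   0,  80,  0,   # geneC: expressed in half of the samples
      3,  4,   2,   5,   3,  4),  # geneD: low in every sample
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  paste0("sample", 1:6))
)

# Assumed sequencing depth: 50M reads per sample
# (with real data, something like colSums(counts(dds)) would be used instead)
lib_sizes <- rep(50e6, ncol(counts_mat))

# Counts per million: divide each column by its library size, scale by 1e6
cpm_mat <- sweep(counts_mat, 2, lib_sizes, "/") * 1e6

min_cpm     <- 1  # expression cutoff in cpm (assumed)
min_samples <- 3  # size of the smallest experimental group (assumed)

# Keep genes that reach the cutoff in at least min_samples samples
keep <- rowSums(cpm_mat >= min_cpm) >= min_samples

# Raw count equivalent to the cpm cutoff at this depth
min_count_equiv <- min_cpm * 50e6 / 1e6

filtered <- counts_mat[keep, , drop = FALSE]
```

At 50M reads per sample, 1 cpm corresponds to 50 raw counts, and 0.5 cpm at 30M reads corresponds to 15, which is where the "low tens" range comes from.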


Thanks!  Ah, ok - that makes sense regarding the filtering.  In the code line for the filtering step, the 30 does refer to the sample size of my smallest comparison group.  Counts in low tens for at least this number of samples makes sense, and we do have roughly 50M reads/sample, so I can adjust this part of the code to reflect about 1 cpm in at least 30 samples. 

written 11 weeks ago by maya.kappil

