Question: WGCNA following salmon/DESeq2
maya.kappil wrote, 14 days ago:

Hello!

If I wanted to conduct WGCNA analysis following a salmon/DESeq2 workflow, would it be appropriate to use the matrix generated after applying the vst function on the dds object? Something akin to the following script:

dds <- DESeqDataSetFromTximport(txi, coldata, design = ~ batch + Sex + BW)

keep <- rowSums(counts(dds) >= 1) >= 30  # prefilter: at least 1 count in at least 30 samples
dds <- dds[keep, ]

dds <- DESeq(dds)

vsd <- vst(dds, blind = FALSE)  # transform while accounting for design

Thanks!

modified 14 days ago by Peter Langfelder • written 14 days ago by maya.kappil
Michael Love (United States) wrote, 14 days ago:

Yes, that would be the appropriate way to provide scaled, transformed data to a downstream method. I prefer blind=FALSE as you have here because it reduces the amount of shrinkage. It doesn't use the design when applying the transformation, only when estimating the (global) trend of within-group dispersion.
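To hand the transformed data to WGCNA, a common pattern (a sketch, not part of this thread) is to extract the matrix with assay() and transpose it, since WGCNA expects samples in rows and genes in columns:

```r
library(DESeq2)
library(WGCNA)

# Extract the variance-stabilized matrix (genes x samples)
mat <- assay(vsd)

# WGCNA expects samples in rows and genes in columns
datExpr <- t(mat)

# Optional sanity check before network construction
gsg <- goodSamplesGenes(datExpr, verbose = 0)
datExpr <- datExpr[gsg$goodSamples, gsg$goodGenes]
```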

written 14 days ago by Michael Love

Thanks for the quick response!  

written 14 days ago by maya.kappil
Peter Langfelder (United States) wrote, 14 days ago:

I'll second Michael's opinion; that's also pretty much what I do, except that I filter genes using a somewhat different condition. I require that a gene have relatively high expression (e.g., 0.5 to 1 count per million reads, which translates to counts in the low tens for a typical data set with 30-50M reads per sample) in at least 1/4 of the samples (or whatever fraction corresponds to the smallest experimental group in the design). The rationale is that the correlation analysis in WGCNA assumes (approximately) continuous data; running correlations on counts below, say, 5-10, which tend to be mostly zero, can lead to spurious results.
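A minimal sketch of this CPM-style filter, assuming the dds object from the question (the 0.5 CPM cutoff and the 1/4 fraction are the example values above; raw library sizes are used here, which is usually close enough for a quick filter):

```r
# CPM-style prefilter on raw counts (sketch; names assumed)
libSizes <- colSums(counts(dds))                  # per-sample library sizes
cpm <- t(t(counts(dds)) / libSizes) * 1e6         # counts per million
minSamples <- ceiling(ncol(dds) / 4)              # 1/4 of samples, or the size
                                                  # of the smallest group
keep <- rowSums(cpm >= 0.5) >= minSamples         # 0.5 CPM threshold as an example
dds <- dds[keep, ]
```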

written 14 days ago by Peter Langfelder

Thanks! Ah, OK, that makes sense regarding the filtering. In my filtering step, the 30 does refer to the sample size of my smallest comparison group. Counts in the low tens for at least this number of samples makes sense, and we have roughly 50M reads/sample, so I can adjust this part of the code to require about 1 CPM in at least 30 samples.
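For example (a sketch: at roughly 50M reads/sample, 1 CPM corresponds to about 50 raw counts, though computing CPM per sample is more exact):

```r
# ~1 CPM at ~50M reads/sample is roughly 50 raw counts
keep <- rowSums(counts(dds) >= 50) >= 30
dds <- dds[keep, ]
```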

written 14 days ago by maya.kappil
Powered by Biostar version 2.2.0