Question: WGCNA following salmon/DESeq2
1
gravatar for maya.kappil
10 months ago by
maya.kappil10
maya.kappil10 wrote:

Hello!

If I wanted to conduct WGCNA analysis following a salmon/DESeq2 workflow, would it be appropriate to use the matrix generated after applying the vst function on the dds object? Something akin to the following script:

library(DESeq2)

dds <- DESeqDataSetFromTximport(txi, coldata, design = ~ batch + Sex + BW)

# perform some prefiltering: keep genes with >= 1 count in at least 30 samples
keep <- rowSums(counts(dds) >= 1) >= 30
dds <- dds[keep, ]

dds <- DESeq(dds)

# transform while accounting for design
vsd <- vst(dds, blind = FALSE)

Thanks!

Answer: WGCNA following salmon/DESeq2
Michael Love wrote (10 months ago):

Yes, that would be the appropriate way to provide scaled, transformed data to a downstream method. I prefer blind=FALSE, as you have here, because it reduces the amount of shrinkage. The transformation itself doesn't use the design; the design is only used when estimating the (global) trend of within-group dispersion.
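For readers following along, here is a minimal sketch of how the transformed matrix might be prepared for WGCNA. It uses a small simulated matrix as a stand-in for assay(vsd); the toy data and the variable names vst_mat and datExpr are assumptions for illustration, not part of the original posts. The main point is that WGCNA functions expect samples in rows and genes in columns, so the matrix needs to be transposed.

```r
# Toy stand-in for assay(vsd): genes in rows, samples in columns.
# With the objects from the question this would be: vst_mat <- assay(vsd)
set.seed(1)
vst_mat <- matrix(rnorm(8 * 5, mean = 8, sd = 1), nrow = 8, ncol = 5,
                  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# WGCNA expects samples in rows and genes in columns, so transpose.
datExpr <- t(vst_mat)

dim(datExpr)  # 5 samples x 8 genes
```

The transposed matrix is what would then be passed on to the WGCNA network construction functions.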


Thanks for the quick response!  

Reply written 10 months ago by maya.kappil
Answer: WGCNA following salmon/DESeq2
Peter Langfelder wrote (10 months ago):

I'll second Michael's opinion; that's also pretty much what I do, except that I filter genes using a somewhat different condition. I require that a gene has relatively high expression (e.g., 0.5 to 1 count per million reads, which translates to counts in the low tens for a typical data set with 30-50M reads per sample) in at least 1/4 of the samples (or whatever fraction the smallest experimental group of the design makes up). The rationale is that typical correlation analysis in WGCNA assumes (approximately) continuous data; using correlation on counts below, say, 5-10, which tend to be mostly zero, can really lead to spurious results.
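The filter described above could be sketched roughly as follows in base R. The helper filter_by_cpm and the toy count matrix are hypothetical, written only for illustration; they are not functions or data from WGCNA or DESeq2, and the default thresholds simply mirror the numbers mentioned in the answer.

```r
# Hypothetical helper (not from WGCNA or DESeq2): keep genes whose CPM is at
# least `min_cpm` in at least a fraction `min_frac` of the samples.
filter_by_cpm <- function(counts, min_cpm = 1, min_frac = 0.25) {
  lib_size <- colSums(counts)                      # total reads per sample
  cpm <- sweep(counts * 1e6, 2, lib_size, "/")     # counts per million
  min_samples <- ceiling(min_frac * ncol(counts))  # samples that must pass
  rowSums(cpm >= min_cpm) >= min_samples
}

# Toy count matrix: 4 genes x 4 samples, each sample with 1e6 reads in total,
# so in this toy case 1 CPM corresponds to 1 raw count.
counts <- rbind(
  high   = c(500, 600, 550, 700),
  low    = c(0, 0, 1, 0),        # reaches 1 CPM in only 1 of 4 samples
  absent = c(0, 0, 0, 0),
  filler = c(999500, 999400, 999449, 999300)
)

# Require >= 1 CPM in at least half of the samples.
keep <- filter_by_cpm(counts, min_cpm = 1, min_frac = 0.5)
filtered <- counts[keep, , drop = FALSE]
rownames(filtered)
```

With the objects from the question, the input would presumably be counts(dds), and min_frac would be set to the share of samples in the smallest experimental group.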


Thanks! Ah, OK, that makes sense regarding the filtering. In the filtering line of my code, the 30 does refer to the sample size of my smallest comparison group. Counts in the low tens for at least this number of samples makes sense, and we do have roughly 50M reads/sample, so I can adjust this part of the code to require about 1 cpm in at least 30 samples.
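As a quick check on the numbers in this exchange, the raw-count equivalent of a CPM threshold is just the threshold times the library size divided by one million. The helper name cpm_to_counts below is hypothetical and only serves the illustration.

```r
# Hypothetical helper: raw-count equivalent of a CPM threshold
# at a given sequencing depth (reads per sample).
cpm_to_counts <- function(cpm, lib_size) cpm * lib_size / 1e6

at_50M <- cpm_to_counts(1, 50e6)          # 1 CPM at 50M reads -> 50 counts
at_30M <- cpm_to_counts(c(0.5, 1), 30e6)  # 0.5 and 1 CPM at 30M reads -> 15 and 30 counts
```

So at roughly 50M reads per sample, a 1 cpm cutoff corresponds to about 50 raw counts, which is in line with the "low tens" range mentioned above.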

Reply written 10 months ago by maya.kappil