WGCNA following salmon/DESeq2
maya.kappil ▴ 10
@mayakappil-18569

Hello!

If I wanted to conduct WGCNA analysis following a salmon/DESeq2 workflow, would it be appropriate to use the matrix generated after applying the vst function on the dds object? Something akin to the following script:

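library(DESeq2)

# txi: tximport output from salmon; coldata: sample table with batch, Sex and BW columns
dds <- DESeqDataSetFromTximport(txi, coldata, design = ~ batch + Sex + BW)

# Pre-filter: keep genes with a count of at least 1 in at least 30 samples
keep <- rowSums(counts(dds) >= 1) >= 30
dds <- dds[keep, ]

dds <- DESeq(dds)

# Variance-stabilizing transformation; blind = FALSE lets the design inform the dispersion trend
vsd <- vst(dds, blind = FALSE)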

Thanks!

deseq2 wgcna salmon • 1.2k views
@mikelove

Yes, that would be the appropriate way to provide scaled, transformed data to a downstream method. I prefer blind=FALSE as you have here because it reduces the amount of shrinkage. It doesn't use the design when applying the transformation, only when estimating the (global) trend of within-group dispersion.
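As a rough sketch of how the transformed values could then be handed to WGCNA: the snippet below uses DESeq2's simulated example data as a stand-in for the poster's `dds` (an assumption, since the real `txi` and `coldata` are not available here). `assay()` returns genes in rows and samples in columns, while WGCNA functions expect samples in rows, so the matrix is transposed. The name `datExpr` is just the conventional one from the WGCNA tutorials.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the poster's dds (hypothetical data, default design ~ condition)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# blind = FALSE: the design is used only when estimating the dispersion trend
vsd <- vst(dds, blind = FALSE)

# Transformed values: genes in rows, samples in columns
expr <- assay(vsd)

# WGCNA expects samples in rows and genes in columns
datExpr <- t(expr)
dim(datExpr)
```

With real data, `datExpr` would be the matrix passed on to the WGCNA network construction steps.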

Thanks for the quick response!  

@peter-langfelder-4469

I'll second Michael's opinion, and that's also pretty much what I do, except I filter genes using a somewhat different condition. I require that a gene has relatively high expression (e.g., 0.5 to 1 count per million reads, which translates to counts in the low tens for a typical data set with 30-50M reads per sample) in at least 1/4 of the samples (or whatever fraction corresponds to the smallest experimental group of the design). The rationale is that typical correlation analysis in WGCNA assumes (approximately) continuous data; using correlation on genes with counts below, say, 5-10, which tend to be mostly zero, can really lead to spurious results.
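A minimal base-R sketch of this kind of filter, assuming a genes-by-samples matrix of raw counts. The helper name `filter_by_cpm`, its defaults, and the simulated data are illustrative and not from the thread; CPM is computed here from plain column sums.

```r
# Flag genes whose CPM reaches `min_cpm` in at least `min_samples` samples.
# `counts` is a genes-by-samples matrix of raw counts.
filter_by_cpm <- function(counts, min_cpm = 1,
                          min_samples = ceiling(ncol(counts) / 4)) {
  lib_size <- colSums(counts)                    # total reads per sample
  cpm <- sweep(counts, 2, lib_size, "/") * 1e6   # counts per million
  rowSums(cpm >= min_cpm) >= min_samples
}

# Simulated data for illustration only (not from the thread)
set.seed(42)
n_genes <- 20000
n_samples <- 40
gene_mu <- rlnorm(n_genes, meanlog = 5, sdlog = 2.5)   # per-gene mean count
counts <- matrix(rnbinom(n_genes * n_samples, mu = gene_mu, size = 5),
                 nrow = n_genes)

keep <- filter_by_cpm(counts, min_cpm = 1, min_samples = 10)
table(keep)
```

In a DESeq2 workflow the same helper could presumably be applied to `counts(dds)` and the result used as `dds[keep, ]`; edgeR's `cpm()` is an existing alternative for the CPM step.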

Thanks! Ah, ok, that makes sense regarding the filtering. In the filtering line of my code, the 30 does refer to the sample size of my smallest comparison group. Requiring counts in the low tens in at least that many samples makes sense, and we do have roughly 50M reads/sample, so I can adjust this part of the code to require about 1 cpm in at least 30 samples.
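For reference, the arithmetic behind these thresholds can be sketched as follows. The helper `cpm_to_count` is a hypothetical name introduced for illustration; it simply converts a CPM cutoff into a raw count at a given library size.

```r
# Raw count corresponding to a CPM cutoff at a given library size
cpm_to_count <- function(cpm, lib_size) cpm * lib_size / 1e6

cpm_to_count(1, 50e6)     # 1 cpm at 50M reads   -> 50 counts
cpm_to_count(0.5, 30e6)   # 0.5 cpm at 30M reads -> 15 counts

# Under the assumption of ~50M reads in every sample, an approximate
# raw-count version of the adjusted filter would be (not run here):
# keep <- rowSums(counts(dds) >= 50) >= 30
```

Because library sizes differ between samples, a filter computed on per-sample CPM values is likely more faithful to the suggestion than a single raw-count cutoff.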
