Hi All,
I would like to do differential gene expression analysis and WGCNA usinf paired-end RNA seq data. However,I confused a lot related to using same pre-processing for both DGE & WGCNA analysis.
In summary, I have 2 experimental condition and each condition has 18 samples. However, they are sequenced in 3 batches. 1 st batch 6 pair (disease & control), 2nd batch 6 pair (disease & control), 3rd batch 6 pair (disease & control). Each batch done by different people over different times so I believe we have batch effect. As I am using DESeq2 for my analysis I set my model design as “Batch+Type”. I was normally using removeBatchEffect from limma package and plotting PCA & hierarchical clustering plots using spermann correlation to see which samples are outliers.
I have asked before if I should use rlog() for differential gene expression analysis and getVarianceStabilizedData() for WGCNA or can I use same normalization for both. I got a feedback that it is better to use voom() the removedbatcheffect() as they belong to the same package.Do you agree with this suggestion ?
Now, I am using voom() & removedbatcheffect() for WGCNA analysis but what do you suggest me to do for DGE ? Should I use limma package or DESeq2 package ( rld() or vsd() ) ?
In addition to these, I would like to take your advice about if I should remove any patients from analysis according to these hierarchical clusters:
Same data, different transformation methods( rld, vsd, voom), removed batch effect and plot hierarchical clustering using Spearmann correlation.


I really appreciate if you give me some insight,
Thanks in advance



Hi Michael,
I guess I explained myself wrong. As you said, I am using the DESeq() on raw counts then using results() on the "dds" variable to do DGE with defining the contrasts & alpha. The aim of rld or vsd transformation is for EDA and to see how samples are clustering via PCA, hierarchical clustering or MDS plots. I actually do not use removebatcheffect for DGE but for the preprocessing the data for WGCNA.
ddsMat<- DESeqDataSetFromMatrix(seMat, sample.table, design=~Batch+Type) dds<-DESeq(ddsMat) dds <- dds[ rowSums(counts(dds)) > 1, ] #DGE res<-results(dds,alpha=.05, contrast=c("Type", "Sphere", "Tumor")) res$symbol <- mapIds(org.Hs.eg.db,keys=row.names(res),column="SYMBOL", keytype="ENSEMBL", multiVals="first") resSig <- subset(res, padj < 0.05) #Transformations rld <- rlog(dds, blind= FALSE) rlogMat <- assay(rld) vsd = getVarianceStabilizedData(dds) #removeBatchEffect design=model.matrix(~sample.table$Type) removedbatch.rld=removeBatchEffect(rlogMat, batch=sample.table$Batch, design= design ) vsd.removed=removeBatchEffect(vsd, batch=sample.table$Batch, design= design ) #PCA par(mfrow = c(1,2)); cex1 = 1.5; plotPCA(rld, intgroup=c("Type", "Batch")) plotPCA(rld.removed, intgroup=c("Type", "Batch")) dev.off()So as you suggested I will not be removing any samples for DGE, but what do you suggest for WGCNA? Because in their tutorial they are removing the sample that is a outlier in Figure1. https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-01-dataInput.pdf
Your answer actually cleared the clouds. There is absolutely nothing wrong with using these functions across packages.
Thanks a lot for your answer