Hello, I am performing WGCNA on RNA-seq data from a cohort. We are struggling to choose a normalization method that will not interfere with the correlation with the traits.
1. Which method is most appropriate for normalizing RNA-seq data for WGCNA? I have come across analyses where vst or rlog is used, but with little explanation of why DESeq2 median-of-ratios normalization or quantile normalization was not used.
2. We also have 4 separate sequencing batches, for which I am using removeBatchEffect after normalization.
3. In the case of vst/rlog, should the transformation be performed on a count matrix or on a DESeq2 object?
Below is the code for the different approaches I have tried. Could you please correct anything that is wrong and comment on which one to choose?
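METHOD 1: USING CPM (library: edgeR, limma)
# log2-CPM of the filtered raw counts; stored in a new object so data_filt stays raw for the other methods
logcpm <- cpm(data_filt, log = TRUE)
# the argument name is `batch` (lowercase); covariates is simply left at its default
data_adj1 <- removeBatchEffect(logcpm, batch = coldata$Batch)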
----------------------------------------------------------------------
METHOD 2: USING QUANTILE NORMALIZATION (library: preprocessCore, limma)
# normalize.quantiles() needs a numeric matrix
QST <- preprocessCore::normalize.quantiles(as.matrix(data_filt), copy = TRUE, keep.names = TRUE)
rownames(QST) <- rownames(data_filt)
colnames(QST) <- colnames(data_filt)
# removeBatchEffect() expects log-expression values, so log2-transform first
data_adj <- removeBatchEffect(log2(QST + 1), batch = coldata$Batch)
------------------------------------------------------------------------
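METHOD 3: USING DESEQ2 (library: DESeq2, limma)
dds <- DESeqDataSetFromMatrix(countData = data_filt, colData = coldata, design = ~1) # ~1 because I have a cohort, not a case-control design
dds <- estimateSizeFactors(dds) # normalized counts only need size factors; a full DESeq() run is not required
norm_count <- counts(dds, normalized = TRUE)
# removeBatchEffect() expects log-expression values, so log2-transform the normalized counts
data_adj <- removeBatchEffect(log2(norm_count + 1), batch = coldata$Batch)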
------------------------------------------------------------------------
METHOD 4: USING VST (library: DESeq2, limma)
dds <- DESeqDataSetFromMatrix(countData = data_filt, colData = coldata, design = ~1)
# accepts a count matrix or a DESeqDataSet; DESeq() does not need to be run first
vsd <- varianceStabilizingTransformation(dds)
vsd_mat <- assay(vsd) # extract the transformed matrix from the DESeqTransform object
data_adj1 <- removeBatchEffect(vsd_mat, batch = coldata$Batch)
-------------------------------------------------------------------------
METHOD 5: USING RLOG (library: DESeq2, limma)
# given a count matrix, rlog() returns a matrix; named rld so the object does not share a name with the rlog() function
rld <- rlog(as.matrix(data_filt), blind = FALSE)
data_adj <- removeBatchEffect(rld, batch = coldata$Batch)
--------------------------------------------------------------------------
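For comparison, a variance-stabilizing transformation followed by batch removal on the transformed matrix could be sketched roughly as below. This is an illustration under stated assumptions, not a verified recommendation: the counts are simulated with DESeq2's `makeExampleDESeqDataSet`, and the `Batch` and `trait` columns are invented stand-ins for the real `coldata`. Passing the trait through `design` asks `removeBatchEffect` to keep trait-associated variation while removing batch, which is relevant when the adjusted data are later correlated with traits.

```r
library(DESeq2)   # also attaches SummarizedExperiment, which provides assay()
library(limma)

set.seed(1)
# Simulated stand-in for the real data: 1000 genes x 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
# Hypothetical sample annotations (not from the original post)
dds$Batch <- factor(rep(c("B1", "B2", "B3", "B4"), times = 3))
dds$trait <- rnorm(ncol(dds))
design(dds) <- ~ 1   # cohort without case/control groups

# The transformation works directly on the DESeqDataSet
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
vsd_mat <- assay(vsd)   # genes x samples matrix

# Remove batch while preserving variation associated with the trait
keep <- model.matrix(~ trait, data = as.data.frame(colData(dds)))
vsd_adj <- removeBatchEffect(vsd_mat, batch = dds$Batch, design = keep)

# WGCNA expects samples in rows and genes in columns
datExpr <- t(vsd_adj)
dim(datExpr)
```

With real data, `dds` would instead come from `DESeqDataSetFromMatrix()` as in Methods 3 and 4, and `trait` would be replaced by whichever clinical variables are to be correlated with the modules.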
Your help in this matter would be greatly appreciated. Thank you, Sukeshini K
Thank you so much for your response. Can you please tell me what type of preprocessing should be expected for any gene correlation analysis in RNA sequencing data?
Read the WGCNA docs, please. Just repeating the question is basically ignoring what I just wrote.
I apologize if it seemed that I had ignored your response. I am new to the technical terms and to this analysis. From the above, what I understood is that rlog works on the log2 scale and vst also mostly performs a log2-like transformation, so these methods are more appropriate. Regarding the WGCNA tutorial, it works with microarray datasets that were processed with RMA; if I quantile normalize, log-transform, and then correct for batch effects, will that be appropriate? (I understand WGCNA is not part of Bioconductor; I only need to know whether what I am doing is technically incorrect.)
https://alexslemonade.github.io/refinebio-examples/04-advanced-topics/network-analysis_rnaseq_01_wgcna.html (suggests vst)
Thank you, and apologies for asking you to bear with my silly questions. I truly appreciate your support, Sukeshini K
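The sequence asked about in the follow-up (quantile normalize, log-transform, then correct for batch) could be sketched as follows. This is only an illustration on simulated counts: the `counts` matrix and `batch` factor are invented for the example, and limma's `normalizeBetweenArrays` is used for the quantile step in place of `preprocessCore`. The sketch shows the mechanics only and does not settle whether this route is preferable to vst for WGCNA.

```r
library(limma)

set.seed(42)
# Simulated stand-in for filtered raw counts: 500 genes x 8 samples
counts <- matrix(rnbinom(500 * 8, mu = 100, size = 2), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:8)))
batch <- factor(rep(c("B1", "B2", "B3", "B4"), each = 2))  # hypothetical batches

# 1. Quantile-normalize the samples (columns)
qn <- normalizeBetweenArrays(counts, method = "quantile")

# 2. Log-transform; the +1 pseudocount avoids log2(0)
log_qn <- log2(qn + 1)

# 3. Batch correction on the log-scale matrix
qn_adj <- removeBatchEffect(log_qn, batch = batch)

# After correction, the per-batch means of a given gene coincide
round(tapply(qn_adj["gene1", ], batch, mean), 6)
```

The last line is a quick sanity check: with only an intercept and the batch factor in the model, removing the fitted batch effects leaves each batch with the same mean for every gene.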