
Question

What should be the normalization protocol for RNA-seq data for WGCNA?


sukeshinik5

Hello, I am performing WGCNA on RNA sequencing data from a cohort. We are struggling to choose a normalization method that will not interfere with the correlation to the traits.

1. Which would be the most appropriate method to normalize RNA-seq data for WGCNA? I have come across vst and rlog being used, but with little explanation.

2. Why was DESeq2 median-of-ratios normalization or quantile normalization not used? We also have 4 separate sequencing batches, for which I am using removeBatchEffect after normalization.

3. If vst/rlog is used, should it be performed on the count matrix or on the DESeq2 object?

Below is the code for the approaches I have tried. Could you please correct anything that is wrong and comment on which one to choose?

METHOD 1: USING CPM (library:edgeR)

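library(edgeR)   # cpm()
library(limma)   # removeBatchEffect()

logcpm <- cpm(data_filt, log = TRUE)   # new name, so data_filt stays raw counts for the other methods
data_adj1 <- removeBatchEffect(logcpm, batch = coldata$Batch)   # the argument is lowercase "batch"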
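To make clear what Method 1 computes, here is a simplified base-R sketch of log2 counts-per-million on a made-up matrix. The helper simple_log_cpm() is hypothetical and deliberately simpler than edgeR: it adds a fixed pseudocount, whereas edgeR's cpm(log = TRUE) adds a prior count (default 2) scaled by library size and, when given a DGEList, uses library sizes adjusted by calcNormFactors().

```r
# Hypothetical helper: simplified log2 counts-per-million.
# Assumption: fixed pseudocount; edgeR::cpm(log = TRUE) uses a library-size-scaled prior count.
simple_log_cpm <- function(counts, pseudocount = 1) {
  lib_size <- colSums(counts)                    # total reads per sample
  cpm <- sweep(counts, 2, lib_size, "/") * 1e6   # scale each column to counts per million
  log2(cpm + pseudocount)                        # move to the log2 scale
}

# Toy count matrix: 4 genes x 3 samples (made-up numbers, columns differ only in depth)
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 10, 15, 20),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

log_cpm <- simple_log_cpm(counts)
print(round(log_cpm, 2))
```

Because the three toy samples differ only in sequencing depth, their log-CPM columns come out identical.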

----------------------------------------------------------------------

METHOD 2: USING QUANTILE NORMALIZATION (library:preprocessCore)

library(preprocessCore)
library(limma)

QST <- normalize.quantiles(as.matrix(data_filt), copy = TRUE)
dimnames(QST) <- dimnames(data_filt)   # restore gene and sample names
data_adj <- removeBatchEffect(QST, batch = coldata$Batch)   # note: QST is still on the linear count scale
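Quantile normalization forces every sample to share the same value distribution. The base-R sketch below shows the idea on a toy matrix; the helper simple_quantile_normalize() is hypothetical and breaks ties by order of appearance, which preprocessCore::normalize.quantiles may handle differently. Since batch regression should be done on log2-scale data, the sketch ends with a log2 step (pseudocount of 1, an assumption) instead of passing linear-scale values on.

```r
# Hypothetical helper: simplified quantile normalization (ties broken by order of appearance)
simple_quantile_normalize <- function(m) {
  sorted_cols <- apply(m, 2, sort)                          # sort each sample separately
  reference   <- rowMeans(sorted_cols)                      # mean of the k-th smallest values
  ranks       <- apply(m, 2, rank, ties.method = "first")   # rank of each value within its sample
  out <- apply(ranks, 2, function(r) reference[r])          # replace values by reference quantiles
  dimnames(out) <- dimnames(m)
  out
}

# Toy count matrix: 4 genes x 3 samples (made-up numbers)
counts <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

qn <- simple_quantile_normalize(counts)
log_qn <- log2(qn + 1)   # log2 scale before any batch regression
print(qn)
```

After normalization every column contains the same set of values, only in a different gene order.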

------------------------------------------------------------------------

METHOD 3: USING DESEQ2 (library:DESeq2)

library(DESeq2)
library(limma)

dds <- DESeqDataSetFromMatrix(countData = data_filt, colData = coldata, design = ~1)   # ~1: a cohort, not a case-control design
dds <- estimateSizeFactors(dds)   # median-of-ratios size factors; the full DESeq() run is not needed for this
norm_count <- counts(dds, normalized = TRUE)   # linear scale, not log2
data_adj <- removeBatchEffect(norm_count, batch = coldata$Batch)
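Question 2 mentions DESeq2's median normalization. The median-of-ratios idea behind estimateSizeFactors() can be sketched in base R: each sample's size factor is the median ratio of its counts to a per-gene geometric-mean reference, skipping genes with a zero in any sample. The helper median_of_ratios() and the toy data are hypothetical; the sketch mirrors the logic, not DESeq2's code path. Dividing by these factors, as counts(dds, normalized = TRUE) does, leaves the data on the linear scale, so a log2 step (pseudocount of 1, an assumption) would come before removeBatchEffect.

```r
# Hypothetical helper: median-of-ratios size factors (the idea behind DESeq2's estimateSizeFactors)
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # per-gene log geometric mean (-Inf if any count is 0)
  use <- is.finite(log_geo_means)          # genes with a zero in any sample are skipped
  apply(counts, 2, function(col) {
    exp(median(log(col[use]) - log_geo_means[use]))   # median ratio to the pseudo-reference
  })
}

# Toy count matrix: 4 genes x 3 samples (made-up numbers; gene4 has a zero)
counts <- matrix(c(10, 20,  30, 0,
                   20, 40,  60, 5,
                   40, 80, 120, 7),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

sf <- median_of_ratios(counts)
norm_counts <- sweep(counts, 2, sf, "/")   # linear scale, like counts(dds, normalized = TRUE)
log_norm <- log2(norm_counts + 1)          # log2 scale before batch regression
print(sf)
```

Here the three samples get size factors of 0.5, 1 and 2, and gene4 does not contribute because of its zero count.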

------------------------------------------------------------------------

METHOD 4: USING VST (library:DESeq2)

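library(DESeq2)
library(limma)

dds <- DESeqDataSetFromMatrix(countData = data_filt, colData = coldata, design = ~1)
dds <- DESeq(dds)
vsd <- varianceStabilizingTransformation(dds)   # accepts a DESeqDataSet (returns a DESeqTransform) or a raw count matrix (returns a matrix)
vsd_mat <- assay(vsd)   # extract the log2-scale matrix; getVarianceStabilizedData() expects a DESeqDataSet, not a DESeqTransform
data_adj1 <- removeBatchEffect(vsd_mat, batch = coldata$Batch)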

-------------------------------------------------------------------------

METHOD 5: USING RLOG (library: DESeq2)

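rld <- rlogTransformation(data_filt, blind = FALSE)   # a raw count matrix in gives a matrix out; for a DESeqDataSet, take assay() of the result
data_adj <- removeBatchEffect(rld, batch = coldata$Batch)   # "rld" avoids masking the rlog() function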

--------------------------------------------------------------------------

Your help in this matter will be really appreciated. Thank you, Sukeshini K


Answer (2024-10-14)

ATpoint

WGCNA is still not part of Bioconductor (Batch adjustment for cohort based RNA seq data), so please check its documentation for what the authors recommend. Generally, note that batch regression should be done on log2-scale data, so method 3, which applies removeBatchEffect to DESeq2-normalized counts on the linear scale, is wrong per se. For where to access the WGCNA tutorials and documentation, see https://bioinformatics.stackexchange.com/questions/21885/where-to-access-the-wgcna-tutorial-documents-horvath-lab-site-down/21886#21886
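The point that batch regression belongs on log2-scale data can be illustrated with a base-R sketch of the idea behind limma's removeBatchEffect(): each gene's log2 values are fitted against the design plus sum-to-zero batch columns, and only the fitted batch component is subtracted. Putting the trait into the design (removeBatchEffect() has a design argument for conditions to preserve) keeps trait-associated variation from being regressed out along with the batch. The helper regress_out_batch() and the toy data are hypothetical; on real data, limma itself is the tool to use.

```r
# Hypothetical helper illustrating the idea behind limma::removeBatchEffect:
# fit log2 expression ~ design + batch for every gene, then subtract only the batch part.
regress_out_batch <- function(log_expr, batch,
                              design = matrix(1, ncol(log_expr), 1)) {
  batch <- factor(batch)
  contrasts(batch) <- contr.sum(levels(batch))           # batch effects sum to zero
  X_batch <- model.matrix(~ batch)[, -1, drop = FALSE]   # batch columns without intercept
  fit <- lm.fit(cbind(design, X_batch), t(log_expr))     # least squares, one column per gene
  beta_batch <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
  beta_batch[is.na(beta_batch)] <- 0                     # guard against aliased terms
  log_expr - t(X_batch %*% beta_batch)                   # remove batch, keep design terms
}

# Toy log2-scale data: 2 genes x 6 samples, 2 batches, one numeric trait (made-up values)
batch <- c("A", "A", "A", "B", "B", "B")
trait <- c(1, 2, 3, 1, 2, 3)
log_expr <- rbind(gene1 = 5 + 0.5 * trait + ifelse(batch == "B", 2, 0),
                  gene2 = 3 + ifelse(batch == "B", 1, 0))
colnames(log_expr) <- paste0("s", 1:6)

corrected <- regress_out_batch(log_expr, batch, design = model.matrix(~ trait))
print(corrected)
```

Both batches end up on the same level while gene1 keeps its trait slope of 0.5. With limma, the corresponding call would be removeBatchEffect(log_expr, batch = batch, design = model.matrix(~ trait)).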


Thank you so much for your response. Could you please tell me what type of preprocessing is generally expected for gene correlation analysis of RNA sequencing data?
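Whatever the preprocessing, a gene correlation analysis ends with a step like the one below: the log2-scale, batch-corrected matrix is transposed so that samples are rows and genes are columns (the orientation WGCNA's functions expect for expression data), and genes are correlated with a trait across samples. This is a minimal base-R sketch with made-up values and a hypothetical trait; a per-gene trait correlation like this is similar to what the WGCNA tutorials call gene significance.

```r
# Toy log2-scale, batch-corrected matrix: genes in rows, samples in columns (made-up values)
log_expr <- rbind(gene1 = c(6.5, 7.0, 7.5, 8.0),
                  gene2 = c(3.1, 2.9, 3.0, 3.2))
colnames(log_expr) <- paste0("s", 1:4)
trait <- c(s1 = 1, s2 = 2, s3 = 3, s4 = 4)   # hypothetical sample trait

# WGCNA-style input: samples in rows, genes in columns
datExpr <- t(log_expr)
stopifnot(identical(rownames(datExpr), names(trait)))   # sample order must match the trait

# One Pearson correlation per gene across samples
gene_trait_cor <- cor(datExpr, trait)
print(round(gene_trait_cor, 3))
```

Checking that sample names line up before correlating guards against silently pairing expression and trait values from different samples.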

Reply by sukeshinik5

Read the WGCNA docs, please. Just repeating the question is basically ignoring what I just wrote.

Reply by ATpoint

I apologize if it seemed that I had ignored your response; I am new to these technical terms and this kind of analysis. What I understood from the above is that rlog returns log2-scale values and vst also produces approximately log2-scale values, so these methods are more appropriate. Regarding the WGCNA tutorial, it works with microarray datasets preprocessed with RMA. If I quantile normalize, then log-transform, and then correct for the batch effect, would that be appropriate? (I understand WGCNA is not part of Bioconductor; I only need to know whether what I am doing is technically correct.)

https://alexslemonade.github.io/refinebio-examples/04-advanced-topics/network-analysis_rnaseq_01_wgcna.html (suggests vst)

Thank you, and apologies for the silly questions. I truly appreciate your support. Sukeshini K

Reply by sukeshinik5