Question: Use of batch corrected (removeBatchEffect) read counts for further downstream analysis.
ashwini.kumar wrote, 23 months ago:

Hi,
I have RNA-seq data from 100 samples: 40 samples were sequenced with one library preparation kit (Nextera) and 60 samples with another (Scriptseq), which introduced a known batch effect. We now want to analyze these samples together. To correct for this known batch effect, I am using the following two approaches. Please suggest which one is correct, and whether we can use the batch-corrected expression values for further analyses that require already corrected values, e.g. network analysis.
1. CPM

library(edgeR)

# TMM normalization, then log-CPM with a large prior count
dge <- DGEList(counts=count)
dge <- calcNormFactors(dge, method="TMM")
logCPM <- cpm(dge, log=TRUE, prior.count=5)

# remove the known batch effects (removeBatchEffect() comes from limma, which edgeR loads)
logCPM <- removeBatchEffect(logCPM, batch=batch, batch2=batch2)

2. VOOM

y <- DGEList(counts=count)

# voom transformation; design.voom is the design matrix used for the voom fit
voom.data <- voom(y, plot=FALSE, design=design.voom)

# remove the known batch effects from the voom-transformed log-CPM values
logCPM_voom <- removeBatchEffect(voom.data, batch=batch, batch2=batch2)

We also have drug response data for these 100 samples, and I would like to correlate gene expression with drug response. Moreover, I want to use WGCNA and other network analysis approaches, which require already batch-corrected expression values. Can I use these batch-corrected values in downstream analyses, e.g. correlating drug response with gene expression, or are they only useful for visualization purposes, e.g. heatmaps and PCA clustering?

Thank you!

Best regards,

Ashwini 

limma removeBatchEffect()
Answer: Use of batch corrected (removeBatchEffect) read counts for further downstream analysis

Aaron Lun (Cambridge, United Kingdom) wrote, 23 months ago:

To answer your immediate question: we generally recommend using the first approach. A large prior.count avoids getting very large negative values from zero counts, which could be misleading during visualization.

However, batch-corrected expression values should only be used when you have no other choice, i.e., in procedures that do not accept design matrices. This includes visualization with heatmaps, dimensionality reduction and various forms of clustering; it may include network analyses, though you'll have to look at those methods specifically. If you are looking for changes in expression with respect to drug response, you can do that as part of a standard DE analysis with blocking on batch - there is no need to use batch-corrected values.
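For instance, a minimal sketch of such a blocked analysis with voom and limma could look like the following (reusing count, batch and batch2 from your code above; drug_response is a hypothetical numeric vector with one value per sample):

library(edgeR)
library(limma)

# block on the batch factors and test the drug response covariate directly
design <- model.matrix(~ factor(batch) + factor(batch2) + drug_response)

dge <- DGEList(counts=count)
dge <- calcNormFactors(dge, method="TMM")

v <- voom(dge, design, plot=FALSE)
fit <- eBayes(lmFit(v, design))

# genes whose expression is associated with drug response, adjusting for batch
topTable(fit, coef="drug_response")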

More generally, though, I would question the wisdom of forcing two data sets into a single analysis. Differences in the technology will likely result in differences in the mean-variance relationship, normalization, etc., which will not be properly handled if all samples are lumped together. I would suggest doing some type of meta-analysis instead, e.g., A: Merge different datasets regarding a common number of DE genes in R.
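If you do go down that route, a rough illustration of the per-dataset fits (again reusing count, batch and the hypothetical drug_response; the actual meta-analysis step, e.g. intersecting DE lists or combining p-values, is not shown) might be:

library(edgeR)
library(limma)

# fit a separate model within each sequencing technology
fits <- lapply(unique(batch), function(b) {
    keep <- batch == b
    dge <- DGEList(counts=count[, keep])
    dge <- calcNormFactors(dge, method="TMM")
    dr <- drug_response[keep]
    design <- model.matrix(~ dr)
    v <- voom(dge, design, plot=FALSE)
    eBayes(lmFit(v, design))
})

# per-dataset DE tables, to be compared or combined downstream
tops <- lapply(fits, topTable, coef=2, number=Inf)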


Thank you, Aaron 

written 23 months ago by ashwini.kumar

Hi Aaron,

Do you think this approach is right for making a clustering heatmap?

library(edgeR)
dge <- DGEList(counts=count)
dge <- calcNormFactors(dge, method="TMM")
logCPM <- cpm(dge, log=TRUE, prior.count=5)
logCPM_bc <- removeBatchEffect(logCPM, batch=batch)

# Scale each gene (row) to z-scores across samples
logCPM_z <- t(scale(t(logCPM_bc)))

And then using this logCPM_z for a clustering heatmap, roughly along the lines sketched below. Is this right?
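(Sketch only; pheatmap here is just one possible choice of heatmap function:)

library(pheatmap)

# rows are already z-scored above, so no further scaling inside pheatmap
pheatmap(logCPM_z, scale="none",
         cluster_rows=TRUE, cluster_cols=TRUE,
         show_rownames=FALSE)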

written 9 weeks ago by Biologist

Don't resurrect old threads; the original question is 2 years old.

And besides, you already have a perfectly good answer here.

written 9 weeks ago by Aaron Lun