Hi,
I have RNA-seq data from 100 samples: 40 were sequenced using one library preparation (Nextera) and 60 using another (ScriptSeq). This produced a known batch effect, and we now want to analyze all the samples together. To correct for this batch effect I am trying the following two approaches. Which one is correct? And can the batch-corrected expression values be used for further analyses that need already-corrected values, e.g. network analysis?
1. CPM
library(edgeR)
dge <- DGEList(counts=count)
dge <- calcNormFactors(dge, method = "TMM")
logCPM <- cpm(dge,log=TRUE,prior.count=5)
logCPM <- removeBatchEffect(logCPM,batch=batch, batch2 = batch2)
2. VOOM
y <- DGEList(counts=count)
voom.data <- voom(y, plot=FALSE, design = design.voom)
logCPM_voom <- removeBatchEffect(voom.data,batch=batch, batch2 = batch2)
We also have drug response data for these 100 samples, and I would like to correlate gene expression with drug response. In addition, I want to use WGCNA and other network analysis approaches that need already batch-corrected expression values. Can I use these batch-corrected values in downstream analyses, e.g. correlation between drug response and gene expression, or are they only useful for visualization, e.g. heatmaps and PCA clustering?
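To make the question concrete, here is a minimal sketch on simulated data of how the correction could be run while passing the variable of interest through the `design` argument of `removeBatchEffect`, so that its signal is not removed along with the batch. The names `logCPM_sim` and `response` and the simulated batch shift are assumptions for illustration only, not my real data, and this is not meant as the definitive way to do it.

```r
library(limma)

set.seed(1)
n_genes <- 200
n_samples <- 100

# Batch labels matching the 40/60 split described above
batch <- factor(rep(c("Nextera", "ScriptSeq"), times = c(40, 60)))

# Hypothetical drug response, one value per sample (simulated)
response <- rnorm(n_samples)

# Simulated log-expression matrix standing in for logCPM (genes x samples)
logCPM_sim <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Add an artificial shift to one batch to mimic a batch effect
is_scriptseq <- batch == "ScriptSeq"
logCPM_sim[, is_scriptseq] <- logCPM_sim[, is_scriptseq] + 2

# Design containing the variable whose signal should be preserved
design <- model.matrix(~ response)

# Remove the batch component while modelling the response
corrected <- removeBatchEffect(logCPM_sim, batch = batch, design = design)

# Per-gene mean difference between batches after correction
batch_gap <- rowMeans(corrected[, is_scriptseq]) -
  rowMeans(corrected[, !is_scriptseq])
summary(batch_gap)
```

In this sketch `corrected` has the same dimensions as the input matrix and could be passed to plotting or clustering functions in the same way as `logCPM` above.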
Thank you!
Best regards,
Ashwini
Thank you, Aaron
Hi Aaron,
Do you think this approach is right for making a clustering heatmap?
And then using this
logCPM_z
for clustering heatmap. Is this right?
Don't resurrect old threads, the original question is 2 years old.
And besides, you already have a perfectly good answer here.