I have RNA-seq data from 100 samples: 40 were sequenced using one library (Nextera) and 60 using another (ScriptSeq). This resulted in a known batch effect, and we now want to analyze all of the samples together. To correct for this batch effect I am trying the following two approaches. Which one is correct, and can the batch-corrected expression values be used for further analyses that require already batch-corrected input, e.g. network analysis?
dge <- DGEList(counts=count)
dge <- calcNormFactors(dge, method = "TMM")
logCPM <- cpm(dge,log=TRUE,prior.count=5)
logCPM <- removeBatchEffect(logCPM,batch=batch, batch2 = batch2)
y <- DGEList(counts=count)
voom.data <- voom(y, plot = FALSE, design = design.voom)
logCPM_voom <- removeBatchEffect(voom.data,batch=batch, batch2 = batch2)
We also have drug response data for these 100 samples, and I would like to correlate gene expression with drug response. I also want to use WGCNA and other network analysis approaches, which need already batch-corrected expression values. Can I use these batch-corrected values in downstream analyses, e.g. correlation between drug response and gene expression, or are they only suitable for visualization purposes, e.g. heatmaps and PCA clustering?
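To make the intended downstream step concrete, here is a minimal sketch of the kind of correlation I have in mind. It runs on simulated data only: `logCPM_corrected`, `drug_response`, the matrix dimensions and the choice of Spearman correlation are all placeholders for illustration, not my real objects or a settled method.

```r
# Sketch only: simulated stand-ins for the real data (all names hypothetical).
set.seed(1)
n_genes <- 50
n_samples <- 100

# Stand-in for a batch-corrected logCPM matrix (genes in rows, samples in columns)
logCPM_corrected <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Stand-in for the drug response, one value per sample, in the same sample order
drug_response <- rnorm(n_samples)
names(drug_response) <- colnames(logCPM_corrected)

# Correlate each gene's expression profile with drug response
rho <- apply(logCPM_corrected, 1, function(g) {
  cor(g, drug_response, method = "spearman")
})

# Genes most positively correlated with response
head(sort(rho, decreasing = TRUE))
```

In the real analysis, `logCPM_corrected` would be the output of one of the two `removeBatchEffect` calls above, which is exactly the step I am unsure about.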