I have an RNA-seq dataset and I've performed PCA analysis to check batch effect using package edgeR. and have used raw counts from cuffnorm output.
here is my code
a1<-read.table("raw_counts.txt", sep="\t", header=TRUE)
dge1 <- DGEList(counts=a1[(-1)])
dge1 <- calcNormFactors(dge1)
logCPM1 <- cpm(dge1,log=TRUE,prior.count=5)
logCPM1 <- removeBatchEffect(logCPM1, batch=batch)
B.res = prcomp(logCPM1, scale = TRUE)
pc.s = summary(B.res)$importance[2,1:2]
pc1.var = round(pc.s[["PC1"]],2)
pc2.var = round(pc.s[["PC2"]],2)
pdf(file = "after1_125.pdf")
plot(B.res$rotation[,1:2], main = "After Batch Correction", xlab = paste("PC1 (variance:",pc1.var*100, "%)", sep =""), ylab = paste("PC2 (variance: ",pc2.var*100,"%)", sep =""), col=resp, pch = 16, cex = .9)
From PCA plot I could see there is little batch effect and I have corrected batch effect. Now I want to use batch corrected data for further analysis in cuffdiff. Is this possible? If yes how? or is Tuxedo Suite takes care of batch correction?
Please suggest me tools or workflows which uses batch corrected data for RNA-seq analysis.
Any help would be appreciated.