Dear all, I have read (EdgeR: Accounting for batch effects in a pairwise analysis) that there is no need to remove batch effects in a paired design, because blocking on subject in the design already accounts for any batch effect shared by both samples of a pair.
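To see why blocking on subject can make a separate batch term redundant, here is a small sketch that builds the paired design from this post alongside a hypothetical batch variable. The batch layout (donors 1-4 processed in batch A, donors 5-8 in batch B) is an assumption for illustration only. When batch is nested in donor, the batch column is a sum of subject columns, so it adds no rank to the design and cannot be estimated separately.

```r
# Layout from the post: 8 donors, each with one treated and one untreated sample.
subjects <- factor(rep(1:8, each = 2))
treat <- factor(rep(c("treated", "Untreated"), 8), levels = c("Untreated", "treated"))

# HYPOTHETICAL batch: donors 1-4 in batch A, donors 5-8 in batch B (nested in donor).
batch <- factor(rep(c("A", "B"), each = 8))

design_paired <- model.matrix(~ subjects + treat)
design_batch  <- model.matrix(~ subjects + batch + treat)

# Paired design has full column rank: every coefficient is estimable.
qr(design_paired)$rank == ncol(design_paired)

# batchB equals subjects5 + subjects6 + subjects7 + subjects8, so rank does not increase.
qr(design_batch)$rank < ncol(design_batch)
```

If a batch instead splits the two samples of a pair (for example, treated and untreated libraries sequenced on different runs), it is no longer nested in donor and the argument above does not apply.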
I am working with data from primary cells, and there is a lot of heterogeneity between samples. So I initially planned to use RUVSeq, which estimates factors of unwanted variation from a set of negative-control genes (here, empirical controls) and returns them as covariates (W_1 when k=1) that can be added to the design matrix.
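As I understand RUVg, W is taken from a singular value decomposition of the gene-centred log counts of the control genes. The sketch below reproduces that idea in base R on simulated counts, to show that the first left singular vector of the control genes recovers a nuisance factor they share. The data, the hidden factor, and the choice of the first 500 genes as controls are all hypothetical; this is not the RUVSeq implementation.

```r
# Simulated example (not real data): 16 samples x 1000 genes with a hidden
# per-sample factor that shifts log-expression of every gene.
set.seed(1)
n_samples <- 16
n_genes <- 1000
hidden  <- rnorm(n_samples)            # unobserved nuisance factor
loading <- rnorm(n_genes, sd = 0.5)    # per-gene sensitivity to it
mu <- exp(3 + outer(hidden, loading))  # samples x genes mean counts
counts <- matrix(rpois(length(mu), mu), nrow = n_samples)

# HYPOTHETICAL control set: the first 500 genes.
controls <- 1:500

# RUVg-style estimate: log-transform, centre each gene across samples,
# then take the first left singular vector of the control-gene submatrix.
Y <- log(counts + 1)
Y <- sweep(Y, 2, colMeans(Y))
W_1 <- svd(Y[, controls])$u[, 1]

# The sign of a singular vector is arbitrary, so look at the magnitude.
abs(cor(W_1, hidden))
```

The key assumption is that the control genes are not affected by treatment; if they are, W_1 picks up part of the biological signal.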
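library(RUVSeq)
library(edgeR)   # DGEList, calcNormFactors, glmFit, glmLRT, topTags

# Filtered raw counts: genes in rows, 16 samples in columns
filtered <- read.delim("filt_counts.txt", header = TRUE, row.names = 1)

# 8 donors x (treated, Untreated); columns assumed to alternate treated/Untreated.
# Set the reference level explicitly so the result does not depend on locale sort order.
treat <- factor(rep(c("treated", "Untreated"), 8), levels = c("Untreated", "treated"))
subjects <- factor(rep(1:8, each = 2))
design <- model.matrix(~ subjects + treat)

# Keep subjects in pData as well, so the final model.matrix() finds it there
set <- newSeqExpressionSet(as.matrix(filtered),
                           phenoData = data.frame(treat, subjects, row.names = colnames(filtered)))
set <- betweenLaneNormalization(set, which = "upper")

# Create empirical control set: first-pass edgeR fit without W
y <- DGEList(counts = filtered, group = treat)
y <- calcNormFactors(y, method = "upperquartile")
y <- estimateGLMCommonDisp(y, design, verbose = TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)   # by default tests the last coefficient, treattreated
top <- topTags(lrt, n = nrow(y))$table
# Empirical controls: all genes outside the 5000 most significant
empirical <- rownames(set)[which(!(rownames(set) %in% rownames(top)[1:5000]))]

# Normalise using the empirical controls and estimate W_1 (stored in pData(set2))
set2 <- RUVg(set, empirical, k = 1)

# Final DE analysis on raw counts, with W_1 added to the paired design
design <- model.matrix(~ subjects + W_1 + treat, data = pData(set2))
y <- DGEList(counts = counts(set2), group = treat)
y <- calcNormFactors(y)
y <- estimateGLMCommonDisp(y, design, verbose = TRUE)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)   # treattreated is still the last column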
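One way to judge whether adding W_1 is risky in this paired setting is to check whether W_1 itself differs between treated and untreated samples within donors; if it does, W_1 may absorb part of the treatment effect. The helper below (check_w_vs_treatment is a hypothetical name, not a RUVSeq or edgeR function) fits an ordinary linear model of W on donor and treatment and returns the p-value for the treatment term. The demo uses simulated W values only.

```r
# HYPOTHETICAL helper, not part of RUVSeq/edgeR: p-value for a treatment
# effect on an unwanted-variation factor W, after blocking on donor.
check_w_vs_treatment <- function(W, subjects, treat) {
  fit <- lm(W ~ subjects + treat)
  coefs <- summary(fit)$coefficients
  # Treatment is the last term in the formula, so it is the last row
  coefs[nrow(coefs), "Pr(>|t|)"]
}

# Simulated demo using the post's layout (8 donors x 2 conditions)
set.seed(2)
subjects <- factor(rep(1:8, each = 2))
treat <- factor(rep(c("treated", "Untreated"), 8), levels = c("Untreated", "treated"))
donor_shift <- rnorm(8)[as.integer(subjects)]

# W driven only by donor differences (no treatment effect simulated)
W_donor <- donor_shift + rnorm(16, sd = 0.1)
p_donor <- check_w_vs_treatment(W_donor, subjects, treat)

# W that also tracks treatment: the treatment p-value will be very small
W_treat <- donor_shift + 2 * (treat == "treated") + rnorm(16, sd = 0.1)
p_treat <- check_w_vs_treatment(W_treat, subjects, treat)

c(donor_only = p_donor, tracks_treatment = p_treat)

# With the real data, something like:
# check_w_vs_treatment(pData(set2)$W_1, subjects, treat)
```

A small p-value for the real W_1 would suggest that the empirical controls are not free of treatment signal, in which case the RUV-adjusted results should be interpreted with care.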
However, I have now learned that a paired analysis does not need batch correction, and I am concerned that my design could introduce bias into the analysis if I am doing this wrong.
Is it wrong to try to batch-correct a paired analysis? Or is there a way to remove hidden, unwanted variation so that the true signal can be seen in data with this much heterogeneity?