I have seen many posts about batch effect correction on RNA-seq data. But I'm little confused in using the code for differential analysis. So, asking it here.

I have been using 8 rna-seq samples, 4 NOTCH silenced samples and 4 Controls. The setup is like this

```
Samples type days
1 Control day1
2 Control day1
3 Control day2
4 Control day2
5 shNotch day2
6 shNotch day2
7 shNotch day1
8 shNotch day1
```

For my data I did the batch correction and then used the data for an MDS plot to look how they clustered.

```
library(edgeR)
group <- factor(paste0(columndata$type))
dge <- DGEList(counts=countdata, group=group)
keep <- rowSums(cpm(dge) > 1) >= 3
dge <- dge[keep, , keep.lib.sizes=FALSE]
dge <- calcNormFactors(dge, method = "TMM")
logCPM <- cpm(dge, log=TRUE, prior.count=2)
logCPM_bc <- removeBatchEffect(logCPM, batch=columndata$days)
plotMDS(logCPM_bc)
```

For differential analysis first I created design matrix like following:

```
design <- model.matrix(~ 0 + group + columndata$days)
colnames(design) <- c("Control","shNotch", "Days")
contrast.matrix <- makeContrasts(shNotch-Control, levels=design)
```

And now I was wondering whether to use DGElist object `dge`

or `logCPM_bc`

(Batch corrected logCPM) for estimating dispersion `estimateDisp`

, for `glmQLFit`

.

Any help? thanks a lot

`removeBatchEffect`

should not be used for DE analysis but for other downstream analysis, such as PCA. For adjustment of DE analysis, indeed you should include`columndata$days`

in design matrix. Therefore, your`dge`

object should be used. This is important because the linear modeling steps should know that it uses one degree of freedom for the adjustment.Yes I understand now. Thanks a lot Mikhael.