Dear all, I would appreciate any comments on the R code below that :

-- uses edgeR to measure the differential expression between siCTL and siPROTEIN

-- uses removeBatchEffect() function to correct for batch effects.

The question would be : is the correct R code that corrects for batch effect first, and subsequently calls for differential expression ? Thank you ! I am also using a paired-design for our samples, as the subjects have been sequenced at different times.

eset <- read.delim("mm8.genes.no.adj", row.names="Symbol")

group <- factor(c("siCTL","siCTL","siPROTEIN","siPROTEIN"))

group <- relevel(group,ref="siCTL")

subject <- factor(c(1,2,1,2))

design <- model.matrix(~group+subject)

y <- DGEList(counts=eset,group=group)

keep <- rowSums(cpm(y) > 0.5) >= 4

y <- y[keep,,keep.lib.sizes=FALSE]

y <- calcNormFactors(y)

logCPM <- cpm(y,log=TRUE,prior.count=3)

plotMDS(logCPM, col=as.numeric(subject))

logCPM <- removeBatchEffect(logCPM, batch=subject)

plotMDS(logCPM, col=as.numeric(subject))

fit <- lmFit(logCPM, design)

fit <- eBayes(fit,trend=TRUE, robust=TRUE)

results <- topTable(fit, coef=2, adjust="fdr", number=Inf)

write.table(results, file="")

Dear Aaron, and Gordon, thank you for your comments and help. I will get back to you with a more general piece of code for removing of the batch effects, and will ask for your comments, if you do not mind. many thanks !