Dear all, I would appreciate any comments on the R code below that :
-- uses edgeR to measure the differential expression between siCTL and siPROTEIN
-- uses removeBatchEffect() function to correct for batch effects.
The question would be : is the correct R code that corrects for batch effect first, and subsequently calls for differential expression ? Thank you ! I am also using a paired-design for our samples, as the subjects have been sequenced at different times.
eset <- read.delim("mm8.genes.no.adj", row.names="Symbol")
group <- factor(c("siCTL","siCTL","siPROTEIN","siPROTEIN"))
group <- relevel(group,ref="siCTL")
subject <- factor(c(1,2,1,2))
design <- model.matrix(~group+subject)
y <- DGEList(counts=eset,group=group)
keep <- rowSums(cpm(y) > 0.5) >= 4
y <- y[keep,,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
logCPM <- cpm(y,log=TRUE,prior.count=3)
plotMDS(logCPM, col=as.numeric(subject))
logCPM <- removeBatchEffect(logCPM, batch=subject)
plotMDS(logCPM, col=as.numeric(subject))
fit <- lmFit(logCPM, design)
fit <- eBayes(fit,trend=TRUE, robust=TRUE)
results <- topTable(fit, coef=2, adjust="fdr", number=Inf)
write.table(results, file="")
Dear Aaron, and Gordon, thank you for your comments and help. I will get back to you with a more general piece of code for removing of the batch effects, and will ask for your comments, if you do not mind. many thanks !