I have a paired design (with and without treatment) on 8 samples. The experiment is described in targets.txt:

    files   Subject  Treatment
    r1.txt  1        NO
    r2.txt  1        TREAT
    r3.txt  2        NO
    r4.txt  2        TREAT
    r5.txt  3        NO
    r6.txt  3        TREAT
    r7.txt  1        NO
    r8.txt  1        TREAT

r1, r2, r3, r4, r5 and r6 were sequenced first; r7 and r8 (from subject 1, replicates of r1 and r2) were sequenced in a separate run at a different time.
I wonder how to remove the **batch effect** in this case?
The current code (without removing the batch effect) is as follows:

    rm(list=ls(all=TRUE))
    library('edgeR')
    targets <- readTargets('targets.txt')
    y <- readDGE(targets)
    keep <- rowSums(cpm(y) >= 1) >= 3
    y <- y[keep,]
    y$samples$lib.size <- colSums(y$counts)
    y <- calcNormFactors(y)
    Subject <- factor(targets$Subject)
    Treat <- factor(targets$Treatment, levels=c("NO","TREAT"))
    design <- model.matrix(~Subject+Treat)
    y <- estimateGLMCommonDisp(y,design)
    y <- estimateGLMTrendedDisp(y,design)
    y <- estimateGLMTagwiseDisp(y,design)
    fit <- glmFit(y, design)
    lrt <- glmLRT(fit)
    topTags(lrt)
    summary(de <- decideTestsDGE(lrt))
    results <- topTags(lrt,n = length(y$counts[,1]))
    write.table(as.matrix(results$table),file="EDGER-TREAT-NO.txt",sep="\t")

Thank you!
That looks right; r9/r10 has the same subject/run-time combination as r7/r8, so it makes sense that the `Subject` value is the same for these samples.

For future reference, it's better to post your response as a comment to my answer, rather than as a separate answer. This keeps the thread more organized, given that the ordering of answers can change depending on the number of votes.
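
As a rough illustration of blocking on the subject/run-time combination, here is a minimal sketch in base R. It is only a guess at how such a design could be set up for the eight samples in the question: the `Run` column and the `Block` factor are hypothetical names that do not appear in the original targets.txt, and the run labels (1 for r1 to r6, 2 for r7 and r8) are assumed from the description in the question.

```r
# Hypothetical reconstruction of targets.txt with an assumed Run column.
targets <- data.frame(
  files     = paste0("r", 1:8, ".txt"),
  Subject   = c(1, 1, 2, 2, 3, 3, 1, 1),
  Treatment = rep(c("NO", "TREAT"), times = 4),
  Run       = c(1, 1, 1, 1, 1, 1, 2, 2)  # assumption: r7/r8 are from the second run
)

# One blocking level per subject/run-time combination, so r7/r8 get
# their own level instead of sharing one with r1/r2.
Block <- factor(paste(targets$Subject, targets$Run, sep = "."))
Treat <- factor(targets$Treatment, levels = c("NO", "TREAT"))

# Additive design: block effects absorb subject and run differences,
# leaving the last coefficient as the within-pair treatment effect.
design <- model.matrix(~ Block + Treat)
colnames(design)
```

With this layout each NO/TREAT pair sits inside its own block, so any shift between the two sequencing runs is absorbed by the block term rather than being confused with the treatment effect. The resulting `design` could then be passed to the dispersion estimation and `glmFit` steps in place of the `~Subject+Treat` matrix; `glmLRT(fit)` tests the last coefficient by default, which here would be the treatment term.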