edgeR paired test with batch effect
2
0
Entering edit mode
Son Pham ▴ 60
@son-pham-6437
Last seen 9.9 years ago
United States

I have a paired test situation, (with and without treatment) on 8 samples. The experiment is (targets.txt):

files Subject Treatment
r1.txt       1       NO
r2.txt       1    TREAT
r3.txt       2       NO
r4.txt       2    TREAT
r5.txt       3       NO
r6.txt       3    TREAT
r7.txt       1       NO
r8.txt       1    TREAT

r1,r2,r3,r4,r5,r6 were sequenced first, and r7,r8 (from subject 1, which are the replicates for r1 an r2) were sequenced in another run (different time).

I wonder how to remove the **batch effect** in this case?

The current code (without moving batch effect) is following:

rm(list=ls(all=TRUE))
library('edgeR')

targets <- readTargets('targets.txt')
y <- readDGE(targets)
keep <- rowSums(cpm(y) >= 1) >= 3
y <- y[keep,]
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)
Subject <- factor(targets$Subject)
Treat <- factor(targets$Treatment, levels=c("NO","TREAT"))
design <- model.matrix(~Subject+Treat)
y <- estimateGLMCommonDisp(y,design)
y <- estimateGLMTrendedDisp(y,design)
y <- estimateGLMTagwiseDisp(y,design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
summary(de <- decideTestsDGE(lrt))
results <- topTags(lrt,n = length(y$counts[,1]))
write.table(as.matrix(results$table),file="EDGER-TREAT-NO.txt",sep="\t")

Thank you!

edger design and contrast matrix • 2.3k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 17 hours ago
The city by the bay

I assume that your DE contrast of interest is that between the samples with and without the treatment. In such a case, I would redefine:

Subject <- factor(c(1,1,2,2,3,3,4,4))

and use this to define design as you've written above. This distinguishes r1/r2 (i.e., Subject 1 from the first sequencing run) from r7/r8 (i.e., the second sequencing run), effectively treating them as samples from separate subjects. The redefined Subject factor will then account for any differences related to the subject and/or the run-time.

On an unrelated note, you can avoid the assignment to y$samples$lib.size by using:

y <- y[keep,,keep.lib.sizes=FALSE]

This will redefine the library sizes as the column sums of the filtered matrix.

ADD COMMENT
0
Entering edit mode
Son Pham ▴ 60
@son-pham-6437
Last seen 9.9 years ago
United States

Aaron: Thank you for the very skillful way. Probably, I just need to change my targets.txt file to:

r1.txt 1 NO
r2.txt 1 TREAT
r3.txt 2 NO
r4.txt 2 TREAT
r5.txt 3 NO
r6.txt 3 TREAT
r7.txt 4 NO
r8.txt 4 TREAT

and Just for understanding: if I have one more pair r9.txt, r10.txt, which is replicated for r7 and r8 respectively (and also sequenced in the same run as r7 & r8). I would just need to modify my targets.txt to:

r1.txt 1 NO
r2.txt 1 TREAT
r3.txt 2 NO
r4.txt 2 TREAT
r5.txt 3 NO
r6.txt 3 TREAT
r7.txt 4 NO
r8.txt 4 TREAT

r9.txt 4 NO

r10.txt 4 TREAT

Is it correct?

Thank you Aaron!

-Son

 

ADD COMMENT
0
Entering edit mode

That looks right; r9/r10 has the same subject/run-time combination as r7/r8, so it makes sense that the Subject value is the same for these samples.

For future reference, it's better to post your response as a comment to my answer, rather than as a separate answer. This keeps the thread more organized, given that the ordering of answers can change depending on the number of votes.

ADD REPLY

Login before adding your answer.

Traffic: 716 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6