Question

edgeR paired test with batch effect

0

Entering edit mode

Son Pham ▴ 60

@son-pham-6437

Last seen 10.3 years ago

United States

I have a paired test situation, (with and without treatment) on 8 samples. The experiment is (targets.txt):

files Subject Treatment
r1.txt       1       NO
r2.txt       1    TREAT
r3.txt       2       NO
r4.txt       2    TREAT
r5.txt       3       NO
r6.txt       3    TREAT
r7.txt       1       NO
r8.txt       1    TREAT

r1,r2,r3,r4,r5,r6 were sequenced first, and r7,r8 (from subject 1, which are the replicates for r1 an r2) were sequenced in another run (different time).

I wonder how to remove the **batch effect** in this case?

The current code (without moving batch effect) is following:

rm(list=ls(all=TRUE))
library('edgeR')

targets <- readTargets('targets.txt')
y <- readDGE(targets)
keep <- rowSums(cpm(y) >= 1) >= 3
y <- y[keep,]
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)
Subject <- factor(targets$Subject)
Treat <- factor(targets$Treatment, levels=c("NO","TREAT"))
design <- model.matrix(~Subject+Treat)
y <- estimateGLMCommonDisp(y,design)
y <- estimateGLMTrendedDisp(y,design)
y <- estimateGLMTagwiseDisp(y,design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
summary(de <- decideTestsDGE(lrt))
results <- topTags(lrt,n = length(y$counts[,1]))
write.table(as.matrix(results$table),file="EDGER-TREAT-NO.txt",sep="\t")

Thank you!

edger design and contrast matrix • 2.3k views

ADD COMMENT • link updated 10.3 years ago by Gordon Smyth 52k • written 10.3 years ago by Son Pham ▴ 60

score 2 · Answer 1 · 2014-12-29

I assume that your DE contrast of interest is that between the samples with and without the treatment. In such a case, I would redefine:

Subject <- factor(c(1,1,2,2,3,3,4,4))

and use this to define design as you've written above. This distinguishes r1/r2 (i.e., Subject 1 from the first sequencing run) from r7/r8 (i.e., the second sequencing run), effectively treating them as samples from separate subjects. The redefined Subject factor will then account for any differences related to the subject and/or the run-time.

On an unrelated note, you can avoid the assignment to y$samples$lib.size by using:

y <- y[keep,,keep.lib.sizes=FALSE]

This will redefine the library sizes as the column sums of the filtered matrix.

score 0 · Answer 2 · 2014-12-29

0

Entering edit mode

Son Pham ▴ 60

@son-pham-6437

Last seen 10.3 years ago

United States

Aaron: Thank you for the very skillful way. Probably, I just need to change my targets.txt file to:

r1.txt 1 NO
r2.txt 1 TREAT
r3.txt 2 NO
r4.txt 2 TREAT
r5.txt 3 NO
r6.txt 3 TREAT
r7.txt 4 NO
r8.txt 4 TREAT

and Just for understanding: if I have one more pair r9.txt, r10.txt, which is replicated for r7 and r8 respectively (and also sequenced in the same run as r7 & r8). I would just need to modify my targets.txt to:

r1.txt 1 NO
r2.txt 1 TREAT
r3.txt 2 NO
r4.txt 2 TREAT
r5.txt 3 NO
r6.txt 3 TREAT
r7.txt 4 NO
r8.txt 4 TREAT

r9.txt 4 NO

r10.txt 4 TREAT

Is it correct?

Thank you Aaron!

-Son

ADD COMMENT • link 10.3 years ago Son Pham ▴ 60

0

Entering edit mode

That looks right; r9/r10 has the same subject/run-time combination as r7/r8, so it makes sense that the Subject value is the same for these samples.

For future reference, it's better to post your response as a comment to my answer, rather than as a separate answer. This keeps the thread more organized, given that the ordering of answers can change depending on the number of votes.

ADD REPLY • link 10.3 years ago Aaron Lun ★ 28k