Question

edgeR and batch effects

0

Entering edit mode

chatterjee.arumoy ▴ 10

@chatterjeearumoy-23195

Last seen 4.8 years ago

Hi,

I am trying to perform differential gene expression analysis with edgeR. I have four conditions T1, T2, T3 and T4 and two batches. I am running the following codes:

Count1=read.table("GENESFORANALYSIS.txt", sep="\t", header=TRUE)

 colnames(Count1) = paste(c("Genename", "T1R1","T1R2","T1R3",  
                                                                      "T2R1","T2R2","T2R3", 
                                                                      "T3R1","T3R2","T3R3", 
                                                                      "T4R1","T4R2","T4R3"))

 conditions = c(rep("T1", 3), rep("T2", 3), rep("T3", 3), rep("T4",3))

 batch=c("A", "B", "A", 
    "A", "A", "B",
    "A", "A", "B",
    "A", "A", "B")

   y=DGEList(counts=Count1[,-1], genes=Count1[,1],group=conditions)

   cpm=10/(min(y$samples$lib.size)/10^6)
   keep = rowSums(cpm(y) >= 1.105529) >= 3
   y = y[keep,,keep.lib.sizes=FALSE]

  design=model.matrix(~ batch + conditions, data=y$samples)
  rownames(design) = colnames(y)

    z = calcNormFactors(y)
    z = estimateGLMCommonDisp(z, design, verbose=TRUE)
    z = estimateGLMTrendedDisp(z, design)
    z = estimateGLMTagwiseDisp(z, design)

   fit1 = glmFit(z, design)
    lrt1 = glmLRT(fit1, coef=3)
     tt=topTags(lrt1, n=nrow(z))
       head(tt$table)
         summary(decideTests(lrt1))

My model matrix is: (1)

 design=model.matrix(~ batch + conditions, data=y$samples)
           (Intercept) batchB conditionsT2 conditionsT3 conditionsT4
 T1R1           1      0            0            0            0
 T1R2           1      1            0            0            0
 T1R3           1      0            0            0            0
 T2R1           1      0            1            0            0
 T2R2           1      0            1            0            0
 T2R3           1      1            1            0            0
 T3R1           1      0            0            1            0
 T3R2           1      0            0            1            0
 T3R3           1      1            0            1            0
 T4R1           1      0            0            0            1
 T4R2           1      0            0            0            1
 T4R3           1      1            0            0            1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"

attr(,"contrasts")$conditions
 [1] "contr.treatment"

(2)

design=model.matrix(~ 0+ batch + conditions, data=y$samples)
     batchA batchB conditionsT2 conditionsT3 conditionsT4
T1R1      1      0            0            0            0
T1R2      0      1            0            0            0
T1R3      1      0            0            0            0
T2R1      1      0            1            0            0
T2R2      1      0            1            0            0
T2R3      0      1            1            0            0
T3R1      1      0            0            1            0
T3R2      1      0            0            1            0
T3R3      0      1            0            1            0
T4R1      1      0            0            0            1
T4R2      1      0            0            0            1
T4R3      0      1            0            0            1
attr(,"assign")
[1] 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$batch
[1] "contr.treatment"

attr(,"contrasts")$conditions
[1] "contr.treatment"

Now, what I need to know. 1) I need to know how to compare T1 vs T2, T1vs T3, T1vs T4, T2vsT3, T2vs T4, T3vs T4 using the commands (a) contrast= c(0, 0, 0, 0, 0) as well as coef=3/4/5/3:4/etc. 2) I need to know if my models (with or without intercepts) will include the batch effects while giving me the names of the differentially expressed genes?

I would like someone to kindly reply to me. I have checked the edgeR manual and not able to understand it. Thanks a lot. Arumoy.

edger • 1.3k views

ADD COMMENT • link updated 4.8 years ago by Steve Lianoglou ★ 13k • written 4.8 years ago by chatterjee.arumoy ▴ 10

1

Entering edit mode

It would be easiest to write the design as 0 + conditions + batch so switching position of batch and condition. With this design you can make all your desired contrasts while properly accounting for the batch. The edgeR maintainers will probably add their expert opinions on that, but isn't the QLF framework the preferred pipeline these days rather than the LRT test?

ADD REPLY • link 4.8 years ago ATpoint ★ 4.9k

score 5 · Answer 1 · 2020-09-16

Both models (with and without the intercept) account for batch, but setting up the proper contrasts will be more straightforward using the no-intercept model.

To make our lives easier, put your group of interest up front and the batch last ie.

design <- model.matrix(~ 0 + conditions + batch, data = y$samples)

Let's model our analysis on the edgeR quasi-likelihood workflow, and it will look something like this:

library(edgeR)
# ... data munging ...
y.all <- calcNormFactors(DGEList(...))
design <- model.matrix(~ 0+ conditions + batch, data=y.all$samples)

# remove lowly expressed genes
y <- calcNormFactors(y.all[filterByExpr(y.all, design),,keep.lib.sizes = FALSE])
y <- estimateDisp(y, design, robust=TRUE)

contrasts <- makeContrasts(
  T1vsT2 = conditionsT1 - conditionsT2,
  T1vsT3 = conditionsT1 - conditionsT3,
  # ...
  T3vsT4 = conditionsT3 - conditionsT4,
  levels = design)

fit <- glmQLFit(y, design, robust=TRUE)

# Here is **one way** to get list of dge statistics for each contrast 
# we have defined.
results <- sapply(colnames(contrasts), function(cname) {
  res <- glmQLFTest(fit, contrast = contrasts[, cname])
  topTags(res, n = Inf, sort.by = "none")
}, simplify = FALSE)