Question

Batch effect correction for ANOVA-like DE analysis in edgeR

0

Entering edit mode

Manoj Hariharan ▴ 110

@manoj-hariharan-5902

Last seen 8.2 years ago

Hello edgeR users,

I'm faced with a problem where I'd like to compare expression across "normal" samples.

There are eight samples - first six have two replicates each (RNA-seq done several months ago) and the seventh and eighth samples have three replicates each (RNA-seq done recently). So I'd like to get the DE genes while correcting for the batch effect.

I tried the pipeline as explained in section 4.6 of the manual to "detect genes dierentially expressed in response to hrcC challenge, while correcting for any dierences between the batches."

In my case there is no mock vs hrcC, but I guess if I can make the design matrix correctly, I can use the ANOVA-like test. Sadly, I don't think my design matrix isn't right. Could the experts please comment?

Below is my code and error messages. Any help is much appreciated.

Thanks,

Manoj

Sample names:

-------------------

ES07-1, ES07-2, ES08-1, ES08-2, NT01-1, NT01-2, NT02-1, NT02-2, NT03-1, NT03-2, NT04-1, NT04-2, ES14_s1, ES14_s2, ES14_s3, PBT1_s1, PBT1_s2, PBT1_s3

Code:

--------

targets <- readTargets("CountDataInfo_allRqd8Smpls.txt")

x <- read.delim("AllRqd8Samples_RawReads.txt", row.names=1, stringsAsFactors=FALSE) #no normalization, no batch correction.

y <- DGEList(counts=x[,1:18], group=targets$Treat)

colnames(y) <- targets$Label

dim(y)

keep <- rowSums(cpm(y)>1) >= 2;

y <- y[keep,]

dim(y)

y$samples$lib.size <- colSums(y$counts)

y$samples

y <- calcNormFactors(y)

y$samples

Treat <- factor(substring(colnames(x),1,5))

Batch <- factor(c("1","1","1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2"))

design <- model.matrix(~Batch+Treat)

rownames(design) <- colnames(y)

design

y <- estimateGLMCommonDisp(y, design, verbose=TRUE)

Output messages:

------------------------

> design
(Intercept) Batch2 TreathESO7 TreathESO8 TreatPBNT1 TreatSCNT1 TreatSCNT2 TreatSCNT3 TreatSCNT4
ES07-1 1 0 1 0 0 0 0 0 0
ES07-2 1 0 1 0 0 0 0 0 0
ES08-1 1 0 0 1 0 0 0 0 0
ES08-2 1 0 0 1 0 0 0 0 0
NT01-1 1 0 0 0 0 1 0 0 0
NT01-2 1 0 0 0 0 1 0 0 0
NT02-1 1 0 0 0 0 0 1 0 0
NT02-2 1 0 0 0 0 0 1 0 0
NT03-1 1 0 0 0 0 0 0 1 0
NT03-2 1 0 0 0 0 0 0 1 0
NT04-1 1 0 0 0 0 0 0 0 1
NT04-2 1 0 0 0 0 0 0 0 1
ES14_s1 1 1 0 0 0 0 0 0 0
ES14_s2 1 1 0 0 0 0 0 0 0
ES14_s3 1 1 0 0 0 0 0 0 0
PBT1_s1 1 1 0 0 1 0 0 0 0
PBT1_s2 1 1 0 0 1 0 0 0 0
PBT1_s3 1 1 0 0 1 0 0 0 0
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$Batch
[1] "contr.treatment"

attr(,"contrasts")$Treat
[1] "contr.treatment"

> y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
TreatSCNT4
>
>

> sessionInfo()

R version 3.1.1 (2014-07-10)

Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] BiocInstaller_1.16.5 edgeR_3.8.6 limma_3.22.7

loaded via a namespace (and not attached):

[1] tools_3.1.1

>

edgeR batch effect • 1.5k views

ADD COMMENT • link 8.2 years ago Manoj Hariharan ▴ 110

score 1 · Answer 1 · 2016-01-27

Dear Manoj,

Statistics is just a reflection of scientific reality. In your case, treatments ES14 and PBT1 were only applied in the second batch whereas the other treatments were only applied in the first batch. So, if there really is a batch effect, it should be scientifically clear that you can't reliably compare ES14 or PBT14 with the other treatments. Nor is there any way for edgeR to estimate the batch effect because it is completely confounded with differences between the treatments. In general, it is only possible to compare treatments that occur together in at least one batch.

So you need to decide whether you really do need to adjust for batches. If you do, then you can only make comparisons between treatments within the same batch.

The usual way to handle batch effects is to repeat all the treatments at each time, rather than to do some treatments at one time and different treatments at another. In the case study in the edgeR User's Guide, both real and mock treatments were conducted at each time, and it was that which permitted a reliable batch effect to be estimated and removed.

Gordon

score 0 · Answer 2 · 2016-01-28

0

Entering edit mode

Manoj Hariharan ▴ 110

@manoj-hariharan-5902

Last seen 8.2 years ago

Dear Gordon,

Thanks for your reply.

I understand the problem... too bad it wouldn't be possible to compare these samples with each other.

I had re-processed the raw data using the same tools (mapping / quantification). But I guess that wouldn't be good enough.

Thanks,

Manoj.

ADD COMMENT • link 8.2 years ago Manoj Hariharan ▴ 110