Batch effect correction for ANOVA-like DE analysis in edgeR
2
0
Entering edit mode
@manoj-hariharan-5902
Last seen 8.2 years ago

Hello edgeR users,

 

I'm faced with a problem where I'd like to compare expression across "normal" samples. 

 

There are eight samples - first six have two replicates each (RNA-seq done several months ago) and the seventh and eighth samples have three replicates each (RNA-seq done recently). So I'd like to get the DE genes while correcting for the batch effect. 

 

I tried the pipeline as explained in section 4.6 of the manual to "detect genes dierentially expressed in response to hrcC challenge, while correcting for any dierences between the batches."

 

In my case there is no mock vs hrcC, but I guess if I can make the design matrix correctly, I can use the ANOVA-like test. Sadly, I don't think my design matrix isn't right. Could the experts please comment? 

 

Below is my code and error messages. Any help is much appreciated.

 

Thanks,

Manoj

 

 

Sample names:

-------------------

ES07-1, ES07-2, ES08-1, ES08-2, NT01-1, NT01-2, NT02-1, NT02-2, NT03-1, NT03-2, NT04-1, NT04-2, ES14_s1, ES14_s2, ES14_s3, PBT1_s1, PBT1_s2, PBT1_s3

 

 

Code:

--------

targets <- readTargets("CountDataInfo_allRqd8Smpls.txt")

x <- read.delim("AllRqd8Samples_RawReads.txt", row.names=1, stringsAsFactors=FALSE) #no normalization, no batch correction.

y <- DGEList(counts=x[,1:18], group=targets$Treat)

colnames(y) <- targets$Label

dim(y)

keep <- rowSums(cpm(y)>1) >= 2;

y <- y[keep,]

dim(y)

y$samples$lib.size <- colSums(y$counts)

y$samples

y <- calcNormFactors(y)

y$samples

Treat <- factor(substring(colnames(x),1,5))

Batch <- factor(c("1","1","1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2"))

design <- model.matrix(~Batch+Treat)

rownames(design) <- colnames(y)

design

y <- estimateGLMCommonDisp(y, design, verbose=TRUE)

 

 

 

Output messages:

------------------------

> design
        (Intercept) Batch2 TreathESO7 TreathESO8 TreatPBNT1 TreatSCNT1 TreatSCNT2 TreatSCNT3 TreatSCNT4
ES07-1            1      0          1          0          0          0          0          0          0
ES07-2            1      0          1          0          0          0          0          0          0
ES08-1            1      0          0          1          0          0          0          0          0
ES08-2            1      0          0          1          0          0          0          0          0
NT01-1            1      0          0          0          0          1          0          0          0
NT01-2            1      0          0          0          0          1          0          0          0
NT02-1            1      0          0          0          0          0          1          0          0
NT02-2            1      0          0          0          0          0          1          0          0
NT03-1            1      0          0          0          0          0          0          1          0
NT03-2            1      0          0          0          0          0          0          1          0
NT04-1            1      0          0          0          0          0          0          0          1
NT04-2            1      0          0          0          0          0          0          0          1
ES14_s1           1      1          0          0          0          0          0          0          0
ES14_s2           1      1          0          0          0          0          0          0          0
ES14_s3           1      1          0          0          0          0          0          0          0
PBT1_s1           1      1          0          0          1          0          0          0          0
PBT1_s2           1      1          0          0          1          0          0          0          0
PBT1_s3           1      1          0          0          1          0          0          0          0
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$Batch
[1] "contr.treatment"

attr(,"contrasts")$Treat
[1] "contr.treatment"

> y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 TreatSCNT4

 

> sessionInfo()

R version 3.1.1 (2014-07-10)

Platform: x86_64-apple-darwin13.1.0 (64-bit)

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

[1] stats     graphics  grDevices utils     datasets  methods   base     

 

other attached packages:

[1] BiocInstaller_1.16.5 edgeR_3.8.6          limma_3.22.7        

 

loaded via a namespace (and not attached):

[1] tools_3.1.1

edgeR batch effect • 1.5k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 9 minutes ago
WEHI, Melbourne, Australia

Dear Manoj,

Statistics is just a reflection of scientific reality. In your case, treatments ES14 and PBT1 were only applied in the second batch whereas the other treatments were only applied in the first batch. So, if there really is a batch effect, it should be scientifically clear that you can't reliably compare ES14 or PBT14 with the other treatments. Nor is there any way for edgeR to estimate the batch effect because it is completely confounded with differences between the treatments. In general, it is only possible to compare treatments that occur together in at least one batch.

So you need to decide whether you really do need to adjust for batches. If you do, then you can only make comparisons between treatments within the same batch.

The usual way to handle batch effects is to repeat all the treatments at each time, rather than to do some treatments at one time and different treatments at another. In the case study in the edgeR User's Guide, both real and mock treatments were conducted at each time, and it was that which permitted a reliable batch effect to be estimated and removed.

Gordon

ADD COMMENT
0
Entering edit mode
@manoj-hariharan-5902
Last seen 8.2 years ago

Dear Gordon,

Thanks for your reply. 

I understand the problem... too bad it wouldn't be possible to compare these samples with each other.

I had re-processed the raw data using the same tools (mapping / quantification). But I guess that wouldn't be good enough.

Thanks,

Manoj.

 

 

 

ADD COMMENT

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6