Validity of duplicateCorrelation after ComBat in this scenario
Entering edit mode
Last seen 2.3 years ago


I did an analysis a few years ago using ComBat adjusted microarray values fed into limma and was wondering if it's acceptable (Maybe not ideal, but informative). We've been able to validate many of these genes since. I've since read that when using limma with a ComBat modified matrix, the batch covariate should still be used in the subsequent limma design. I did not do that here however, duplicateCorrelation was used on SampleID which is nested within Batch. Does this possibly serve a similar purpose without also putting Batch in design on top of duplicateCorrelation of SampleID? My code follows:

> SampleTable
   Sample SampleID  Batch Treatment
1   A_Tx1        A Batch1       Tx1
5   B_Tx1        B Batch1       Tx1
9   C_Tx1        C Batch2       Tx1
13  D_Tx1        D Batch2       Tx1
2   A_Tx2        A Batch1       Tx2
6   B_Tx2        B Batch1       Tx2
10  C_Tx2        C Batch2       Tx2
14  D_Tx2        D Batch2       Tx2
3   A_Tx3        A Batch1       Tx3
7   B_Tx3        B Batch1       Tx3
11  C_Tx3        C Batch2       Tx3
15  D_Tx3        D Batch2       Tx3
4   A_Tx4        A Batch1       Tx4
8   B_Tx4        B Batch1       Tx4
12  C_Tx4        C Batch2       Tx4
16  D_Tx4        D Batch2       Tx4

modcombat = model.matrix(~1, data=SampleTable) ### Treatment could be included here alternatively
combat_edata = ComBat(dat=edata, batch=SampleTable$Batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

design <- model.matrix(~0+SampleTable$Treatment)
corfit <- duplicateCorrelation(combat_edata,design,block=SampleTable$SampleID)
fitV <- lmFit(combat_edata, design, block=SampleTable$SampleID, cor=corfit$consensus.correlation)
fitV <- eBayes(fitV)
fitV2 <- eBayes(, contrasts = c(1,0,-1,0)))
sig <- topTable(fitV2, coef = 1 , number = 10000000, p.value = 0.05)

Running it with Batch included in the design in addition to duplicateCorrelation on SampleID produces similar gene lists with the provided code.

Venn Diagram:

Is this a non-incorrect approach? Thanks!

limma ComBat SVA • 336 views
Entering edit mode
Last seen 15 minutes ago
WEHI, Melbourne, Australia

You already know that inputing ComBat adjusted values into limma as if they were unadjusted is not correct. For your experiment, the effect would be relatively small. Given your code, limma will think that the residual df is 12 when it really should be closer to 11. That will tend to make all the estimated standard errors about 4% smaller than they should be and the t-statistics about 4% larger than they should be. You will therefore get a bit too much DE.

The use of duplicateCorrelation ameliorates the problem somewhat but, no, it does not make your model equivalent to including Batch in the design matrix.

I don't quite understand what the ultimate problem is though. You say that get a similar gene list even if you do include Batch in the design, so why not do that?

As another alternative, I would consider defining:

design <- mode.matrix(~SampleID + Treatment)

and doing away with both ComBat and duplicateCorrelation.

Entering edit mode

Hi Dr. Smyth,

Thanks so much for the response. This is what I expected, but wanted to double check. Luckily, the genes that are sensitive to adding Batch to the design are not ones we're particularly interested in for this study. Thanks again!


Login before adding your answer.

Traffic: 376 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6