Question

voom with combat

1

Entering edit mode

eldronzhou ▴ 10

@eldronzhou-10851

Last seen 4.4 years ago

Hi everyone,

Sorry if the following post is too long.

I have below RNA-seq samples with very severe donor batch effects, but the design is balanced. I am running limma+voom pipeline.

  sample                         sample_details
1617  A16_1_1    IFNAR_Mcl1_DKO_MEFs_Treated_ABT 737
161Q  A16_1_1 IFNAR_Mcl1_DKO_MEFs_Treated_ABT737_QVD
161U  A16_1_1          IFNAR_Mcl1_DKO_MEFs_Untreated
1627 A16_2_10    IFNAR_Mcl1_DKO_MEFs_Treated_ABT 737
162Q A16_2_10 IFNAR_Mcl1_DKO_MEFs_Treated_ABT737_QVD
162U A16_2_10          IFNAR_Mcl1_DKO_MEFs_Untreated
2517  A25_1_9    IFNAR_Mcl1_DKO_MEFs_Treated_ABT 737
251Q  A25_1_9 IFNAR_Mcl1_DKO_MEFs_Treated_ABT737_QVD
251U  A25_1_9          IFNAR_Mcl1_DKO_MEFs_Untreated
2527 A25_2_3     IFNAR_Mcl1_DKO_MEFs_Treated_ABT 737
252Q A25_2_3  IFNAR_Mcl1_DKO_MEFs_Treated_ABT737_QVD
252U A25_2_3           IFNAR_Mcl1_DKO_MEFs_Untreated

> dmat = model.matrix(~ 0 + sample_details + sample, data=IFNAR$samples)
> v <- voom(IFNAR, design=dmat, plot=T)
> corr <- duplicateCorrelation(v, dmat, block=IFNAR$samples$sample)
> corr$consensus
[1] 0.9919988
> v <- voom(IFNAR, design=dmat, block=IFNAR$samples$sample, correlation=corr$consensus, plot=T)

I first tried to correct the batch effect with limma duplicateCorrelation, however, it seems that from my MDS plot there is still a strong donor effect.

> plotMDS(v$E, label=IFNAR$samples$sample_details, col=as.numeric(IFNAR$samples$sample), cex=.8, main="Colored by donor (batch): After voom")

Then I tried below to explore the data, which is better but not optimal:

> IFNAR_batchcorrected <- removeBatchEffect(v$E, batch=IFNAR$samples$sample, design=model.matrix(~IFNAR$samples$sample_details))

I also tried with ComBat, which gives the best result, maybe because my sample size is small, it is better for an EB method:

> IFNAR_combat <- ComBat(v$E, batch=IFNAR$samples$sample, mod=model.matrix(~IFNAR$samples$sample_details, data=IFNAR$samples), prior.plots=T)

So I use combat adjusted data to continue my analysis.

> dmat <- model.matrix(~ 0 + sample_details, data=IFNAR$samples)

> vfit <- lmFit(IFNAR_combat, dmat)
> cfit <- contrasts.fit(vfit, cmat)
> efit <- eBayes(cfit)
> plotSA(efit)

The problem is when checking the residual-average plot, there is a strong trend between residual and average. I guess this is because the voom precision weights were lost after I merely put combat adjusted data into lmFIt.

So my question is whether I can simply using following code instead:

> v$E <- IFNAR_combat
> fit <- lmFit(v, dat)

Or is there any better suggestion ?

limma voom combat batch effect • 3.7k views

ADD COMMENT • link updated 9.0 years ago by Gordon Smyth 52k • written 9.0 years ago by eldronzhou ▴ 10

score 8 · Accepted Answer · 2016-07-24

As a general rule, no. If you give batch-corrected values to limma, it can't account for the uncertainty with which the batch effects were estimated and removed. This can distort the downstream inferences, leading to loss of type I error control. To illustrate, let's simulate a data set with an experimental set-up that looks like yours:

batches <- factor(rep(1:4, each=3))
treatment <- factor(rep(1:3, 4))
x <- matrix(rnorm(120000), ncol=length(batches))

We'll use standard normal observations for simplicity. Now, let's apply ComBat:

require(sva)
design <- model.matrix(~ treatment)
y <- ComBat(x, batches, mod=design)

... and then run it through limma:

require(limma)
fit <- lmFit(y, design)
fit <- eBayes(fit)
pvals <- topTable(fit, coef=2, n=Inf)$P.Value

The p-value distribution should be uniform, as there isn't any difference in the means between treatments 1 and 2 (both means are zero when all expression values are sampled from a standard normal). However, the observed type I error rate is:

sum(pvals <= 0.01)/length(pvals) # gives me 0.0753

... which is 7-fold higher than the specified threshold of 0.01. This is consistent with what I said above regarding the unmodelled uncertainty. In comparison, the recommended applications of limma (i.e., on uncorrected data) give:

dc <- duplicateCorrelation(x, design, block=batches)
fit2 <- lmFit(x, design, block=batches, correlation=dc$consensus)
fit2 <- eBayes(fit2)
pvals <- topTable(fit2, coef=2, n=Inf)$P.Value
sum(pvals <= 0.01)/length(pvals) # 0.0105

... or, if we were to block on batch in the design matrix:

design3 <- model.matrix(~ treatment + batches)
fit3 <- lmFit(x, design3)
fit3 <- eBayes(fit3)
pvals <- topTable(fit3, coef=2, n=Inf)$P.Value
sum(pvals <= 0.01)/length(pvals) # 0.0116

... which is a lot closer to what we specified. So, in general, I would let limma handle the batch correction. Incidentally, if you run plotMDS(y), you'll get libraries that are very well-separated on treatment. This is despite the fact that we did not simulate any treatment effect in this data, so there should be no separation on the treatment level at all.

score 5 · Accepted Answer · 2016-07-24

Your study is just a normal paired design and the correct analysis is just the usual standard approach:

> sample <- relevel(sample, ref="IFNAR_Mcl1_DKO_MEFs_Untreated")
> dmat = model.matrix(~ sample_details + sample, data=IFNAR$samples)
> v <- voom(IFNAR, design=dmat, plot=TRUE)
> fit <- lmFit(v, design)
> fit <- eBayes(fit, robust=TRUE)
> topTable(fit, coef=2)
> topTable(fit, coef=3)

There is no need for any of the complicated embellishments in your data analysis.

The MDS plot does not show a batch effect. It simply shows baseline expression differences between the donors (samples), which have already been fully taken into account in the analysis. That is what the 'sample' effect in the design matrix is for.

It is not correct to estimate a duplicateCorrelation for sample, because this factor is already in the design matrix.

It is not correct to use ComBat here, and it will give you false differential results, for reasons explained by Aaron.