Question: Using of limma moderated t-test with "corrected" expression matrix resulting from ComBat batch effect correction
2
3.8 years ago by
svlachavas660
Greece/Athens/National Hellenic Research Foundation
svlachavas660 wrote:

Dear Bioconductor Community,

based on the very interesting question on a previous post (batch effect : comBat or blocking in limma ?) regarding the possible batch correction methodologies, through the answers created i desided to adress a very important issue in my opinion-as im still a newbie in these specific statistics-which i also dont have find any relevant explanation in any according paper of MOOC. In detail, Dr Johnson kindly mentioned that limma should not be used in conjuction with ComBat, when the latter applied for batch effect correction. With all respect(and of course without arguing against), i would like to ask some further explanations for this specific matter:

this could be more context and experimental design specific ?? that is, it depends on the magnitude of the batch effect, the variances between the batches, or also the experimental design used in Limma itself(i.e. paired or unpaired comparison) ? Or should someone always avoid it ?  And if so, what is the main notion ? (It has to do with the "correlation" mentioned in the above post and possibly with the overestimation of DE ?)

Please excuse me for creating this post, but i believe this question is very important and i would like to get the maximum feedback, because i would like to know (as i used a combination of them in one of my recent analyses) if "still" could the combination considered "valid" under certain circumstances !!

Thank you for your consideration on this matter !!

Efstathios

modified 3.8 years ago by W. Evan Johnson800 • written 3.8 years ago by svlachavas660
Answer: Using of limma moderated t-test with "corrected" expression matrix resulting fr
4
3.8 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

As a general rule, it is not correct to batch correct before a differential expression analysis. This will result is overstatement of statistical significance. Rather the batch correction should be built into the DE analysis.

This principle is doesn't depend on comBat, or limma, or on the statistical design. It's a general principle.

The only exception would be when the batch correction can be estimated from different data to that used for the DE analysis. That would be unusual however.

Dear Gordon,

thank you for your crusial explanation-just to pinpoint an example-one thing that bothers me though-, regarding some older conversation about a specific analysis that both you and Aaron kindely provided explanations(C: Correct creation of design matrix in limma regarding a multi-level experiment)- for this specific analysis, i compared the DE genes after Combat with limma(without including the batch effect variable, just the pairs factor) and one without running ComBat but including in limma the batch effect variable along with the pairs(although Aaron have indicated that this would be redudant-Merge different datasets and perform differential expression analysis in limma)---so, as only a small group of genes(40) are different when i compare the two lists(this with the batch effect correction has 37 more DE genes), This could provide a "relative" confidence about the "above general principle" of the overestimation of statistical significance you underline ?

Efstathios

3

One good way to see the overestimation of significance in action would be to simulate a dataset where all the null hypotheses are true, both with and without an added batch effect. Then analyze both the datasets with all the methods you are considering and compare the resulting p-values to see if any method is systematically over-conservative or over-liberal. A well-behaved method should give a completely uniform (i.e. flat) p-value distribution.

Dear Dr Thompson,

thank you for idea !! Firstly, regarding my above observation i mention could provide even a small "confidence" about batch effect overestimation ?

Secondly, as im using R about a year and im "relatively" new in statistics:

1. Regarding the simulation you mean, you mean something like permuting the labels? and leave the gene expression as it is, while create two cases: a) take the batch corrected expression set and permute the labels and continue b) the same without correcting for batch effect ?

Please excuse me if my question is naive or irrelevant but i havent performing so far any "simulation" you mention !!

Best,

Efstathios

1

I would be hesitant to draw conclusions about differences in performance between methods based on data where you don't already know the answer. That's why I recommended simulating a dataset from scratch, so you can control all the variables and you will know exactly which simulated genes are DE and which are not. This allows you do draw ROC curves and such for each method to evaluate their relative performance.

If you want to pursue this, there are a number of "benchmark" papers that give examples of how to do such a simulation. I don't have one handy at the moment, but you should be able to find one on PubMed.

I see. Then in simple "naive" words, simulate 2 datasets, one from the batch corrected dataset and the other from the same dataset without a batch effect correction, right ? I will search extensively for papers, with a small search i have found

http://www.researchgate.net/publication/255702173_A_Flexible_Microarray_Data_Simulation_Model

Maybe i could also use the below tools from Mr Vegard if they can assist in evaluating this specific issue

Yes, I'm not familiar with that particular method, but it looks like a reasonable simulation method. They also provide an R package, which certainly helps.

Efstathios, I'm not quite sure what is bothering you or what you mean by "relative confidence".

For your paired experiment, you don't even need a batch correction. The correct analysis is also the simplest.

Other analyses are more complicated and are either not needed or are theoretically wrong.

Dear Gordon,

because i have not such great experience in these issues, i was more "affected" from literature agumenting the necessity when there are two or more different datasets, batch effect correction is essential for merging !! Thus, i was afraid that the pairs factor could not handle a batch effect resulting from both studies !! Nevertheless, as your opinion and your consideration on this matter are trustful and crusial, i will perform the same analysis without batch effect correction and only including the pairs--and inspect and possibly stick with these results.

Answer: Using of limma moderated t-test with "corrected" expression matrix resulting fr
2
3.8 years ago by
Norway
Vegard Nygaard110 wrote:

Dear Efstathios

I see that you have many thoughts and questions regarding the same things my colleagues and I had when working with the article mention in the above referred post:
"Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses" http://biostatistics.oxfordjournals.org/content/early/2015/08/31/biostatistics.kxv027.full.pdf

There are 4 different ways of handling batch effect using ComBat and/or limma (or any batch effect correction tool combined with a DE-analysis tool).
1. No batch correction + limma with batch
2. Batch correction + limma with batch
3. Batch correction + limma without batch
4. No batch correction + limma without batch
In our paper we recommend 1. Approach 2 is mentioned as a possibility both by Dr. Johnson (I if understood him correctly) in the above mentioned previous post and in the PARTEK user guide(a commercial software with a similar pair of tools). 1 and 2 should give pretty similar results for most situations. For a fairly balanced experimental design, approach 3 would also give quite similar results. Approach 4 is disregarding the batch effect altogether. 3 and 4 are the problematic ones which can lead to false positive or false negative findings, depending on the experimental design, real group effects, batch correction method used (group as covariate or not) and probably other things. If you have used 3 or 4 in an analysis, the most rigorous path is to reanalyze with approach 1 (or equally 2).

Assessing the adverse effect of approach 3 under different settings was something we spent considerable time on. And as aides in doing so, we made a couple of shiny-tools, "batch-adjustment-simulator" and "fdist-app", which are available via our additional analyses page at GitHub;

Regarding the correlation introduced by a batch effect correction tool, a practical way is to consider this a batch effect as well. The original batch effect is replaced by the batch effect estimation errors. So, if you start out with a batch effect, then correct for batch, you still have a batch effect in your data. Hopefully smaller, maybe not, anyways potentially troublesome. When using analysis tools were the batch effect can not be included (unlike limma), correcting for batch before applying is often the best hope for a more reliable result, but no guarantee.

Vegard.

Hi Vegard, thanks for the breakdown.  So I'm still a bit confuse on #2.  You mentioned "Batch correction + limma with batch" are you implying to use the corrected matrix but running the model with the batch as a covariate?

So for example,  I have an experiment (Drug vs. control) and did this experiment on 3 separate cell lines and mainly care about the effect of drug.  Can I run combat to correct for cell line and then use the output of the matrix in my model correcting for cell line?

1. correct for batch (cell line)
2. design <-  model.matrix(~0+ key$group + key$cell  )
3.  lmFit( corrrected.matrix , design)

would this work?

Ahdee

Answer: Using of limma moderated t-test with "corrected" expression matrix resulting fr
1
3.8 years ago by
United States
W. Evan Johnson800 wrote:

Sorry to put my input in a little late on this one. While I agree with most of what has been said here, there are a few things I would like to point out:

I respectfully disagree with Gordon on the notion that it "is not correct to correct for batch correct before differential expression." What is true are the following:

1) Blocking with Limma is the easiest way to adjust for batch effects, especially when the batch effects are not severe.

2)  It is not correct to use ComBat and then assume the data are independent in any downstream analyses. However, if you properly account for the correlation (limma with batch in the design, weighted least squares, etc), its completely fine to use a two step approach and ComBat, even in differential expression cases.

3) ComBat treats batch differences in the mean and variance as random effects. ComBat shrinks batch effect estimates across genes to get more robust estimates, especially in small batches. Blocking in limma will treat batch as a fixed effect (no shrinkage), and does not account for heterozygosity in batch variances. The latter can be addressed if you add an extra step (arrayWeights function), but again the variance differences are assumed to be fixed effects. Because I strongly believe believe that batch effects are systematic in many cases, ComBat often provides a more appropriate model for batch effects, especially when they are severe.

Thanks for this discussion. I think overall it has been helpful.

Dear Dr Johnson,

your detailed comments are also very important to take into consideration and also consider all the possible pros and cons of each methodogy, especially the batch correct implementation with statistical analysis. Without wanting to post many comments, two small questions explicitly on your above answer(and of course taking into account our last discussion about simulation):

1. Regarding the question number 2- can the pairs factor implemented in my analysis(included in the design matrix after Combat) to account somehow for the "correlation you mentioned above" ? Because, as Mr. Thompson posted just above, including also the batch effect variable into the design matrix, along with the pairs factor is redudant and needs special treatment(duplicateCorrelation), which might gets the things uneccessary complicated.

2.)  Regarding your answer number 3): In a subsequent analysis with the assistance of Mr Aaron Lun i used an interaction model in my analysis(i had one more factor called location), and before the design matrix i used arrayWeights. But this was to compare each specific tumor location vs its adjucent samples and not the general paired comparison discussing above

Best,

Efstathios

Answer: Using of limma moderated t-test with "corrected" expression matrix resulting fr
0
3.8 years ago by
svlachavas660
Greece/Athens/National Hellenic Research Foundation
svlachavas660 wrote:

Dear Vegard,

thank your your reply and your detailed answer !! I will definatelly read the paper-Firstly, before creating this post, i asked in the above post you mention and which also discusses the above parer: C: batch effect : comBat or blocking in limma ? -then I created this post, because i had general questions about this specific topic(Mr. Smyth above kindly provided an explanation)-especially with one specific analysis i performed recently was a bit complicated. In detail, i had two separate colorectal cancer datasets[hgu133a dataset-26 paired samples-13 patients & hgu133plus2 dataset-34 samples-17 patients] which have common phenotype information. Thus, i merged the two datasets based on their common probesets(after i have normalized each separately with custom Brain array CDFs), while have some options for batch effect correction, from which i used limma. After, because the samples are paired(each patient has a cancer vs  adjucent control sample), i created a paired limma analysis, but without the batch-as Mr. Aaron Lun have mentioned a very important issue: "This is because any differences between batches will be absorbed into the differences between pairs, i.e., the effect of the different chip will be included in the estimate of the patient-specific effect. Inclusion of batch would be redundant and result in a design matrix that is not of full rank, as we can't distinguish between effect of the patient and that of the chip". Thus, i didnt include the batch factor in my desing matrix. In other words, maybe this is different from the approach number 3 you mention above, as is this specific context my analysis is paired ?

Now, as i have underlined also above, i also merged the two datasets without batch effect correction regarding the different study-and then performed limma but with the batch variable also in the design matrix (approach number 1 above), and the results(with some "arbitarty" cutoffs used also in my above procedure) are quite similar only 40 genes less than my first implementation(and the majority of my genes are common). This maybe could also provide some "relative confidence" about the general issue with batch effect correction.

On the other hand, because my analysis is paired, i could also use without batch effect correction and only including the pairs in my design matrix

Finally, with your above github link, i can "inspect" somehow and check a relative "performance" of my procedures ??

Best,

Efstathios

It seems that your experimental design takes care of the batch effect. Be aware that this is not only because it is paired, but because all your pairs are inside the same batch (which I am not sure of!). If the two samples had been in different batches, you should have included the batch in limma.

The interactive tools at our github simulate simpler designs, not paired like yours.

Dear Vegard,

before merging, each dataset compised of its paired samples- thus when merging, each dataset is considered as a "batch" for the correction-that is the different study-naively- the common between both datasets is that the different patients included in each dataset, have their samples paired(along with the common probesets, based on which the merging is performed). Now regarding the batch variable in Limma, what happens is what i mention above(and Aaron underlined): i get partial NA coefficients, because the total pairs are 30(60 samples), and along with not only the disease variable for the comparison(cancer vs normal), but also with the additional inclusion of batch factor indicating different studies, it gets messy and redundant.

So, my specific experimental design canot be used in your github tools for any "evaluation" metrics correctly ??

1

Since your pairs are nested within the batches, including the pairs in your design also adjusts for the batch effect as well, so including the batch effect is redundant. This is why you're getting partial NA coefficients. You should never include both a nested factor (pairs) and the factor that it is nested inside (batch) in the same design. It's not possible to separate the batch effects from the inter-patient effects using this design, but they will both be corrected for when testing the main effect. If you wanted to analyze the batch effect directly, you would need to treat the pairs as a random effect using duplicateCorrelation.

Dear Ryan,

thank you for your explanation. So the inclusion of batch is redundant as far i have paired samples and include them as in the design matrix.