Evaluation of the diagnostic plot produced by the ComBat function for batch effect correction of a merged microarray dataset
svlachavas
Germany/Heidelberg/German Cancer Resear…

Dear all,

In conjunction with a previous post about the correct implementation of the ComBat() function for batch effect correction [Appropriate implementation of ComBat function for known batch effect correction and alternative methodologies of merging microarray datasets],

I would like to ask how to interpret the resulting plot for the parametric approach, and how I could use it to assess my results. The link to the plot is below:


So, I understand that the black line represents the kernel estimate of the empirical batch effect density and the red line the parametric estimate, but why are there two rows of plots? In other words, what does each row of two plots represent?
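To make the two curves concrete, here is a minimal Python sketch of how a density panel like the left-hand ones could be reproduced. It assumes a batch-only design (no covariates) and follows my reading of the sva implementation: each gene is standardized by its overall mean and a pooled within-batch standard deviation, a gene's batch effect estimate is the mean of its standardized values in that batch, the black curve is a Gaussian kernel density of those estimates (using R's default `bw.nrd0` bandwidth rule), and the red curve is a normal density with the mean and variance of the same estimates. The helpers `simulate`, `batch_effect_estimates`, `bw_nrd0` and `kernel_density` and the toy data are hypothetical illustrations, not sva functions.

```python
import math
import random
import statistics


def simulate(n_genes=1000, n_per_batch=10, seed=1):
    """Hypothetical toy data: genes x samples, batch "B" shifted per gene."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_genes):
        base = rng.gauss(8.0, 1.0)
        shift = rng.gauss(0.5, 0.3)  # gene-specific additive batch effect
        row = [base + rng.gauss(0.0, 0.5) for _ in range(n_per_batch)]
        row += [base + shift + rng.gauss(0.0, 0.5) for _ in range(n_per_batch)]
        data.append(row)
    return data, ["A"] * n_per_batch + ["B"] * n_per_batch


def batch_effect_estimates(data, batch):
    """Simplified ComBat-style gamma_hat/delta_hat for a batch-only design."""
    levels = sorted(set(batch))
    gamma_hat = {b: [] for b in levels}
    delta_hat = {b: [] for b in levels}
    for row in data:
        groups = {b: [v for v, lab in zip(row, batch) if lab == b] for b in levels}
        batch_means = {b: statistics.fmean(groups[b]) for b in levels}
        grand_mean = statistics.fmean(row)
        # Pooled variance: squared residuals around the batch means, divided by n
        var_pooled = statistics.fmean(
            [(v - batch_means[lab]) ** 2 for v, lab in zip(row, batch)]
        )
        sd = math.sqrt(var_pooled)
        for b in levels:
            standardized = [(v - grand_mean) / sd for v in groups[b]]
            gamma_hat[b].append(statistics.fmean(standardized))
            delta_hat[b].append(statistics.variance(standardized))
    return gamma_hat, delta_hat


def bw_nrd0(x):
    """Silverman's rule of thumb, as in R's default density() bandwidth."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return 0.9 * min(statistics.stdev(x), (q3 - q1) / 1.34) * len(x) ** -0.2


def kernel_density(x, grid):
    """Gaussian kernel density estimate of x evaluated on grid."""
    h = bw_nrd0(x)
    c = 1.0 / (len(x) * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]


data, batch = simulate()
gamma_hat, _ = batch_effect_estimates(data, batch)
g = gamma_hat["A"]  # estimates for the first batch, one per gene

# Red curve: normal prior with the mean and variance of the estimates
prior = statistics.NormalDist(statistics.fmean(g), statistics.stdev(g))
lo, hi = min(g) - 1.0, max(g) + 1.0
step = (hi - lo) / 199
grid = [lo + i * step for i in range(200)]
kde = kernel_density(g, grid)       # black curve
par = [prior.pdf(x) for x in grid]  # red curve

# Area between the curves (0 = identical, 2 = no overlap at all)
l1 = sum(abs(a - b) for a, b in zip(kde, par)) * step
print(f"area between kernel and parametric densities: {l1:.3f}")
```

The area between the curves is only a rough numerical companion to eyeballing the panel; it is not something sva reports.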

Moreover, judging from these plots, mostly the density plots, can I consider the parametric adjustment adequate or not?

Finally, would standardizing my merged microarray dataset before running ComBat be beneficial for the parametric approach?
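One point that may help with this question: as far as I understand the sva implementation, ComBat already standardizes every gene internally (subtracting an overall mean and dividing by a pooled within-batch standard deviation) before it estimates the batch effects. The Python sketch below uses a simplified batch-only version of that standardization (hypothetical helpers `simulate`, `batch_effect_estimates` and `standardize_genes`, toy data) to check numerically that centering and scaling each gene beforehand leaves the estimated batch effects, and therefore the diagnostic plots, unchanged. It says nothing about other kinds of preprocessing, such as between-array normalization.

```python
import math
import random
import statistics


def simulate(n_genes=200, n_per_batch=10, seed=1):
    """Hypothetical toy data: per-gene shift and larger noise in batch "B"."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_genes):
        base, shift = rng.gauss(8.0, 1.0), rng.gauss(0.5, 0.3)
        row = [base + rng.gauss(0.0, 0.5) for _ in range(n_per_batch)]
        row += [base + shift + rng.gauss(0.0, 0.8) for _ in range(n_per_batch)]
        data.append(row)
    return data, ["A"] * n_per_batch + ["B"] * n_per_batch


def batch_effect_estimates(data, batch):
    """Simplified ComBat-style gamma_hat/delta_hat for a batch-only design."""
    levels = sorted(set(batch))
    gamma_hat = {b: [] for b in levels}
    delta_hat = {b: [] for b in levels}
    for row in data:
        groups = {b: [v for v, lab in zip(row, batch) if lab == b] for b in levels}
        batch_means = {b: statistics.fmean(groups[b]) for b in levels}
        grand_mean = statistics.fmean(row)
        # Internal standardization: overall mean and pooled within-batch SD
        var_pooled = statistics.fmean(
            [(v - batch_means[lab]) ** 2 for v, lab in zip(row, batch)]
        )
        sd = math.sqrt(var_pooled)
        for b in levels:
            standardized = [(v - grand_mean) / sd for v in groups[b]]
            gamma_hat[b].append(statistics.fmean(standardized))
            delta_hat[b].append(statistics.variance(standardized))
    return gamma_hat, delta_hat


def standardize_genes(data):
    """Hypothetical pre-processing step: z-score each gene across all samples."""
    out = []
    for row in data:
        m, s = statistics.fmean(row), statistics.stdev(row)
        out.append([(v - m) / s for v in row])
    return out


data, batch = simulate()
g_raw, d_raw = batch_effect_estimates(data, batch)
g_std, d_std = batch_effect_estimates(standardize_genes(data), batch)

# Compare every gamma_hat and delta_hat before and after pre-standardizing
max_diff = max(
    abs(x - y)
    for b in g_raw
    for x, y in zip(g_raw[b] + d_raw[b], g_std[b] + d_std[b])
)
print(f"largest change in the estimates after pre-standardizing: {max_diff:.1e}")
```

The reason is that a per-gene shift and rescaling is undone by the internal standardization before gamma and delta are estimated, so per-gene pre-standardization would mainly change the scale on which the adjusted data come back.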

Please excuse any naive questions, but I have no previous experience with ComBat's diagnostic plots, and any feedback is highly appreciated!

United States

Yes, you are correct: in the left-hand plots, the red lines are the parametric estimates and the black lines are the kernel estimates of the distribution of batch effects across genes. The right-hand plots are Q-Q plots comparing the parametric estimate (red line) with the actual ordered batch effects for each gene (black points). The top plots are for the means and the bottom plots are for the variances.
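For readers who want to see what goes into the right-hand and bottom panels, here is a minimal Python sketch, assuming a simplified batch-only ComBat model (no covariates) and my reading of the sva code. For the means it computes the points of a normal Q-Q plot the way R's `qqnorm` places them (`ppoints` plotting positions) plus a reference line through the quartiles, as `qqline` draws it. For the variances it fits an inverse gamma prior by the method of moments (shape and scale chosen so the prior's mean and variance match those of the estimates) and evaluates its density, which corresponds to the red curve in the bottom-left panel. The Q-Q correlation is my own one-number summary, not something sva computes; all helper names and the toy data are hypothetical.

```python
import math
import random
import statistics


def simulate(n_genes=1000, n_per_batch=10, seed=1):
    """Hypothetical toy data: genes x samples, batch "B" shifted per gene."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_genes):
        base, shift = rng.gauss(8.0, 1.0), rng.gauss(0.5, 0.3)
        row = [base + rng.gauss(0.0, 0.5) for _ in range(n_per_batch)]
        row += [base + shift + rng.gauss(0.0, 0.5) for _ in range(n_per_batch)]
        data.append(row)
    return data, ["A"] * n_per_batch + ["B"] * n_per_batch


def batch_effect_estimates(data, batch):
    """Simplified ComBat-style gamma_hat/delta_hat for a batch-only design."""
    levels = sorted(set(batch))
    gamma_hat, delta_hat = {b: [] for b in levels}, {b: [] for b in levels}
    for row in data:
        groups = {b: [v for v, lab in zip(row, batch) if lab == b] for b in levels}
        means = {b: statistics.fmean(groups[b]) for b in levels}
        grand_mean = statistics.fmean(row)
        sd = math.sqrt(statistics.fmean([(v - means[lab]) ** 2 for v, lab in zip(row, batch)]))
        for b in levels:
            s = [(v - grand_mean) / sd for v in groups[b]]
            gamma_hat[b].append(statistics.fmean(s))
            delta_hat[b].append(statistics.variance(s))
    return gamma_hat, delta_hat


def ppoints(n):
    """Plotting positions as in R's ppoints()."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def normal_qq(x):
    """Normal Q-Q points plus a reference line through the quartiles (like qqline)."""
    z = statistics.NormalDist()
    theoretical = [z.inv_cdf(p) for p in ppoints(len(x))]
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    slope = (q3 - q1) / (z.inv_cdf(0.75) - z.inv_cdf(0.25))
    intercept = q1 - slope * z.inv_cdf(0.25)
    return theoretical, sorted(x), intercept, slope


def pearson(x, y):
    """Plain Pearson correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def inverse_gamma_prior(delta):
    """Method-of-moments shape a and scale b matching the mean and variance of delta."""
    m, s2 = statistics.fmean(delta), statistics.variance(delta)
    return (2 * s2 + m ** 2) / s2, (m * s2 + m ** 3) / s2


def dinvgamma(x, a, b):
    """Inverse gamma density with shape a and scale b."""
    return math.exp(a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(x) - b / x)


data, batch = simulate()
gamma_hat, delta_hat = batch_effect_estimates(data, batch)

# Top-right panel: ordered gamma_hat against standard normal quantiles
theo, samp, intercept, slope = normal_qq(gamma_hat["A"])
r_qq = pearson(theo, samp)  # close to 1 when the points follow a straight line

# Bottom-left panel: red curve = inverse gamma prior for the variances
a, b = inverse_gamma_prior(delta_hat["A"])
red_curve = [(x, dinvgamma(x, a, b)) for x in (0.5, 1.0, 1.5, 2.0)]
print(f"Q-Q correlation for gamma_hat: {r_qq:.3f}; inverse gamma a={a:.2f}, b={b:.2f}")
```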

For your case, I think you are fine using the parametric version of ComBat. Although there is some deviation (especially in the variances), the kernel and parametric versions will produce highly similar results. What you are really looking for here are extreme differences, such as severe skewness or bimodality in the kernel estimate that the parametric estimate can't pick up. In your case I would expect differences of less than 1-3% in the final adjusted data, which is unlikely to have any effect on your downstream analyses.
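The criterion above is visual. If a numerical companion is wanted, one common heuristic (not something ComBat or sva computes) is to look at the skewness of the per-gene estimates and at Sarle's bimodality coefficient, where values above 5/9 are often read as a hint of bimodality. Below is a minimal Python sketch using the bias-adjusted skewness and kurtosis in the coefficient. It runs on simulated unimodal and bimodal vectors; in practice the input would be the per-gene batch effect estimates, and the 5/9 threshold is a rule of thumb only.

```python
import math
import random


def skewness_kurtosis(x):
    """Moment estimators: sample skewness g1 and excess kurtosis g2."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def bimodality_coefficient(x):
    """Sarle's bimodality coefficient with bias-adjusted skewness and kurtosis."""
    n = len(x)
    g1, g2 = skewness_kurtosis(x)
    skew_adj = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt_adj = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return (skew_adj ** 2 + 1) / (kurt_adj + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


rng = random.Random(2)
# Hypothetical stand-ins for per-gene batch effect estimates
unimodal = [rng.gauss(0.0, 0.4) for _ in range(2000)]
bimodal = [rng.gauss(-1.0, 0.3) if rng.random() < 0.5 else rng.gauss(1.0, 0.3)
           for _ in range(2000)]

bc_uni = bimodality_coefficient(unimodal)
bc_bi = bimodality_coefficient(bimodal)
print(f"unimodal BC = {bc_uni:.2f}, bimodal BC = {bc_bi:.2f}, threshold = {5 / 9:.2f}")
```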

Does this answer your questions?


Dear Evan, thank you for your answer! Your explanation makes things clear. So the top plots (for the means) and the bottom plots (for the variances) display the estimated batch effects for my genes across my two datasets, is that right?
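On whether the plots cover both datasets: if I read the sva source correctly, the prior plots are drawn for the first batch only (their titles refer to the "First Batch"). With only two batches, though, the mean effects are tied together. Because the standardization subtracts each gene's overall mean, n_A * gamma_hat_A + n_B * gamma_hat_B = 0 for every gene, so the second batch's mean estimates are a rescaled mirror image of the first. The variance estimates delta_hat have no such constraint. The Python sketch below checks this identity on hypothetical toy data with unequal batch sizes, using a simplified batch-only estimator (`simulate` and `batch_effect_estimates` are illustrative helpers, not sva functions).

```python
import math
import random
import statistics


def simulate(n_genes=200, n_a=8, n_b=12, seed=3):
    """Hypothetical toy data with unequal batch sizes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_genes):
        base, shift = rng.gauss(8.0, 1.0), rng.gauss(0.5, 0.3)
        row = [base + rng.gauss(0.0, 0.5) for _ in range(n_a)]
        row += [base + shift + rng.gauss(0.0, 0.5) for _ in range(n_b)]
        data.append(row)
    return data, ["A"] * n_a + ["B"] * n_b


def batch_effect_estimates(data, batch):
    """Simplified ComBat-style gamma_hat/delta_hat for a batch-only design."""
    levels = sorted(set(batch))
    gamma_hat, delta_hat = {b: [] for b in levels}, {b: [] for b in levels}
    for row in data:
        groups = {b: [v for v, lab in zip(row, batch) if lab == b] for b in levels}
        means = {b: statistics.fmean(groups[b]) for b in levels}
        grand_mean = statistics.fmean(row)  # overall gene mean
        sd = math.sqrt(statistics.fmean([(v - means[lab]) ** 2 for v, lab in zip(row, batch)]))
        for b in levels:
            s = [(v - grand_mean) / sd for v in groups[b]]
            gamma_hat[b].append(statistics.fmean(s))
            delta_hat[b].append(statistics.variance(s))
    return gamma_hat, delta_hat


data, batch = simulate()
gamma_hat, delta_hat = batch_effect_estimates(data, batch)
n_a, n_b = batch.count("A"), batch.count("B")

# The standardized values of a gene sum to zero, hence this weighted sum vanishes
worst = max(abs(n_a * ga + n_b * gb) for ga, gb in zip(gamma_hat["A"], gamma_hat["B"]))
print(f"max |n_A*gamma_A + n_B*gamma_B| over genes: {worst:.1e}")
```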

Finally, since you are the creator of the methodology, could you also take a look, if possible, at my previous post? The plot above was produced with the code posted there, and the post also contains some crucial questions:

Appropriate implementation of ComBat function for known batch effect correction and alternative methodologies of merging microarray datasets

Your feedback would also be crucial to validate my whole approach with ComBat!

Thank you in advance!


