I am using ComBat to remove a known batch effect from my RNA-Seq data, and I'm wondering how closely the empirical distributions of the batch mean and variance estimates should match the fitted parametric priors before it is reasonable to use par.prior=TRUE in the adjustment. Here's what the density and QQ plots look like for my data:
The code that generates them is here: https://github.com/DarwinAwardWinner/CD4-csaw/blob/master/scripts/rnaseq-explore.Rmd
It looks like the mean distribution has fatter tails than the parametric prior, while the variance distribution shows the opposite (thinner tails than the prior). There's also a hint of a second, smaller mode to the right of the first in the variance distribution. My plot seems broadly similar to Figure 2 of the 2007 ComBat paper, which the text describes as "moderately reasonable". The text also mentions a second dataset for which a non-parametric prior was justified and refers to the supplementary materials. However, the supplement does not seem to be available on the Biostatistics website, and I can't find any trace of it elsewhere on the internet, so I can't compare my plot to the one for dataset 2.
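For reference, this is my understanding of the parametric priors the plots compare against, in the notation of the 2007 paper as I read it: for batch i and gene g, the location effect gets a normal prior and the scale effect an inverse-gamma prior, with hyperparameters estimated across genes by the method of moments. Here V̄_i and S̄_i² denote the mean and sample variance of the estimates δ̂²_ig over genes; the inverse-gamma formulas come from matching E[δ²] = θ/(λ-1) and Var[δ²] = θ²/((λ-1)²(λ-2)).

```latex
\begin{aligned}
\gamma_{ig} &\sim N(\gamma_i, \tau_i^2), &
\delta_{ig}^2 &\sim \mathrm{InverseGamma}(\lambda_i, \theta_i), \\
\bar\gamma_i &= \operatorname{mean}_g \hat\gamma_{ig}, &
\bar\tau_i^2 &= \operatorname{var}_g \hat\gamma_{ig}, \\
\bar\lambda_i &= \frac{\bar V_i^2 + 2\bar S_i^2}{\bar S_i^2}, &
\bar\theta_i &= \frac{\bar V_i^3 + \bar V_i \bar S_i^2}{\bar S_i^2}.
\end{aligned}
```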
So, is there any general guidance on how closely the empirical and fitted parametric distributions need to match in order for the parametric prior to be warranted?
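One rough way I could imagine putting a number on "how closely", rather than eyeballing the QQ plots, is sketched below. It is in Python rather than R, and it is not something ComBat or sva does itself. It fits the two priors by method of moments and measures the largest gap between the empirical CDF and the fitted prior's CDF (a Kolmogorov-Smirnov distance). For the means, it calibrates that gap with a parametric bootstrap, because a textbook KS p-value isn't valid when the prior is fitted to the same data. The function names are hypothetical. gamma_hat and delta_hat stand for one batch's per-gene mean and variance estimates. To stay within the standard library, the inverse-gamma CDF is approximated by simulation, and the example data are simulated rather than taken from my dataset.

```python
import random
import statistics
from bisect import bisect_right
from statistics import NormalDist


def fit_normal_prior(gamma_hat):
    """Method-of-moments normal prior for one batch's per-gene mean estimates."""
    return statistics.mean(gamma_hat), statistics.variance(gamma_hat)


def fit_inverse_gamma_prior(delta_hat):
    """Method-of-moments inverse-gamma prior for one batch's per-gene variances.

    Solves mean m = b/(a-1) and variance s2 = b^2/((a-1)^2 (a-2)) for (a, b).
    """
    m = statistics.mean(delta_hat)
    s2 = statistics.variance(delta_hat)
    return (2 * s2 + m ** 2) / s2, (m * s2 + m ** 3) / s2


def ks_distance(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def inverse_gamma_cdf(a, b, draws=200_000, seed=0):
    """Monte Carlo CDF of InverseGamma(shape=a, scale=b), stdlib only.

    If X ~ Gamma(shape=a, rate=b) then 1/X ~ InverseGamma(a, b);
    random.gammavariate takes a *scale* argument, hence the 1/b.
    """
    rng = random.Random(seed)
    sims = sorted(1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(draws))
    return lambda x: bisect_right(sims, x) / draws


def normal_prior_pvalue(gamma_hat, reps=200, seed=0):
    """Parametric-bootstrap calibration of the normal-prior KS distance.

    The prior is fitted to the same data, so the textbook KS p-value is too
    large; instead, simulate same-size samples from the fitted prior, refit,
    and count how often the simulated distance is at least the observed one.
    """
    mu, t2 = fit_normal_prior(gamma_hat)
    observed = ks_distance(gamma_hat, NormalDist(mu, t2 ** 0.5).cdf)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sim = [rng.gauss(mu, t2 ** 0.5) for _ in gamma_hat]
        m_s, v_s = fit_normal_prior(sim)
        if ks_distance(sim, NormalDist(m_s, v_s ** 0.5).cdf) >= observed:
            hits += 1
    return observed, (hits + 1) / (reps + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated stand-ins for one batch's estimates over 2000 genes;
    # replace these with the real per-gene estimates.
    gamma_hat = [rng.gauss(0.0, 0.3) for _ in range(2000)]
    delta_hat = [1.0 / rng.gammavariate(6.0, 1.0 / 5.0) for _ in range(2000)]

    d_gamma, p_gamma = normal_prior_pvalue(gamma_hat)
    a, b = fit_inverse_gamma_prior(delta_hat)
    d_delta = ks_distance(delta_hat, inverse_gamma_cdf(a, b))
    print(f"means:     KS distance {d_gamma:.3f}, bootstrap p ~ {p_gamma:.3f}")
    print(f"variances: KS distance {d_delta:.3f} (a={a:.2f}, b={b:.2f})")
```

With thousands of genes, even a fit that looks fine by eye may give a small bootstrap p-value, so the distance itself is probably the more useful number here.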
(As a side-note, I just realized that the X and Y axes of the second QQ plot are swapped relative to the first. I've filed an issue for this: https://github.com/jtleek/sva-devel/issues/19.)