3.1 years ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
The default cutoff of sqrt(2) is chosen to agree with the roast() function. When the statistic is chosen to be the z-score equivalent of a moderated t-statistic, then the genes falling in the coloured regions in the barcodeplot will be the same genes that roast() counts when it calculates the proportion of genes contributing to the up and down p-values for the test set.
Why does roast() use sqrt(2)? This is based on an Akaike Information Criterion (AIC) argument. Suppose that you observe a test statistic z for assessing DE for a given gene. Suppose you use AIC to choose between the null model Z ~ N(0,1) and the alternative model Z ~ N(mu, 1), where mu is a parameter to be estimated. Then you will choose the more complex model if and only if abs(z) > sqrt(2).
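The sqrt(2) threshold can be checked directly. The following is a sketch of the calculation, assuming the usual definition AIC = 2k - 2 log L, where k is the number of estimated parameters and L is the maximized likelihood; the symbols k, L and the subscripts 0 and 1 are notation introduced here, not taken from the roast() documentation.

```latex
% Null model Z ~ N(0,1): no free parameters (k = 0)
\mathrm{AIC}_0 = 2\cdot 0 - 2\log\!\left[\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\right]
               = \log(2\pi) + z^2

% Alternative model Z ~ N(\mu,1): one free parameter (k = 1), with MLE \hat\mu = z
\mathrm{AIC}_1 = 2\cdot 1 - 2\log\!\left[\frac{1}{\sqrt{2\pi}}\,e^{-(z-\hat\mu)^2/2}\right]
               = 2 + \log(2\pi)

% The alternative is preferred when it has the smaller AIC
\mathrm{AIC}_1 < \mathrm{AIC}_0
  \iff z^2 > 2
  \iff |z| > \sqrt{2}
```

In words, fitting mu removes the z^2 term from the deviance but costs a penalty of 2, so the extra parameter is only worth it when z^2 exceeds 2.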
Note that, in the gene set testing context, we can get a significant result for the gene set even when the genes in the set are not individually significant. When we count genes contributing to a significant result, we want to include all the genes that seem more likely to be DE than not, hence the AIC argument. The cutoff abs(z) > sqrt(2) corresponds to a two-sided p-value of about 0.16, and from this point of view a p-value of that size is quite acceptable.
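The p-value implied by the sqrt(2) cutoff is easy to check in base R. This is only a quick illustration, assuming the statistic is a z-score that is standard normal under the null; the variable names are made up for the example.

```r
# Two-sided normal p-value at the AIC cutoff |z| = sqrt(2).
# Assumes the statistic is N(0,1) under the null hypothesis.
cutoff <- sqrt(2)
p_at_cutoff <- 2 * pnorm(-cutoff)
print(round(p_at_cutoff, 4))
```

So a gene sitting right at the edge of the coloured region has a p-value a little above 0.15.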
Having said all that, the colouring used by barcodeplot() is only intended to be a guide. There is no reason that you can't set the colour cutoff differently for your own problem if you find that more helpful.
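As a rough sketch of how the cutoff might be changed, barcodeplot() takes a quantiles argument giving the lower and upper cutoffs for the shaded regions. The data below are simulated and the gene set is hypothetical, purely for illustration; the stricter cutoff of 2 is an arbitrary choice, not a recommendation.

```r
# Hypothetical data: 200 simulated z-scores, with the first 20 genes forming
# a made-up gene set whose statistics are shifted upwards.
set.seed(1)
z <- rnorm(200)
gene_set <- 1:20
z[gene_set] <- z[gene_set] + 1

# The default shading cutoffs, and a stricter alternative chosen arbitrarily.
default_cutoff <- c(-1, 1) * sqrt(2)
strict_cutoff  <- c(-1, 1) * 2

# Draw the plots only if limma is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  limma::barcodeplot(z, index = gene_set, quantiles = default_cutoff)
  limma::barcodeplot(z, index = gene_set, quantiles = strict_cutoff)
}
```

Only the shading changes between the two plots; the positions of the bars for the gene set are the same.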