I am trying to determine how best to set a cooksCutoff value manually for outlier detection. So far I have looked at the quantiles of the maxCooks values:
        0%        25%        50%        75%      90%      95%       99%         100%
0.00360426 0.08825488 0.21268380 0.73935583 2.970987 5.582727 13.544483 135.17274114
I've also made a plot of maxCooks vs. p-adj. I'm having trouble attaching it to this post (I'll keep trying, in case someone more knowledgeable than I am would like to take a look), but it essentially shows that large maxCooks values do not necessarily correspond to low (significant) p-adj values.
Additionally, if it makes any difference, my MA plot looks very different from what I think is expected. The vast majority of points, including the colored points, lie above the midline (mentioning this might just reveal how far out of my depth I am).
When I inspect the significant genes I get, many of them seem to be driven by a count outlier that always appears to come from the same sample. Is there a quantitative way to determine a cutoff, or am I left to eyeball it? Should I start by setting the cutoff at the top x% of maxCooks values and whittle it down from there, or is there a more principled way of approaching this? I'm pretty stumped, so any and all advice would be much appreciated.
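To make the "top x%" idea concrete, here is a minimal sketch of the logic I have in mind. It is written in plain Python rather than R purely to show the arithmetic. The gene names and values are made-up toy data, not my real results, and `quantile` and `flag_above_percentile` are hypothetical helper names of my own. The quantile uses simple linear interpolation between sorted values, which I am assuming is close enough to what I would get in R.

```python
def quantile(values, prob):
    """Linear-interpolation quantile of `values` at probability `prob` (0..1)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = prob * (len(xs) - 1)          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def flag_above_percentile(max_cooks, top_fraction):
    """Return (cutoff, genes) where genes are those with maxCooks above the
    cutoff that separates off the top `top_fraction` of values."""
    cutoff = quantile(max_cooks.values(), 1.0 - top_fraction)
    flagged = sorted(g for g, v in max_cooks.items() if v > cutoff)
    return cutoff, flagged


# Toy data (hypothetical): per-gene maximum Cook's distance.
toy_max_cooks = {"g1": 0.1, "g2": 0.2, "g3": 0.3, "g4": 0.4, "g5": 10.0}

cutoff, flagged = flag_above_percentile(toy_max_cooks, top_fraction=0.25)
print("cutoff:", cutoff)
print("flagged genes:", flagged)
```

The cutoff that comes out of this is the number I would then try as cooksCutoff, tightening or loosening `top_fraction` depending on how many outlier-driven genes remain. What I can't tell is whether picking `top_fraction` this way is defensible or just eyeballing with extra steps.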
Update: Setting cooksCutoff to 5 yields 3 significant genes, and these still seem to be driven by a count outlier from that same sample.