I'm not sure whether this is an outlier problem, since it affects more than one sample in a group, but I am seeing genes reported as differentially expressed when they are obviously different in only 3-4 of the 16 samples in one group, compared with 18 samples in the control group. I am using DESeq2 with default settings. Is there a way to change the DESeq2 settings to prioritize genes that show similar expression within a group? I want to find differentially expressed genes that differ in most samples within a group, not in only about a quarter of them.
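To make "different for most samples within a group" concrete, here is a minimal sketch of a post hoc check on normalized counts for a single gene. It is not a DESeq2 setting. The function name `fraction_above`, the choice of the control upper quartile as the threshold, and all count values are assumptions invented for illustration; real values would come from something like `counts(dds, normalized=TRUE)`.

```python
from statistics import quantiles


def fraction_above(treated, threshold):
    """Fraction of treated samples strictly above a threshold."""
    if not treated:
        raise ValueError("treated must contain at least one sample")
    return sum(1 for x in treated if x > threshold) / len(treated)


# Hypothetical normalized counts for one gene (invented, not real data).
control = [10, 11, 12, 13, 14, 15, 16, 17]
treated = [11, 12, 13, 14, 15, 15, 40, 45]  # only two samples stand out

# Assumed threshold: the upper quartile of the control group.
upper_quartile = quantiles(control, n=4)[2]

frac = fraction_above(treated, upper_quartile)
print(f"{frac:.2f} of treated samples exceed the control upper quartile")
```

A gene where this fraction is small is one where a few samples carry the group difference, which is the pattern described above. The threshold and any cutoff on the fraction are judgment calls, not something DESeq2 prescribes.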
Thanks for the quick reply!
Here is an example...
3 samples from the 'yes' group are clearly skewing the results. This gene has a padj of 0.001 and an LFC of 1.22 before shrinkage and 1.17 after shrinkage.
Most of our DE genes do not have very large fold changes and we are dealing with very noisy human data.
I don't think these results are obviously being skewed by the 3 highest samples in the "yes" group. Even if you ignore these, the "yes" group still has a higher average normalized count than the "no" group. It might not be as significant without the 3 highest samples, but I wouldn't say that this gene is an unambiguous false positive.
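That argument can be checked with a simple sensitivity test: drop the highest samples from the "yes" group and compare the group means again. The sketch below assumes normalized counts for one gene have already been extracted; the helper `mean_without_top` and every count value are hypothetical, made up only to show the mechanics, and are not the data from this thread.

```python
from math import log2
from statistics import mean


def mean_without_top(values, n_drop):
    """Mean of values after removing the n_drop largest ones."""
    if not 0 <= n_drop < len(values):
        raise ValueError("n_drop must be between 0 and len(values) - 1")
    kept = sorted(values)[: len(values) - n_drop]
    return mean(kept)


# Hypothetical normalized counts (invented): 16 "yes" and 18 "no" samples.
yes_counts = [40, 45, 50, 52, 55, 58, 60, 62, 65, 68, 70, 72, 75, 300, 350, 400]
no_counts = [30, 32, 35, 36, 38, 40, 41, 42, 44, 45, 46, 48, 50, 52, 54, 55, 58, 60]

no_mean = mean(no_counts)
yes_mean_all = mean(yes_counts)
yes_mean_trimmed = mean_without_top(yes_counts, 3)

# Crude log2 ratio of means; this is not DESeq2's model-based LFC estimate.
print("log2 ratio, all samples:   ", round(log2(yes_mean_all / no_mean), 2))
print("log2 ratio, top 3 removed: ", round(log2(yes_mean_trimmed / no_mean), 2))
```

If the trimmed ratio stays clearly above zero, the group difference does not rest only on the extreme samples. This comparison ignores dispersion and gives no p-value, so it is a sanity check rather than a replacement for rerunning the test.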