Mark W Kimpel
For some time I have been using what I consider to be unbiased
filtering of low-variance genes, following the method described in
"Bioinformatics and Computational Biology Solutions Using R and
Bioconductor", R. Gentleman et al., page 233. I have recently run into
some resistance from a colleague, who claims that this type of
filtering distorts FDR calculations because it introduces bias. His
reasoning is that, since this method tends to filter out genes with
higher p-values and/or lower fold changes, it is a sneaky way of
filtering on those quantities directly. Of course, filtering by
phenotype does introduce bias. In this case, however, I believe that we
aren't violating the statistical underpinnings of the analysis, because
we filter on the a priori assumption that low-variance genes are of
little biological interest (even if statistically significant, they
will have very low fold changes and thus be of questionable meaning).
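To make the step concrete, here is a minimal sketch of the kind of
filter in question, written in Python purely for illustration. It is a
generic variance filter, not necessarily the exact procedure from page
233; the function name `filter_low_variance`, the `keep_fraction`
cutoff, and the toy data are all assumptions. The point it illustrates
is that the group labels never enter the calculation.

```python
import statistics


def filter_low_variance(expr, keep_fraction=0.5):
    """Keep the most variable genes, ignoring sample labels entirely.

    expr: dict mapping gene name -> list of expression values across
          all samples (both phenotype groups pooled together).
    keep_fraction: hypothetical cutoff, the fraction of genes to retain.
    """
    # Variance is computed over all samples at once, so the phenotype
    # (group membership) is never consulted by the filter.
    variances = {gene: statistics.variance(values) for gene, values in expr.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = set(ranked[:n_keep])
    return {gene: values for gene, values in expr.items() if gene in kept}


# Toy data (made up for illustration): four genes, four samples each.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [2.0, 8.0, 3.0, 9.0],
    "geneC": [7.0, 7.2, 6.8, 7.1],
    "geneD": [1.0, 4.0, 1.5, 4.5],
}
filtered = filter_low_variance(expr, keep_fraction=0.5)
print(sorted(filtered))  # ['geneB', 'geneD']
```

Testing and FDR adjustment would then be run only on the retained
genes; whether that ordering is statistically sound is exactly what my
colleague disputes.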
I need some help in justifying this filtering step. Does anyone know of
a peer-reviewed reference that gives a theoretical justification for
its use, or of any empirical experiments that show that it is
legitimate?
Thanks,
Mark
--
Mark W. Kimpel MD
Neuroinformatics
Department of Psychiatry
Indiana University School of Medicine