Question: Very high threshold for independent filtering
Asked 7 weeks ago by thomas.deimel:

Experimental Set-up: I am analysing an observational data set (i.e. no randomisation to condition groups) consisting of a few patient variables (lab values, etc.) and RNA-Seq data for miRNAs. I am trying to identify miRNAs that are differentially expressed with respect to certain (dichotomised) variables while controlling for others, e.g. with the design formula ~ cov1 + cov2 + variableofinterest.

Strange Observation: For some of my variables of interest, a surprisingly large number of genes are filtered out by independent filtering. I have checked that the NA p-values are not due to all-zero counts or outlier exclusion. As can be seen from the example plot below, the filtering threshold is set quite high (above the 75% quantile of the mean of normalised counts), and there is a pretty sharp rise in the number of H0 rejections at that point. From the histogram of p-values, it seems that most of the non-significant genes are filtered out. Apart from the very large number of filtered genes, the general pattern seemed ok to me.

My questions are:

1) Is there a point at which filtering out too many genes could lead to an unacceptable increase in the type I error rate? I.e., is there a limit to how far one can go with independent filtering before the paradigm of increasing sensitivity without getting too many false positives breaks down?

2) In the "filtering threshold-selection plot", there are some local minima/maxima, and the fit deviates quite a bit from the "oscillating" observed data points. Is any of this concerning? The only effect I can see is on the threshold itself: if I have understood correctly, the residual standard deviation is subtracted from the peak of the fit when setting the cut-off, so a poorer fit increases the amount subtracted. Any ideas why the plot might look like this at all?

The code used to create the plots is essentially just copied from the DESeq2 vignette. Please let me know if there is any other information you would like me to provide.

Plots:

https://www.dropbox.com/s/5cfz8g9bppzaql5/indepfilteringex.pdf?dl=0

https://www.dropbox.com/s/21y1uv8h3merr8l/Bildschirmfoto%202019-03-01%20um%2014.38.29.png?dl=0

Answer: Very high threshold for independent filtering
Answered 7 weeks ago by Michael Love (United States):

So, the way I changed the independent filtering (IF) routine in DESeq2 was to smooth the curve of rejections against the filter quantile and take the filter threshold at which the number of rejections gets within "noise" range of the maximum of the smoothed curve. This helps to mitigate some of the stochasticity problems of the greedy procedure. Meanwhile, if you want a more principled approach, why not use IHW (independent hypothesis weighting), which was designed to address the problems of the greedy IF procedure:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#independent-hypothesis-weighting
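
As a rough illustration of the smoothing-and-threshold idea described above, here is a minimal Python sketch. It is not DESeq2's R code: the function names are hypothetical, a simple moving average stands in for the smoother DESeq2 uses, and the quantile grid (0 to 0.95), the default alpha of 0.1 and the "10 or fewer rejections" shortcut are assumptions made for the example. The sketch counts Benjamini-Hochberg rejections at each candidate filter quantile, smooths the curve, and returns the lowest quantile whose observed rejection count is within one residual standard deviation of the peak of the smoothed curve.

```python
import random
from math import sqrt
from statistics import mean


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adj[i] = running_min
    return adj


def rejections_at(pvals, base_means, cutoff, alpha):
    """Number of BH rejections among genes whose mean count is >= cutoff."""
    kept = [p for p, b in zip(pvals, base_means) if b >= cutoff]
    return sum(q < alpha for q in bh_adjust(kept)) if kept else 0


def moving_average(values, half_window=2):
    """Crude smoother (assumption: stands in for a lowess-type fit)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(mean(values[lo:hi]))
    return out


def choose_threshold(pvals, base_means, alpha=0.1, n_steps=20, max_quantile=0.95):
    """Return (filter quantile, mean-count cutoff, rejections per quantile)."""
    sorted_means = sorted(base_means)
    n = len(sorted_means)
    thetas = [max_quantile * (k / (n_steps - 1)) for k in range(n_steps)]
    cutoffs = [sorted_means[int(t * (n - 1))] for t in thetas]
    num_rej = [rejections_at(pvals, base_means, c, alpha) for c in cutoffs]

    # With very few rejections anywhere, do not filter at all (assumed shortcut).
    if max(num_rej) <= 10:
        return thetas[0], cutoffs[0], num_rej

    fit = moving_average(num_rej)
    # Residual spread of observed counts around the smoothed curve.
    residuals = [r - f for r, f in zip(num_rej, fit) if r > 0]
    resid_sd = sqrt(mean(x * x for x in residuals))
    target = max(fit) - resid_sd

    # First (lowest) quantile whose observed count is within noise of the peak.
    for theta, cutoff, r in zip(thetas, cutoffs, num_rej):
        if r > target:
            return theta, cutoff, num_rej
    return thetas[0], cutoffs[0], num_rej


if __name__ == "__main__":
    # Simulated data: only genes with a high mean count carry signal.
    rng = random.Random(1)
    base_means = [rng.lognormvariate(3, 2) for _ in range(2000)]
    pvals = [
        rng.random() * 1e-3 if b > 100 and rng.random() < 0.3 else rng.random()
        for b in base_means
    ]
    theta, cutoff, num_rej = choose_threshold(pvals, base_means)
    print("chosen quantile:", round(theta, 2), "cutoff:", round(cutoff, 1))
    print("rejections per quantile:", num_rej)
```

In this sketch, a noisier rejection curve gives a larger residual standard deviation, which lowers the target and tends to select a lower filter quantile. That is one way to see how "oscillating" observed points can shift the chosen threshold.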


Thanks for the swift response - I will have a look into IHW (and might come back with follow-up questions once I have a better understanding)!

Reply written 7 weeks ago by thomas.deimel