I am using DESeq2 to look for differential abundance in a set of KEGG pathways. It's a downstream analysis of 16S data. I know DESeq2 isn't designed for this, but other folks seem to use it, and who am I to not be a sheep?
I have 244 pathways and I get adjusted p-values for all but 29 of them. However, many of those 29 have raw p-values in the same range as features that made it into the significant adjusted group. My understanding (from the docs and numerous posts) is that the NAs come from DESeq2 flagging those features as containing potential outliers based on Cook's distance. The recommendation is generally to (1) look at the actual counts to judge whether the offending features are distorting the overall fit and (2) try adjusting the cooksCutoff value, or disabling it altogether, to see if that yields more adjusted p-values.
I tried setting cooksCutoff=FALSE and still got 29 NA adjusted p-values.
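One way to narrow down the cause is to check which columns are NA. As I read the DESeq2 documentation, a Cook's outlier sets both pvalue and padj to NA, whereas independent filtering leaves pvalue intact and sets only padj to NA. Below is a minimal sketch with a made-up data frame standing in for as.data.frame(results(dds)); the row names and values are invented purely for illustration.

```r
# Toy stand-in for as.data.frame(results(dds)); all values are invented.
res_df <- data.frame(
  baseMean = c(0, 12.4, 150.2, 3100.7),
  pvalue   = c(NA, NA, 0.03, 0.20),
  padj     = c(NA, NA, NA, 0.31),
  row.names = c("all_zero", "cooks_outlier", "filtered", "tested")
)

# Label each row by the most likely reason its padj is NA (if it is).
na_reason <- with(res_df, ifelse(
  baseMean == 0, "all counts zero",
  ifelse(is.na(pvalue), "Cook's outlier (pvalue and padj NA)",
  ifelse(is.na(padj), "independent filtering (only padj NA)", "tested"))
))

# Count how many rows fall under each reason.
print(table(na_reason))
```

If the 29 NA rows all still have a raw p-value, as they do here, changing cooksCutoff would not be expected to make a difference.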
Here are boxplots of the distributions for my samples (across pathways). The second is for ONLY the features that get NA adjusted p-values. What aspect of these distributions suggests that DESeq2 can't assign adjusted p-values to certain features?
EDIT:
So, after consulting the link in the FAQ, I see that independent filtering is another mechanism that can cause NA adjusted p-values. If I list the row means and padj, I can see that the NAs mostly fall among the smallest means:
> pathway_counts %>%
+ mutate(RowMean = rowMeans(select(., all_of(sample_names)))) %>%
+ inner_join(results_df, by="Pathway") %>%
+ select(Pathway, RowMean, padj=CaseString_AMD_vs_Control.padj) %>%
+ arrange(RowMean)
Pathway RowMean padj
1 ko00601 139.3209 NA
2 ko03450 139.3209 NA
3 ko00571 244.5821 NA
4 ko00944 277.7537 NA
5 ko04614 382.9328 NA
6 ko00100 701.2612 NA
7 ko04138 955.3582 NA
8 ko00364 1092.9030 NA
9 ko00565 1092.9030 NA
10 ko00572 1092.9030 NA
11 ko00623 1092.9030 NA
12 ko00965 1092.9030 NA
13 ko04622 1092.9030 NA
14 ko04934 1092.9030 NA
15 ko05020 1092.9030 NA
16 ko05100 1092.9030 NA
17 ko05142 1092.9030 NA
18 ko05143 1092.9030 NA
19 ko05211 1092.9030 NA
20 ko05219 1092.9030 NA
21 ko00643 1361.0821 NA
22 ko00981 1361.0821 NA
23 ko00909 1401.6493 NA
24 ko04080 1672.7761 0.25940036
25 ko04979 1672.7761 0.25940036
26 ko05166 1672.7761 0.25940036
27 ko01062 1793.2015 0.09070609
28 ko05146 2091.8582 0.43741052
29 ko00642 2184.9701 NA
30 ko00930 2184.9701 NA
31 ko00510 2297.8507 0.24833629
32 ko04011 2418.1119 NA
33 ko00361 2452.9851 NA
34 ko00791 2582.0821 NA
35 ko00311 2998.2239 0.12410029
36 ko04210 3008.5970 0.43127533
37 ko00591 3277.0597 0.15126360
38 ko00592 3277.0597 0.15126360
39 ko00072 3407.4179 0.35035016
40 ko04917 3407.4179 0.35035016
41 ko05014 3510.1940 0.53789646
42 ko00903 3545.0522 0.16057726
...
...
...
214 ko00630 132146.0597 0.07949129
215 ko03440 138930.9104 0.65923172
216 ko00680 141807.6567 0.38438155
217 ko01210 154234.7687 0.72965965
218 ko00720 155620.8433 0.25940036
219 ko00260 156124.5821 0.57029117
220 ko00030 159176.1119 0.36974438
221 ko00250 169957.7761 0.12410029
222 ko00270 175556.3955 0.07949129
223 ko00190 188362.8955 0.74061137
224 ko00052 192049.5597 0.25940036
225 ko00620 199290.0746 0.19715733
226 ko02060 200985.6269 0.11953936
227 ko02024 214841.2910 0.13306299
228 ko00051 221156.9403 0.25940036
229 ko00500 229149.2463 0.08760134
230 ko00010 253416.2537 0.13306299
231 ko00240 284979.7313 0.89616518
232 ko00520 291928.9328 0.34925892
233 ko02020 296378.1194 0.05075952
234 ko00230 356280.4179 0.81607963
235 ko03010 396684.1940 0.12603633
236 ko02010 447953.9328 0.04615051
237 ko01200 494053.2687 0.48229380
238 ko00970 561132.6194 0.63815481
239 ko01230 735971.0299 0.77203261
240 ko01120 896369.6642 0.12410029
241 ko01130 1111546.5075 0.98463176
242 ko01110 1444030.1791 0.43127533
243 ko01100 3267889.9552 0.69452160
However, the distribution of row means is fairly smooth, with no distinct cluster at the low end that stands apart from the rest.
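As far as I can tell from the vignette, the cutoff is not chosen by looking for a gap in the distribution. Instead, results() picks the quantile of baseMean (the mean of normalized counts) that maximizes the number of rejections at the target alpha, and stores that choice in the results metadata. The sketch below uses DESeq2's simulated example data from makeExampleDESeqDataSet(), not my pathway table, so the numbers it prints say nothing about my data.

```r
library(DESeq2)

set.seed(1)
# Simulated counts from a DESeq2 helper; a stand-in for a real dataset.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds <- DESeq(dds)
res <- results(dds)  # independent filtering is on by default

# baseMean is the mean of normalized counts, the default filter statistic.
head(cbind(baseMean = res$baseMean,
           normMean = rowMeans(counts(dds, normalized = TRUE))))

# Threshold on baseMean chosen by the filter.
metadata(res)$filterThreshold

# Rows that kept a raw p-value but lost their adjusted p-value.
sum(!is.na(res$pvalue) & is.na(res$padj))

# Number of rejections as a function of the filter quantile.
plot(metadata(res)$filterNumRej, type = "b",
     xlab = "quantile of baseMean filter", ylab = "number of rejections")
```

This may also explain why a few NAs in my table sit above pathways that did get a padj: the RowMean column there is computed from pathway_counts, which need not match the baseMean that DESeq2 filters on.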
Finally, as instructed in the docs, if I set independentFiltering=FALSE then I get no NA adjusted p-values.
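To see why turning the filter off changes the adjusted values rather than just filling in the NAs, here is a toy sketch of the idea in base R. The p-values, means, and the threshold of 500 are all invented; DESeq2 chooses its own threshold, so this only illustrates the principle of adjusting over fewer tests.

```r
# Invented raw p-values and mean counts for eight features.
pvalue    <- c(0.001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.6, 0.8)
base_mean <- c(5000, 3000, 900, 4000, 100, 200, 2500, 150)

threshold <- 500              # hypothetical cutoff, not DESeq2's choice
keep <- base_mean >= threshold

# With filtering: adjust only the features that pass, leave the rest NA.
padj_filtered <- rep(NA_real_, length(pvalue))
padj_filtered[keep] <- p.adjust(pvalue[keep], method = "BH")

# Without filtering: adjust across all features, so no NAs.
padj_all <- p.adjust(pvalue, method = "BH")

print(data.frame(base_mean, pvalue, padj_filtered, padj_all))
```

In this toy case the features that pass the filter get adjusted p-values no larger than in the unfiltered run, because the Benjamini-Hochberg correction is applied over fewer tests.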
Thanks! I did not know about the independent filtering setting.