Question: Still lots of NA adjusted p-values in DESeq2 after eliminating Cook's distance cutoff.
ariel wrote (11 days ago):

I am using DESeq2 to look for differential abundance in a set of KEGG pathways. It's a downstream analysis of 16S data. I know DESeq2 isn't designed for this, but other folks seem to use it, and who am I to not be a sheep?

I have 244 pathways and I get adjusted p-values for all but 29 of them. However, many of those 29 have raw p-values in the same range as features that made it into the significant adjusted group. My understanding (from the docs and numerous posts) is that the NAs come from DESeq2 flagging those features as containing potential outliers based on Cook's distance. The general recommendation is to (1) look at the actual counts to judge whether the offending features may be distorting the overall fit and (2) try adjusting the cooksCutoff value, or disabling it altogether, to see if that yields more adjusted p-values.

I tried setting cooksCutoff=FALSE and still got 29 NA adjusted p-values.
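
A quick way to narrow down where NAs come from is to look at which columns are NA. Per the DESeq2 documentation, rows flagged by the Cook's distance check have both pvalue and padj set to NA, rows dropped by independent filtering keep their pvalue and only lose padj, and rows with all-zero counts have a baseMean of 0. The helper below is a minimal sketch of that check; `classify_na` is a hypothetical name (not part of DESeq2), and the toy table uses made-up values purely for illustration.

```r
# Hypothetical helper (not part of DESeq2): label the likely reason for each NA.
# `res` is assumed to be a data frame with baseMean, pvalue and padj columns,
# for example as.data.frame(results(dds)).
classify_na <- function(res) {
  reason <- rep("tested", nrow(res))
  # p-value present but no adjusted p-value: removed before multiple-testing correction
  reason[!is.na(res$pvalue) & is.na(res$padj)] <- "independent filtering"
  # no p-value at all although the feature has counts: flagged as outlier
  reason[is.na(res$pvalue) & res$baseMean > 0] <- "Cook's outlier"
  # nothing to test
  reason[res$baseMean == 0] <- "all-zero counts"
  reason
}

# Toy table standing in for real results (values are invented for illustration).
toy <- data.frame(
  baseMean = c(0, 12.5, 850, 4200),
  pvalue   = c(NA, 0.03, NA, 0.001),
  padj     = c(NA, NA, NA, 0.004)
)
table(classify_na(toy))
```

In the situation described here, the NA features still have raw p-values, which would put them in the "independent filtering" group rather than the Cook's distance group.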

Here are boxplots of the distributions for my samples (across pathways). The second is for ONLY the features that get NA adjusted p-values. What aspect of these distributions suggests that DESeq2 can't assign adjusted p-values to certain features?

[Figure: all pathway abundances by sample]

[Figure: only pathways with NA padj]

EDIT:

So, after consulting the link in the FAQ, I see that independent filtering is another mechanism that can cause NA adjusted p-values. If I list the row means and padj, I can see that the NAs largely correspond to the smallest base means.

> pathway_counts %>% 
+     mutate(RowMean = rowMeans(select(., all_of(sample_names)))) %>% 
+     inner_join(results_df, by="Pathway") %>% 
+     select(Pathway, RowMean, padj=CaseString_AMD_vs_Control.padj) %>%
+     arrange(RowMean)
    Pathway      RowMean       padj
1   ko00601     139.3209         NA
2   ko03450     139.3209         NA
3   ko00571     244.5821         NA
4   ko00944     277.7537         NA
5   ko04614     382.9328         NA
6   ko00100     701.2612         NA
7   ko04138     955.3582         NA
8   ko00364    1092.9030         NA
9   ko00565    1092.9030         NA
10  ko00572    1092.9030         NA
11  ko00623    1092.9030         NA
12  ko00965    1092.9030         NA
13  ko04622    1092.9030         NA
14  ko04934    1092.9030         NA
15  ko05020    1092.9030         NA
16  ko05100    1092.9030         NA
17  ko05142    1092.9030         NA
18  ko05143    1092.9030         NA
19  ko05211    1092.9030         NA
20  ko05219    1092.9030         NA
21  ko00643    1361.0821         NA
22  ko00981    1361.0821         NA
23  ko00909    1401.6493         NA
24  ko04080    1672.7761 0.25940036
25  ko04979    1672.7761 0.25940036
26  ko05166    1672.7761 0.25940036
27  ko01062    1793.2015 0.09070609
28  ko05146    2091.8582 0.43741052
29  ko00642    2184.9701         NA
30  ko00930    2184.9701         NA
31  ko00510    2297.8507 0.24833629
32  ko04011    2418.1119         NA
33  ko00361    2452.9851         NA
34  ko00791    2582.0821         NA
35  ko00311    2998.2239 0.12410029
36  ko04210    3008.5970 0.43127533
37  ko00591    3277.0597 0.15126360
38  ko00592    3277.0597 0.15126360
39  ko00072    3407.4179 0.35035016
40  ko04917    3407.4179 0.35035016
41  ko05014    3510.1940 0.53789646
42  ko00903    3545.0522 0.16057726
...
...
...
214 ko00630  132146.0597 0.07949129
215 ko03440  138930.9104 0.65923172
216 ko00680  141807.6567 0.38438155
217 ko01210  154234.7687 0.72965965
218 ko00720  155620.8433 0.25940036
219 ko00260  156124.5821 0.57029117
220 ko00030  159176.1119 0.36974438
221 ko00250  169957.7761 0.12410029
222 ko00270  175556.3955 0.07949129
223 ko00190  188362.8955 0.74061137
224 ko00052  192049.5597 0.25940036
225 ko00620  199290.0746 0.19715733
226 ko02060  200985.6269 0.11953936
227 ko02024  214841.2910 0.13306299
228 ko00051  221156.9403 0.25940036
229 ko00500  229149.2463 0.08760134
230 ko00010  253416.2537 0.13306299
231 ko00240  284979.7313 0.89616518
232 ko00520  291928.9328 0.34925892
233 ko02020  296378.1194 0.05075952
234 ko00230  356280.4179 0.81607963
235 ko03010  396684.1940 0.12603633
236 ko02010  447953.9328 0.04615051
237 ko01200  494053.2687 0.48229380
238 ko00970  561132.6194 0.63815481
239 ko01230  735971.0299 0.77203261
240 ko01120  896369.6642 0.12410029
241 ko01130 1111546.5075 0.98463176
242 ko01110 1444030.1791 0.43127533
243 ko01100 3267889.9552 0.69452160
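
One caveat about the listing above: by default, DESeq2's independent filtering uses the mean of normalized counts (the baseMean column of the results) as its filter statistic, not the row mean of the raw counts. That may be one reason the NAs are not strictly the lowest RowMean values (rows 29, 30 and 32 to 34 are NA even though rows 24 to 28 are not). The sketch below shows how the chosen threshold can be inspected; it runs on simulated data from `makeExampleDESeqDataSet` so that it stands on its own, and the object names (`dds`, `res`, `filtered`, `raw_mean`) are placeholders rather than anything from the original analysis.

```r
library(DESeq2)

# Simulated data set so the sketch runs on its own; substitute your own dds.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds)

# Threshold on mean normalized count chosen by independent filtering.
metadata(res)$filterThreshold

# Rows with a p-value but no adjusted p-value were removed by the filter;
# their baseMean should fall below the threshold.
filtered <- !is.na(res$pvalue) & is.na(res$padj)
summary(res$baseMean[filtered])

# Raw-count row means versus normalized baseMean: they differ when size factors do.
raw_mean <- rowMeans(counts(dds))
head(cbind(raw_mean, baseMean = res$baseMean))
```

Sorting by `res$baseMean` instead of a raw-count row mean should give a clean split between NA and non-NA adjusted p-values, assuming no other source of NAs is in play.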

However, the distribution of row means is pretty smooth, with no obvious low-abundance region that stands apart from the rest.


Finally, as instructed in the docs, if I set independentFiltering=FALSE then I get no NA adjusted p-values.
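
For anyone wondering why the filter exists at all, here is a simplified sketch of the idea in base R. It is not DESeq2's exact procedure (which, as far as I understand, smooths the rejection curve before picking a threshold); it simply tries a grid of cutoffs on a filter statistic, applies Benjamini-Hochberg only to the features that pass, and keeps the cutoff with the most rejections. All inputs are simulated, and the names `base_mean`, `has_signal`, `num_rej` and so on are made up for this illustration.

```r
set.seed(42)

# Simulated inputs (hypothetical): a filter statistic and p-values for 1000 features.
n <- 1000
base_mean <- rexp(n, rate = 1 / 500)
# Low-mean features carry no signal here, so their p-values are uniform;
# among higher-mean features, a subset gets small p-values.
has_signal <- base_mean > 300 & runif(n) < 0.2
pvalue <- ifelse(has_signal, rbeta(n, 0.1, 10), runif(n))

alpha <- 0.1
# Candidate thresholds: quantiles of the filter statistic.
theta <- seq(0, 0.9, by = 0.05)
cutoffs <- quantile(base_mean, theta)

# For each cutoff, adjust only the features that pass and count rejections.
num_rej <- sapply(cutoffs, function(cut) {
  keep <- base_mean >= cut
  sum(p.adjust(pvalue[keep], method = "BH") < alpha)
})

# Keep the cutoff with the most rejections; filtered features get padj = NA.
best <- which.max(num_rej)
keep <- base_mean >= cutoffs[best]
padj <- rep(NA_real_, n)
padj[keep] <- p.adjust(pvalue[keep], method = "BH")

c(threshold  = unname(cutoffs[best]),
  rejections = unname(num_rej[best]),
  n_NA       = sum(is.na(padj)))
```

The first entry of `num_rej` is the no-filtering case, comparable in spirit to independentFiltering=FALSE: every feature gets an adjusted p-value, but the correction is spread over more tests, so the number of rejections can only be equal or lower than at the chosen cutoff.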

Answer: Still lots of NA adjusted p-values in DESeq2 after eliminating Cook's distance c
Michael Love wrote (11 days ago):

Check the FAQ — there is more information on the NAs.


Thanks! I did not know about the independent filtering setting.

Reply written 10 days ago by ariel
