Question

Why did I get NA p-values in DESeq with continuous variable?

0

Entering edit mode

ariel ▴ 20

@ariel-16886

Last seen 2.4 years ago

United States

I have a microbiome study with counts for various microbiota. It is a patient case/control study for a disease. One of my variables is the patients' ages. Given the nature of the study, it would not be appropriate to bin the ages as recommended in the DESeq vignette. I'm running deseq with the age as a continuous variable. I end up with a bunch of really high p-values, and NAs for the rest. Strikingly, the microbiota we know to be most involved in the disease are among those with the NAs.

The vignette also discusses reasons for getting NA p-values. But none of those reasons seem to apply.

Is there a way I can run deseq, perhaps with different parameters, that would give me meaningful p-values?

Here is a snippet of what my data looks like.

Taxa Counts

              short_glommed_taxa s001 s002 s003
1 Actinobacteria_Bifidobacterium    0  244    0
2     Actinobacteria_Collinsella    0  206  665
3      Bacteroidetes_Bacteroides 9582 3283 8736
4      Bacteroidetes_Odoribacter    0    0  205
5   Bacteroidetes_Paraprevotella    0  264  764

Metadata

  Case  Age Gender
1    1 78.3      M
2    1 83.5      F
3    1 86.0      M
4    1 81.3      F
5    1 61.3      F

Design = ~ Case + Gender

    pvalue.Case pvalue.Gender
1  4.555975e-02  6.252092e-01
2  6.171859e-01  8.941688e-01
3  3.601969e-01  5.478673e-01
4  1.285063e-01  4.650333e-02
5  2.634906e-01  1.858226e-01
6  7.824646e-05  6.250564e-01
7  8.877039e-01  1.913021e-01
8  8.905339e-01  3.452605e-01
9  2.584022e-01  3.101290e-01
10 2.250264e-02  1.386952e-03
11 3.558656e-01  4.911188e-01
12 6.603468e-06  6.197274e-04
13 8.266107e-01  8.104814e-01
14 1.801850e-01  6.569023e-01
15 4.003128e-02  1.597909e-02
16 2.181109e-01  7.052880e-01
17 4.950991e-01  7.803954e-01
18 7.889134e-01  8.771824e-01
19 8.586993e-01  6.271034e-01
20 5.560399e-01  6.613306e-01
21 2.826663e-02  1.297668e-02
22 7.808729e-01  1.669244e-01
23 3.884482e-01  4.481712e-01
24 6.010148e-01  5.161258e-01
25 8.124504e-01  9.521962e-01
26 8.395784e-01  1.377468e-01
27 6.123660e-01  4.371106e-01
28 4.775619e-01  5.534327e-01
29 4.758037e-01  4.360346e-01
30 9.816497e-01  6.690608e-01
31 4.208371e-03  1.943810e-02
32 2.959725e-01  2.967400e-02
33 4.078783e-01  5.551998e-02
34 3.245609e-01  7.898801e-03
35 1.218701e-01  9.434043e-01
36 1.665705e-01  9.412286e-01
37 1.260214e-01  1.145780e-01
38 5.076193e-01  8.684723e-01
39 3.536274e-01  2.365202e-01
40 4.832064e-01  4.993375e-02
41 3.188982e-01  7.723663e-01
42 4.323778e-02  8.952457e-01
43 5.394069e-02  5.341772e-01
44 9.893074e-01  7.800427e-01
45 5.770063e-01  6.617429e-01
46 9.388728e-01  3.087342e-04
47 9.190019e-05  1.715437e-02
48 9.352599e-05  5.574272e-10
49 7.939812e-02  9.102021e-01
50 5.709937e-01  8.088925e-01
51 7.036197e-01  1.000127e-01
52 7.946552e-01  1.476702e-01
53 2.783126e-02  8.279938e-01
54 7.470688e-04  4.553144e-09
55 9.979718e-02  3.360789e-01
56 8.583612e-02  1.835828e-01
57 7.049268e-01  5.121603e-01

Design = ~ Case + Age + Gender

Results

    pvalue.Case pvalue.Age pvalue.Gender
1           NA         NA            NA
2   0.97016393 0.42725117   0.924426283
3   0.38669470 0.99390665   0.540866442
4   0.33058584 0.34905276   0.041786061
5   0.89452103 0.96785761   0.913534339
6           NA         NA            NA
7   0.72057104 0.56496672   0.257152766
8           NA         NA            NA
9   0.91214072 0.69744511   0.865934511
10          NA         NA            NA
11          NA         NA            NA
12  0.91231087 0.70569817   0.898211558
13  0.83010922 0.29578480   0.611334401
14  0.24399073 0.87013860   0.664381008
15  0.02011576 0.28959588   0.009174773
16  0.76316076 0.23545235   0.702478530
17  0.16144017 0.19556132   0.775387805
18  0.83389237 0.93763459   0.852481463
19          NA         NA            NA
20  0.65189161 0.81202443   0.677908670
21  0.01361814 0.04499905   0.009273717
22  0.80023650 0.99132911   0.171179127
23  0.20656657 0.34127935   0.575875139
24  0.64055736 0.73743428   0.473879064
25  0.90205610 0.38099752   0.908197158
26  0.96560037 0.70496297   0.892167815
27  0.55212646 0.02653155   0.781172815
28  0.95663939 0.86534807   0.826079812
29          NA         NA            NA
30  0.91316218 0.77446577   0.703892102
31  0.63790560 0.76312134   0.767167910
32  0.93338707 0.68613435   0.857710827
33          NA         NA            NA
34          NA         NA            NA
35  0.28482905 0.44575985   0.860661029
36          NA         NA            NA
37          NA         NA            NA
38          NA         NA            NA
39  0.07686628 0.12284277   0.142249987
40          NA         NA            NA
41  0.15447232 0.41656887   0.859062725
42          NA         NA            NA
43  0.04977070 0.63968873   0.485043175
44  0.68316696 0.35215020   0.604236454
45          NA         NA            NA
46          NA         NA            NA
47  0.87825478 0.97856553   0.877739962
48  0.71930414 0.80550122   0.667949742
49          NA         NA            NA
50          NA         NA            NA
51  0.95173391 0.61814645   0.805816314
52  0.78874298 0.87486492   0.156943139
53  0.66491304 0.93538092   0.794474220
54          NA         NA            NA
55          NA         NA            NA
56  0.86896850 0.85821162   0.878294542
57  0.81796188 0.16375175   0.339068198

Code Example

> taxa_counts %>% select(s001,s002,s003, s004) %>% head(5)
                                                                                   s001 s002
Actinobacteria_Actinobacteria_Bifidobacteriales_Bifidobacteriaceae_Bifidobacterium    1  244
Actinobacteria_Coriobacteriia_Coriobacteriales_Coriobacteriaceae_Collinsella          1  206
Bacteroidetes_Bacteroidia_Bacteroidales_Bacteroidaceae_Bacteroides                 9582 3283
Bacteroidetes_Bacteroidia_Bacteroidales_Marinifilaceae_Odoribacter                    1    1
Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Paraprevotella                 1  264
                                                                                   s003 s004
Actinobacteria_Actinobacteria_Bifidobacteriales_Bifidobacteriaceae_Bifidobacterium    1    1
Actinobacteria_Coriobacteriia_Coriobacteriales_Coriobacteriaceae_Collinsella        665    1
Bacteroidetes_Bacteroidia_Bacteroidales_Bacteroidaceae_Bacteroides                 8736 7985
Bacteroidetes_Bacteroidia_Bacteroidales_Marinifilaceae_Odoribacter                  205    1
Bacteroidetes_Bacteroidia_Bacteroidales_Prevotellaceae_Paraprevotella               764 1353
> dds <- DESeqDataSetFromMatrix(
+     countData=taxa_counts,
+     colData=metadata,
+     design=~Case+Gender
+ )
converting counts to integer mode
> deseq_obj = DESeq(
+     dds,
+     test='Wald',
+     fitType='mean'
+ )
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 9 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
> case_results = results(deseq_obj, contrast=c('Case',0,1))
> gender_results = results(deseq_obj, contrast=c('Gender', 'M', 'F'))
> data.frame(pvalue.case=case_results$pvalue, pvalue.gender=gender_results$pvalue)
    pvalue.case pvalue.gender
1  4.555975e-02  6.252092e-01
2  6.171859e-01  8.941688e-01
3  3.601969e-01  5.478673e-01
4  1.285063e-01  4.650333e-02
5  2.634906e-01  1.858226e-01
6  7.824646e-05  6.250564e-01
7  8.877039e-01  1.913021e-01
8  8.905339e-01  3.452605e-01
9  2.584022e-01  3.101290e-01
10 2.250264e-02  1.386952e-03
11 3.558656e-01  4.911188e-01
12 6.603468e-06  6.197274e-04
13 8.266107e-01  8.104814e-01
14 1.801850e-01  6.569023e-01
15 4.003128e-02  1.597909e-02
16 2.181109e-01  7.052880e-01
17 4.950991e-01  7.803954e-01
18 7.889134e-01  8.771824e-01
19 8.586993e-01  6.271034e-01
20 5.560399e-01  6.613306e-01
21 2.826663e-02  1.297668e-02
22 7.808729e-01  1.669244e-01
23 3.884482e-01  4.481712e-01
24 6.010148e-01  5.161258e-01
25 8.124504e-01  9.521962e-01
26 8.395784e-01  1.377468e-01
27 6.123660e-01  4.371106e-01
28 4.775619e-01  5.534327e-01
29 4.758037e-01  4.360346e-01
30 9.816497e-01  6.690608e-01
31 4.208371e-03  1.943810e-02
32 2.959725e-01  2.967400e-02
33 4.078783e-01  5.551998e-02
34 3.245609e-01  7.898801e-03
35 1.218701e-01  9.434043e-01
36 1.665705e-01  9.412286e-01
37 1.260214e-01  1.145780e-01
38 5.076193e-01  8.684723e-01
39 3.536274e-01  2.365202e-01
40 4.832064e-01  4.993375e-02
41 3.188982e-01  7.723663e-01
42 4.323778e-02  8.952457e-01
43 5.394069e-02  5.341772e-01
44 9.893074e-01  7.800427e-01
45 5.770063e-01  6.617429e-01
46 9.388728e-01  3.087342e-04
47 9.190019e-05  1.715437e-02
48 9.352599e-05  5.574272e-10
49 7.939812e-02  9.102021e-01
50 5.709937e-01  8.088925e-01
51 7.036197e-01  1.000127e-01
52 7.946552e-01  1.476702e-01
53 2.783126e-02  8.279938e-01
54 7.470688e-04  4.553144e-09
55 9.979718e-02  3.360789e-01
56 8.583612e-02  1.835828e-01
57 7.049268e-01  5.121603e-01
> dds <- DESeqDataSetFromMatrix(
+     countData=taxa_counts,
+     colData=metadata,
+     design=~Case+Age+Gender
+ )
converting counts to integer mode
> deseq_obj = DESeq(
+     dds,
+     test='Wald',
+     fitType='mean'
+ )
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> case_results = results(deseq_obj, contrast=c('Case',0,1))
> gender_results = results(deseq_obj, contrast=c('Gender', 'M', 'F'))
> age_results = results(deseq_obj, name='Age')
> data.frame(pvalue.case=case_results$pvalue, pvalue.gender=gender_results$pvalue, pvalue.age=age_results$pvalue)
   pvalue.case pvalue.gender pvalue.age
1           NA            NA         NA
2   0.97016393   0.924426283 0.42725117
3   0.38669470   0.540866442 0.99390665
4   0.33058584   0.041786061 0.34905276
5   0.89452103   0.913534339 0.96785761
6           NA            NA         NA
7   0.72057104   0.257152766 0.56496672
8           NA            NA         NA
9   0.91214072   0.865934511 0.69744511
10          NA            NA         NA
11          NA            NA         NA
12  0.91231087   0.898211558 0.70569817
13  0.83010922   0.611334401 0.29578480
14  0.24399073   0.664381008 0.87013860
15  0.02011576   0.009174773 0.28959588
16  0.76316076   0.702478530 0.23545235
17  0.16144017   0.775387805 0.19556132
18  0.83389237   0.852481463 0.93763459
19          NA            NA         NA
20  0.65189161   0.677908670 0.81202443
21  0.01361814   0.009273717 0.04499905
22  0.80023650   0.171179127 0.99132911
23  0.20656657   0.575875139 0.34127935
24  0.64055736   0.473879064 0.73743428
25  0.90205610   0.908197158 0.38099752
26  0.96560037   0.892167815 0.70496297
27  0.55212646   0.781172815 0.02653155
28  0.95663939   0.826079812 0.86534807
29          NA            NA         NA
30  0.91316218   0.703892102 0.77446577
31  0.63790560   0.767167910 0.76312134
32  0.93338707   0.857710827 0.68613435
33          NA            NA         NA
34          NA            NA         NA
35  0.28482905   0.860661029 0.44575985
36          NA            NA         NA
37          NA            NA         NA
38          NA            NA         NA
39  0.07686628   0.142249987 0.12284277
40          NA            NA         NA
41  0.15447232   0.859062725 0.41656887
42          NA            NA         NA
43  0.04977070   0.485043175 0.63968873
44  0.68316696   0.604236454 0.35215020
45          NA            NA         NA
46          NA            NA         NA
47  0.87825478   0.877739962 0.97856553
48  0.71930414   0.667949742 0.80550122
49          NA            NA         NA
50          NA            NA         NA
51  0.95173391   0.805816314 0.61814645
52  0.78874298   0.156943139 0.87486492
53  0.66491304   0.794474220 0.93538092
54          NA            NA         NA
55          NA            NA         NA
56  0.86896850   0.878294542 0.85821162
57  0.81796188   0.339068198 0.16375175

deseq2 p-value continuous deseq • 1.6k views

ADD COMMENT • link 5.2 years ago ariel ▴ 20

score 2 · Accepted Answer · 2019-02-21

2

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

Can you post your code? Are these pvalue or padj?

ADD COMMENT • link 5.2 years ago Michael Love 41k

0

Entering edit mode

Last question first--these are the pvalue variable that deseq returns but suffixed by the covariate name.

I have most of my code wrapped up in functions for DRY and generality. Let me see if I can put together a linear sequence.

ADD REPLY • link 5.2 years ago ariel ▴ 20

0

Entering edit mode

I'm surprised that you would get NA p-value when adding the continuous covariate, the reverse is expected.

But anyway, you can turn off the p-value filtering (which is not a perfect method, I'll be the first to admit), with cooksCutoff=FALSE and then I would recommend to inspect certain genes with low p-values and large Cook's values by eye (see mcols(dds)$maxCooks).

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

I put in a code example. I hope that is helpful.

I don't want to get spurious results for my p-values, especially given that the NAs are for the taxa we are most interested in!

Let me know if the code adds any clarity.

ADD REPLY • link 5.2 years ago ariel ▴ 20

1

Entering edit mode

Oh yes, now I remember. The outliers are flagged for both cases, but only in the discrete covariate case are the outliers replaced, providing then a new p-value for those features. This is why you get the NA's only for the continuous case.

I'd recommend to turn off the filtering, and then to look by eye at the high maxCooks and low p-values cases with plotCounts(dds).

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

Great! I'll try that. Could I have a quick word of advice on how to interpret my "by eye" examination?

ADD REPLY • link 5.2 years ago ariel ▴ 20

0

Entering edit mode

Is a single datapoint, especially an outlier in the marginal distribution of age, driving the association across counts.

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

You may have to plot counts manually, I don’t know if we support plotCounts with continuous variables

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

You may have to plot counts manually, I don’t know if we support plotCounts with continuous variables

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

So basically, look for features with small p-values and check to see if they look like outliers?

ADD REPLY • link 5.2 years ago ariel ▴ 20

0

Entering edit mode

And the maxCooks value, see above and vignette.

ADD REPLY • link 5.2 years ago Michael Love 41k

0

Entering edit mode

Right-O. Sorry I missed that.

ADD REPLY • link 5.2 years ago ariel ▴ 20