Question

DESeq2 analysis: lots of NA results when adding covariates


andree-anne


Hello,

My team and I have a question about adding covariates to a DESeq2 analysis.

We ran DESeq2 with a first model and obtained adjusted p-values for all genes. However, when we added two more variables to that model (age and BMI), using exactly the same input files and script, we obtained NA as the adjusted p-value for almost all genes. From the vignette, we understand that this might be due to independent filtering, but some of these genes have very high mean normalized counts, and those means are exactly the same as in the first model, where every gene received an adjusted p-value. Is there anything that explains this discrepancy between our two similar models, and is there something we can do to obtain those p-values?
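A minimal sketch of how to check whether the NA adjusted p-values come from independent filtering, and how to request adjusted p-values for every tested gene. It assumes the Bioconductor package DESeq2 is installed and uses its simulated example data (makeExampleDESeqDataSet(), design ~ condition) as a stand-in for the real count_table and Sample_data; the variable names filtered and res_nofilt are illustrative only.

```r
# Sketch: which NA adjusted p-values come from independent filtering?
# Assumes DESeq2 is installed; the simulated data set below is only a
# stand-in for the real count_table / Sample_data.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)  # design: ~ condition
dds <- DESeq(dds, quiet = TRUE)

# Default call: independentFiltering = TRUE, alpha = 0.1
res <- results(dds)

# Filtered genes keep their p-value but get padj = NA
# (all-zero rows and Cook's outliers have pvalue = NA instead).
filtered <- !is.na(res$pvalue) & is.na(res$padj)
sum(filtered)

# Mean normalized count threshold chosen by the filter;
# summary(res) prints it, rounded, in the "(mean count < ...)" line.
metadata(res)$filterThreshold

# Same fitted model, BH adjustment applied to every tested gene
res_nofilt <- results(dds, independentFiltering = FALSE)
sum(is.na(res_nofilt$padj) & !is.na(res_nofilt$pvalue))
```

Turning the filter off gives every gene with a p-value an adjusted p-value, but it also gives up the extra discoveries the filter was chosen to provide.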


Written 3.8 years ago by andree-anne; updated 3.8 years ago by Michael Love.

Answer 1 (2020-07-21)


Michael Love


Can you post the designs and the results() calls?

In general, posting code is an important step toward getting accurate replies from developers.

Comment by Michael Love, 3.8 years ago


Hi,

Here are our design and the results summary for our first model:

dds <- DESeqDataSetFromMatrix(countData = count_table, colData = Sample_data, design = ~ Run + Region + Age_gestation_v1 + HGOP_T120)
dds <- DESeq(dds)
res <- results(dds)
summary(res)
out of 2159 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 1, 0.046%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

In summary(res), we can see that, out of 2159 genes with nonzero total read count, none are flagged as low counts (mean count < 0).

In our second model, with only two more covariates (IMC_v1, i.e. BMI, and Age):

dds <- DESeqDataSetFromMatrix(countData = count_table, colData = Sample_data, design = ~ Run + Region + Age_gestation_v1 + IMC_v1 + Age + HGOP_T120)
dds <- DESeq(dds)
res <- results(dds)
summary(res)
out of 2159 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 14, 0.65%
LFC < 0 (down)     : 4, 0.19%
outliers [1]       : 0, 0%
low counts [2]     : 2050, 95%
(mean count < 890)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Here, out of the same 2159 genes with nonzero total read count, 2050 are flagged as low counts (mean count < 890).

We are wondering what causes the mean-count threshold to be 890 instead of 0.

Thank you very much

Reply by andree-anne, 3.8 years ago


Ah, I see. So in the second analysis, the independent filtering was more aggressive: it found that by excluding genes with a mean normalized count below 890, it could obtain these 18 genes with FDR < 0.1. There's nothing particularly wrong with this second run. However, a more sophisticated alternative to the independent filtering that results() performs by default is IHW (Independent Hypothesis Weighting; see the vignette for example code).
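The "greedy" behaviour can be illustrated with a simplified base-R sketch. This is not DESeq2's own code: the function filter_by_mean() and the toy data are hypothetical. It scans quantiles of the mean count, applies Benjamini-Hochberg to the genes at or above each cutoff, and keeps the cutoff with the most rejections. DESeq2 itself fits a lowess curve to these rejection counts and takes the lowest quantile within one residual standard deviation of the curve's peak; the sketch just takes the first quantile with the maximum count.

```r
# Simplified sketch of mean-count independent filtering (base R only).
# Hypothetical helper, not DESeq2's implementation. Inputs:
#   base_mean: mean normalized count per gene
#   pval:      unadjusted p-values
filter_by_mean <- function(base_mean, pval, alpha = 0.1,
                           theta = seq(0, 0.95, length.out = 50)) {
  cutoffs <- quantile(base_mean, theta)
  # Number of BH rejections when only genes at or above each cutoff are kept
  n_rej <- vapply(cutoffs, function(cut) {
    keep <- base_mean >= cut
    sum(p.adjust(pval[keep], method = "BH") < alpha, na.rm = TRUE)
  }, numeric(1))
  j <- which.max(n_rej)                 # first quantile with the most rejections
  keep <- base_mean >= cutoffs[j]
  padj <- rep(NA_real_, length(pval))   # filtered genes get NA, as in results()
  padj[keep] <- p.adjust(pval[keep], method = "BH")
  list(threshold = unname(cutoffs[j]), n_rej = n_rej, padj = padj)
}

# Hypothetical toy data: 2000 null genes plus 100 high-count genes
# with small p-values.
set.seed(42)
base_mean <- c(rexp(2000, rate = 1 / 200), runif(100, 1000, 5000))
pval <- c(runif(2000), rbeta(100, 0.5, 50))
fit <- filter_by_mean(base_mean, pval)
fit$threshold
sum(is.na(fit$padj))
```

Because baseMean is computed from the normalized counts and does not depend on the design, adding IMC_v1 and Age leaves it unchanged; only the p-values change, and that alone can move the chosen threshold from 0 to a much higher value. With the IHW package installed, the vignette's alternative is `results(dds, filterFun = ihw)`.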

Reply by Michael Love, 3.8 years ago