DESeq2 analysis: many NA results when adding covariates
@andree-anne-20906

Hello,

My team and I have a question about adding covariates to a DESeq2 analysis.

We ran DESeq2 with a first model and obtained adjusted p-values for all genes. However, when we added two more variables to that model (age and BMI), using exactly the same input files and script, we obtained NA for the adjusted p-value of almost all genes. Reading the vignette, we saw that this might be due to independent filtering. However, some of those genes have very high mean normalized counts, and the means are exactly the same as in the first model, where we obtained adjusted p-values for all genes. Is there anything that can explain this discrepancy between our two similar models, and is there something we can do to obtain those adjusted p-values?

@mikelove

Can you post the designs and the results() calls?

In general, posting code is an important step in getting accurate replies from developers.


Hi

Here are the design and the results summary for our first model:

dds <- DESeqDataSetFromMatrix(countData = count_table, colData = Sample_data, design = ~ Run + Region + Age_gestation_v1 + HGOP_T120)
dds <- DESeq(dds)
res <- results(dds)
summary(res)
out of 2159 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 1, 0.046%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

In summary(res), we can see that, out of 2159 genes with nonzero total read count, none is flagged as low count (mean count < 0).

Here is our second model, with only 2 more covariates (IMC_v1, i.e. BMI, and Age):

dds <- DESeqDataSetFromMatrix(countData = count_table, colData = Sample_data, design = ~ Run + Region + Age_gestation_v1 + IMC_v1 + Age + HGOP_T120)
dds <- DESeq(dds)
res <- results(dds)
summary(res)
out of 2159 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 14, 0.65%
LFC < 0 (down)     : 4, 0.19%
outliers [1]       : 0, 0%
low counts [2]     : 2050, 95%
(mean count < 890)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Here, we again have 2159 genes with nonzero total read count, but 2050 of them are flagged as low count (mean count < 890).

We are wondering what may cause the mean count threshold to be 890 instead of 0.

Thank you very much


Ah, I see. So in the second analysis, the independent filtering was a bit more greedy: it found that by excluding genes with mean counts up to 890, it could obtain these 18 genes with FDR < 0.1. There is nothing wrong in particular with this second run. A more sophisticated alternative to the independent filtering that is the default in results() is IHW (see the vignette for example code).
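For anyone who wants to check this on their own data, here is a minimal sketch of how the threshold can be inspected and how the filtering can be switched off or replaced by IHW. It is an illustration, not the poster's actual analysis: the counts are simulated with makeExampleDESeqDataSet() as a stand-in for count_table and Sample_data, the object names res_nofilt and res_ihw are made up, and the IHW package is assumed to be installed.

```r
# Sketch only: simulated data stands in for count_table / Sample_data.
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12, betaSD = 1)
dds <- DESeq(dds)

# Default: independent filtering chooses a cutoff on the mean of
# normalized counts that roughly maximizes the number of genes with
# padj < alpha (alpha = 0.1 unless changed).
res <- results(dds)
metadata(res)$filterThreshold   # the mean-count cutoff that was chosen
sum(is.na(res$padj))            # genes with padj set to NA
sum(is.na(res$pvalue))          # raw p-values are kept for filtered genes

# Option 1: switch independent filtering off, so padj is computed
# for every gene that has a p-value.
res_nofilt <- results(dds, independentFiltering = FALSE)
summary(res_nofilt)

# Option 2: use IHW in place of the default filter.
library(IHW)
res_ihw <- results(dds, filterFun = ihw)
summary(res_ihw)
```

Switching the filter off usually returns fewer significant genes than the default, since more hypotheses enter the multiple-testing correction. Either choice should be made before looking at the results, not picked afterwards based on which gives the longer gene list.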
