Dealing with genes that have Padj=NA
raya.fai


I have a question about the genes that get Padj = NA.

In my experiment I compare 3 control samples with 3 treated samples. Out of 4500 genes, about 1800 get Padj = NA. I want to understand whether to treat these genes as unchanged or to exclude them from my analysis. Since I want to run a Fisher test on the data, it is important for me to know for each gene whether it changed, did not change, or is undetermined.
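One way to make the three categories explicit before testing is sketched below in base R. The sketch makes several assumptions:

- "Fisher test" is taken to mean Fisher's exact test on a 2x2 table of DE status versus membership in some gene set.
- The helper `classify_genes`, the gene-set flag `in_set` and the toy values are hypothetical.
- `alpha = 0.1` mirrors the default `alpha` of DESeq2's `results()`.

Dropping the undetermined genes is only one of the two options raised in the question. It is shown here for illustration, not as a recommendation.

```r
# Hypothetical helper: label each gene from its adjusted p-value.
# NA padj -> "undetermined"; otherwise compare against the FDR cutoff alpha.
classify_genes <- function(padj, alpha = 0.1) {
  status <- ifelse(is.na(padj), "undetermined",
                   ifelse(padj < alpha, "changed", "not changed"))
  factor(status, levels = c("changed", "not changed", "undetermined"))
}

# Toy inputs (made up for illustration): adjusted p-values and gene-set membership
padj   <- c(0.01, 0.5, NA, 0.03, NA, 0.9)
in_set <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)

status <- classify_genes(padj)
print(table(status, in_set))  # 3 x 2 overview, undetermined genes included

# Option shown here: exclude undetermined genes, then test changed vs not changed
keep <- status != "undetermined"
ft <- fisher.test(table(changed = status[keep] == "changed", in_set = in_set[keep]))
ft$p.value
```

Keeping the full 3 x 2 table visible makes it easy to see how many genes a given choice would discard.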

As I understand from the vignette, this happens because of automatic independent filtering. I read in section 3.8 that this is an optimization of the FDR correction: it maximizes the number of genes whose adjusted p-value falls below a given FDR cutoff, alpha.
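Filtering can increase the number of genes below alpha because of how the Benjamini-Hochberg adjustment works. This is the adjustment DESeq2 applies by default, and it matches R's `p.adjust(..., method="BH")`.

With n tested genes and p-values sorted so that p_(1) <= ... <= p_(n), the adjusted value for the i-th gene is:

```latex
\tilde{p}_{(i)} = \min_{j \ge i} \, \min\!\left(1, \frac{n}{j}\, p_{(j)}\right), \qquad i = 1, \dots, n
```

Removing low-count genes, whose p-values tend to be large, lowers n without changing the ranks j of the smaller p-values. Their adjusted values therefore shrink.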

I also read that it is possible to turn off independent filtering by setting independentFiltering=FALSE in the results function.

My question is: how should I treat these Padj=NA genes, and what do I lose if I run DESeq2 without independent filtering?

Thank you very much,

Raya Romm, PhD student

The Hebrew University of Jerusalem



The genes with an adjusted p-value of NA have a mean normalized count (baseMean) below the optimal threshold chosen by independent filtering. You can make the same plot as in the vignette. It shows how the power (the number of genes with an adjusted p-value below alpha) changes as the threshold increases.

There's no single right answer for exactly which filter threshold to use. It's a sliding scale of "counts high enough to have good power to detect differential expression". One choice is to optimize the power, as detailed in the independent filtering reference by Bourgon et al., and that is what you get by default.
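The idea of optimizing power can be illustrated with a simplified base-R sketch. This is not DESeq2's actual implementation. It works as follows:

1. Try a grid of baseMean quantiles as cutoffs.
2. For each cutoff, set the p-values of genes below it to NA.
3. Apply BH and count how many genes have an adjusted p-value below alpha.
4. Keep the cutoff with the most such genes.

The function name `choose_filter`, the quantile grid and the toy data are assumptions. As far as I know, DESeq2 itself smooths this rejection curve with a lowess fit rather than taking the raw maximum, so its chosen threshold can differ.

```r
# Simplified independent filtering (illustration only, not DESeq2's code):
# pick the baseMean cutoff that maximizes the number of BH-adjusted p-values < alpha.
choose_filter <- function(pvalue, baseMean, alpha = 0.1,
                          theta = seq(0, 0.95, by = 0.05)) {
  cutoffs <- quantile(baseMean, probs = theta, names = FALSE)
  num_rej <- sapply(cutoffs, function(cut) {
    p <- pvalue
    p[baseMean < cut] <- NA              # filtered genes are not tested at all
    padj <- p.adjust(p, method = "BH")   # NAs are skipped, so n counts only tested genes
    sum(padj < alpha, na.rm = TRUE)
  })
  best <- which.max(num_rej)             # first cutoff reaching the maximum
  list(threshold = cutoffs[best], theta = theta[best], num_rej = num_rej)
}

# Toy data (made up): four low-count genes with uninformative p-values,
# four high-count genes with moderately small p-values.
baseMean <- c(1, 2, 3, 4, 100, 200, 300, 400)
pvalue   <- c(0.9, 0.9, 0.9, 0.9, 0.06, 0.06, 0.06, 0.06)

fit <- choose_filter(pvalue, baseMean)
fit$num_rej[1]     # no filtering: 0 genes below alpha
max(fit$num_rej)   # low-count genes filtered out: 4 genes below alpha
fit$threshold
```

In this toy case, testing all 8 genes pushes the BH-adjusted values of the four p = 0.06 genes up to 0.12. Dropping the low-count genes brings those values below 0.1.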

If you want to include more genes (and so have fewer NA adjusted p-values), you can pick a lower threshold using this plot and then run:

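library(DESeq2)
res <- results(dds, independentFiltering=FALSE)  # no automatic independent filtering
res$pvalue[res$baseMean < x] <- NA               # x: the lower baseMean threshold you picked from the plot
res$padj <- p.adjust(res$pvalue, method="BH")    # BH correction over the genes that remain (NA p-values are skipped)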
Fuqi Xu

I came across the same problem when analyzing my data, and this is how I dealt with it. If the p-value itself (not just padj) is NA, the gene usually has an extreme count, flagged by Cook's distance. The other common cause is a gene with zero counts in all samples. In the outlier case, the simplest approach is to remove the abnormal observations until the gene gets a valid p-value.

Also, we need to make sure those observations can safely be removed. The observation should have no special biological meaning, and removing it should not change the p-values of other genes much.
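For reference, the documentation of DESeq2's `results()` describes the default `cooksCutoff`. It is the 0.99 quantile of the F(p, m - p) distribution, where p is the number of fitted coefficients and m is the number of samples.

The base-R sketch below computes that cutoff for a 3 vs 3 experiment. It assumes a simple `~ condition` design, so p = 2. It then flags genes in a toy Cook's distance matrix.

In a real analysis the matrix would come from the fitted object; I believe it is available as `assays(dds)[["cooks"]]`. The numbers and the gene and sample names below are hypothetical.

```r
# Default Cook's distance cutoff as described for DESeq2's results():
# the 0.99 quantile of an F(p, m - p) distribution.
m <- 6  # number of samples: 3 control + 3 treated
p <- 2  # fitted coefficients, assuming design ~ condition (intercept + treatment)
cooks_cutoff <- qf(0.99, df1 = p, df2 = m - p)
cooks_cutoff  # 18 for this design

# Toy Cook's distances (hypothetical numbers): rows = genes, columns = samples
cooks <- matrix(c(0.2, 0.5, 0.1, 0.3, 0.4, 0.2,
                  0.1, 0.3, 25,  0.2, 0.1, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("geneA", "geneB"), paste0("s", 1:6)))

# A gene is flagged (its p-value set to NA) if any sample exceeds the cutoff
flagged <- apply(cooks, 1, max) > cooks_cutoff
# The sample with the largest distance is the candidate observation to inspect
worst_sample <- colnames(cooks)[apply(cooks, 1, which.max)]
flagged
worst_sample
```

Identifying the single sample that drives the flag is what makes the "can this observation be removed?" check above possible. It has to be done gene by gene.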




Can you explain in more detail what I should do?

raya.fai

Hi Michael,

Thank you for your answer.


