Question: Dealing with genes that have Padj=NA
3.4 years ago
raya.fai20 wrote:

Hello,

I have a question about the genes that get Padj = NA.

In my experiment I compare 3 control samples with 3 treated samples. Out of 4500 genes, about 1800 get Padj = NA. I want to understand whether to treat these genes as unchanged or to exclude them from my analysis. Since I want to run a Fisher's exact test on the data, I need to know for each gene whether it changed, did not change, or is undetermined.

As I understand from the vignette, this happens because of automatic independent filtering. I read in section 3.8 that this is an optimization of the FDR correction: it maximizes the number of genes with an adjusted p-value below a given FDR cutoff, alpha.

I also read that it is possible to turn off independent filtering by passing independentFiltering=FALSE to the results() function.

My question is: how should I treat these Padj=NA genes, and what do I lose if I run DESeq2 without independent filtering?

Thank you very much,

Raya Romm, PhD student

The Hebrew University of Jerusalem

3.4 years ago
United States
Michael Love wrote:

The genes with an adjusted p-value of NA have a lower mean normalized count than the optimal threshold. You can make the same plot as in the vignette to see how the power (the number of genes called significant) increases as the threshold increases.

There's no single right answer for exactly which filter to use; it's a sliding scale of "counts high enough to have good power to detect differential expression". One choice is to maximize power, as described in the independent filtering reference by Bourgon, and that is what you get by default.
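To make "optimizing the power" concrete, here is a minimal base-R sketch on simulated data. The counts, the 20% fraction of changed genes, and the power model are all made up for illustration, and this is not DESeq2's own code. For each quantile of baseMean it drops genes below that cutoff, BH-adjusts the remaining p-values, and counts rejections at alpha = 0.1. DESeq2 fits a smooth curve to these counts before choosing the threshold (for a real result they are kept in metadata(res)$filterNumRej); this sketch just takes the raw maximum.

```r
# Sketch of independent filtering by mean count, on SIMULATED data (not DESeq2 output).
set.seed(1)
n <- 4500
baseMean <- rexp(n, rate = 1 / 200)   # hypothetical mean normalized counts
is_de <- runif(n) < 0.2               # hypothetical: 20% of genes truly changed

# Hypothetical power model: changed genes with more counts get smaller p-values;
# at baseMean = 0 the shape is 1, i.e. a uniform p-value (no power).
shape <- 1 / (1 + baseMean / 20)
pvalue <- ifelse(is_de, rbeta(n, shape1 = shape, shape2 = 1), runif(n))

alpha <- 0.1
theta <- seq(0, 0.95, by = 0.05)      # quantiles of baseMean tried as cutoffs

num_rej <- sapply(theta, function(q) {
  cutoff <- quantile(baseMean, probs = q)
  p <- pvalue
  p[baseMean < cutoff] <- NA          # filtered genes are simply not tested
  # p.adjust() ignores NAs, so only the tested genes count toward the BH correction
  sum(p.adjust(p, method = "BH") < alpha, na.rm = TRUE)
})

best <- theta[which.max(num_rej)]
print(data.frame(theta = theta, rejections = num_rej))
cat("filter quantile with the most rejections:", best, "\n")
# To draw the curve from the vignette-style plot: plot(theta, num_rej, type = "b")
```

Filtering helps because the removed low-count genes had almost no chance of being significant but still inflated the number of tests the BH correction has to account for.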

If you want to include more genes (and so have fewer NA adjusted p-values), you can pick a lower threshold x from this plot and then run:

res <- results(dds, independentFiltering=FALSE)

# x is the baseMean threshold you picked from the plot
res$pvalue[res$baseMean < x] <- NA

res$padj <- p.adjust(res$pvalue, method="BH")
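For the three-way split the question asks about, one possible reading of the answer above is that genes with padj = NA were not tested (too few counts to have power), so they are "undetermined" rather than "unchanged". The helper below is a hypothetical sketch of that labelling, not part of DESeq2; alpha = 0.1 mirrors DESeq2's default, and the toy table is made up.

```r
# Hypothetical helper: label each gene for a downstream test such as Fisher's exact test.
# Works on any data.frame with a 'padj' column, e.g. as.data.frame(res).
classify_genes <- function(res, alpha = 0.1) {
  status <- ifelse(is.na(res$padj), "undetermined",               # not tested
                   ifelse(res$padj < alpha, "changed", "unchanged"))
  factor(status, levels = c("changed", "unchanged", "undetermined"))
}

# Toy example with made-up values
toy <- data.frame(baseMean = c(500, 800, 3, 0, 250),
                  padj     = c(0.01, 0.60, NA, NA, 0.09))
status <- classify_genes(toy)
print(table(status))
```

Whether "undetermined" genes should then be dropped from the contingency table or kept as a separate level depends on the question the Fisher test is meant to answer.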
9 months ago
Fuqi Xu wrote:

I came across the same problem when analyzing my data, and this is how I dealt with it. If the p-value is NA, the gene has an extreme count outlier, as identified by Cook's distance. So the simplest fix is to remove the outlying observations until DESeq2 returns a valid p-value.

However, you need to make sure those observations can safely be removed: the outlying observation should have no special biological meaning, and deleting it should not substantially change the p-values of other genes.
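It can help to check which kind of NA a gene has before deciding what to do. The DESeq2 vignette describes three cases: rows with all-zero counts get NA for the p-value and padj; rows with an extreme count outlier (Cook's distance) get NA for the p-value and padj; rows removed by independent filtering keep their p-value and get NA only for padj. The helper below is a hypothetical sketch of that logic on a plain data.frame (the toy values are made up). If outliers are the cause, results(dds, cooksCutoff=FALSE) turns the flagging off, which may be preferable to deleting samples.

```r
# Hypothetical helper: explain why padj is NA, following the three cases in the DESeq2 vignette.
# Input: data.frame with baseMean, pvalue and padj columns, e.g. as.data.frame(res).
na_reason <- function(res) {
  reason <- rep("tested", nrow(res))
  # p-value present but padj missing: removed by independent filtering
  reason[is.na(res$padj) & !is.na(res$pvalue)] <- "low count (independent filtering)"
  # p-value missing with nonzero counts: flagged as a count outlier
  reason[is.na(res$pvalue)] <- "count outlier (Cook's distance)"
  # all counts zero: overrides the previous rule, since these also have NA p-values
  reason[res$baseMean == 0] <- "all counts zero"
  reason
}

toy <- data.frame(baseMean = c(500, 2, 0, 300),
                  pvalue   = c(0.001, 0.4, NA, NA),
                  padj     = c(0.01, NA, NA, NA))
print(table(na_reason(toy)))
```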

Can you explain in more detail what I should do?

3.4 years ago
raya.fai20 wrote:

Hi Michael,