Question

Getting NA for both p-values (raw and adjusted) in DESeq2

Juan Pablo • 0

@4fe652ce

Sweden

Hello

I have an issue with my p-values that you can probably help me understand. I am doing a gene expression analysis with DESeq2. I have 5534 genes in total, and around 230 of them show NA for both the adjusted and the non-adjusted p-value. This may not matter much, since it is a small proportion of the whole gene set, but I would like to understand why. I have read about the reasons NA values can be generated, but when I check the data set, the counts for those genes look fine and are not very extreme or different from the others.

For example, the gene below is one of those that gets NA for both kinds of p-values. The first three numbers are the replicates from the first treatment and the last three are the replicates from the second treatment:

PP_2663: 4106 30886 4353 1297 6438 7720 (raw, non-normalized counts)

PP_2663: 3701.2 115446.2 3025.1 1942.6 3665 3689.9 (DESeq2-normalized counts)

This is what a normal gene (no NA p-values) looks like:

PP_4980: 8896 5882 9057 5371 11917 13615 (raw)

PP_4980: 8019 21985.8 6294.2 8044.6 6784.2 6507.5 (normalized)

This odd gene also gets valid (non-NA) raw and adjusted p-values:

PP_5640: 0 0 1 0 3 2 (raw)

PP_5640: 0 0 0.6 0 1.7 0.9 (normalized)

So what is going on here? Am I doing something wrong? The pipeline and commands are quite straightforward: I just provide my count matrix and run DESeq on it.

As I said, maybe it is not that important, but it feels like the analysis is not correct. I do not think that filtering the low-count genes would affect the results much, as only three genes have a row sum lower than 10. The other genes have much higher counts (at least 300).

I want to get to the bottom of this because I am failing to find differences in gene expression even between conditions that should differ. The replicate number is low, I know, and there is variation between the replicates within each treatment, which I suspect comes from the library preparation (the proportion of coding versus non-coding RNA is highly variable between replicates). Maybe that is not connected to my question above at all, but I am trying to connect the dots and describe all the peculiarities of this data set in case it helps.

Thank you very much in advance!!

Regards

score 0 · Answer 1 · 2023-08-19

ATpoint ★ 4.1k

@atpoint-13662

Germany

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-are-some-p-values-set-to-na
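The linked vignette section lists the situations in which DESeq2 reports NA. When both the p-value and the adjusted p-value are NA for a gene with non-zero counts, the case that most likely applies is the outlier flag: a row containing a sample with an extreme count, detected by Cook's distance. The sketch below is not DESeq2 and does not compute Cook's distance. It is only arithmetic on the numbers posted in the question, assuming those values were copied correctly, to show why PP_2663 looks like such a row while PP_4980 does not. The function names are made up for illustration.

```python
from statistics import median

# Raw and DESeq2-normalized counts copied from the question
# (replicates 1-3 = first treatment, 4-6 = second treatment).
raw = {
    "PP_2663": [4106, 30886, 4353, 1297, 6438, 7720],
    "PP_4980": [8896, 5882, 9057, 5371, 11917, 13615],
}
normalized = {
    "PP_2663": [3701.2, 115446.2, 3025.1, 1942.6, 3665.0, 3689.9],
    "PP_4980": [8019.0, 21985.8, 6294.2, 8044.6, 6784.2, 6507.5],
}


def implied_size_factors(gene):
    """Per-sample size factor implied by raw / normalized for one gene."""
    return [r / n for r, n in zip(raw[gene], normalized[gene])]


def max_over_median(gene):
    """Ratio of the largest normalized count to the gene's median."""
    values = normalized[gene]
    return max(values) / median(values)


# Size factors are per sample, so any gene gives the same answer.
size_factors = implied_size_factors("PP_4980")
ratios = {gene: max_over_median(gene) for gene in normalized}

if __name__ == "__main__":
    print("implied size factors:", [round(s, 2) for s in size_factors])
    for gene, ratio in ratios.items():
        print(gene, "max / median normalized count:", round(ratio, 1))
```

On these numbers, replicate 2 has an implied size factor of about 0.27, far below the other samples, so its counts are scaled up roughly fourfold. For PP_2663 that puts the replicate 2 value at around 31 times the gene's median normalized count, compared with about 3 times for PP_4980. A single value that dominates the fit like this is the kind of count the Cook's distance filter is designed to flag. In R, `results()` has a `cooksCutoff` argument that controls this filtering (for example `results(dds, cooksCutoff=FALSE)`, where `dds` is a placeholder for your own object), but with one replicate this far from the rest it is probably worth checking that sample first.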