Question

Highly left- skewed raw pvalues from DESeq2 analysis

0

Entering edit mode

tornadow18 • 0

@2bffac57

Last seen 2.7 years ago

United States

Hello, I have a question regarding to outputs from a DESeq2 analysis that I wasn't sure if it is normal. Briefly, I have an RNA-seq experiment analyzed by Salmon, imported into R by tximeta, and analyzed by DESeq2. After prefiltering, my DESeqDataset had 76015 transcripts.

keep <- rowSums(counts(dds)) >= 10 # genes with at least 10 reads across all samples
dds <- dds[keep, ]
dim(dds)
#[1] 76015   20

After running DESeq2, the raw pvalue has a distribution looks like below enter image description here

And the adjusted p-value looks like below (I have to log transform the y-axis in order to show that I do have some padj near 0)

enter image description here

I wonder if this looks possible or indicates any serious issues. BTW there are about ~14000 NA in padj, which should not be the sole reason for this raw pvalue distribution.

DESeq2 • 596 views

ADD COMMENT • link updated 2.8 years ago by Michael Love 41k • written 2.8 years ago by tornadow18 • 0

score 0 · Answer 1 · 2021-06-25

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 9 hours ago

United States

Your first line says rowSums(counts(dds)) >= 10 which is that the sum count from all samples is 10 or more, not what you say.

You want rowSums(counts(dds) >= 10) >= X with X being the number of samples to have 10 or more.

Are you converting the NA in pvalue to 1?

ADD COMMENT • link 2.8 years ago Michael Love 41k

0

Entering edit mode

Thanks! Yes, I guess I wasn't very clear with what I said. Yes, I did mean that the sum count from all samples is 10 or more. I could use more stringent criteria, but I thought this is not important so I just used what was written in the tutorial.

I did convert the NA in padj to 1 but not the NA in pvalue. And I just realized I have 1541 out of 76015 in pvalue to be NA, and I also have 14353 out of 76015 in padj to be NA.

ADD REPLY • link 2.8 years ago tornadow18 • 0

0

Entering edit mode

It's hard for me to guess what's going on. I haven't seen a p-value histogram like that, and yes it indicates something is wrong in the pipeline. I would guess that something about the count matrix is very different than typical RNA-seq. I would do some more QC plots like PCA, and boxplots, scatterplots etc of the log counts.

ADD REPLY • link 2.8 years ago Michael Love 41k