Highly left-skewed raw p-values from DESeq2 analysis
tornadow18 • 0

Hello, I have a question about the output of a DESeq2 analysis, which I am not sure is normal. Briefly, I have an RNA-seq experiment quantified with Salmon, imported into R with tximeta, and analyzed with DESeq2. After prefiltering, my DESeqDataSet had 76015 transcripts.

keep <- rowSums(counts(dds)) >= 10 # genes with at least 10 reads across all samples
dds <- dds[keep, ]
dim(dds)
#[1] 76015   20

After running DESeq2, the raw p-values have the distribution shown below:

[Figure: histogram of raw p-values; image not available]

The adjusted p-values look like this (I had to log-transform the y-axis to show that I do have some padj near 0):

[Figure: distribution of adjusted p-values, log-scaled y-axis; image not available]

I wonder whether this is plausible or indicates a serious issue. By the way, there are about 14000 NA values in padj, which should not be the sole reason for this raw p-value distribution.

DESeq2 • 596 views
@mikelove

Your first line says rowSums(counts(dds)) >= 10, which keeps rows where the sum of counts across all samples is 10 or more, not what you say.

You want rowSums(counts(dds) >= 10) >= X, with X being the number of samples that must have a count of 10 or more.

Are you converting the NA in pvalue to 1?
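The difference between the two filters above can be seen on a small made-up matrix. This is only a base-R sketch: the matrix `cts`, the gene and sample names, and the threshold `min_samples` are hypothetical, and with a real DESeqDataSet one would use `counts(dds)` in place of `cts`.

```r
# Toy count matrix: 4 genes x 4 samples (made-up numbers for illustration only)
cts <- matrix(
  c( 3,  3,  2,  2,   # geneA: total is 10, but no single sample reaches 10
    12, 15, 11,  0,   # geneB: three samples have 10 or more reads
     0,  0,  0, 40,   # geneC: total is 40, but only one sample has >= 10
     1,  0,  2,  1),  # geneD: total is 4
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:4))
)

# Filter 1: the summed count across all samples is at least 10
keep_sum <- rowSums(cts) >= 10

# Filter 2: at least `min_samples` samples each have 10 or more reads
min_samples <- 3  # hypothetical choice, e.g. the size of the smallest group
keep_n <- rowSums(cts >= 10) >= min_samples

print(keep_sum)  # geneA, geneB, geneC pass
print(keep_n)    # only geneB passes
```

On this toy input the sum-based filter keeps three of the four genes, while the per-sample filter keeps only the one gene that is expressed in several samples.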


Thanks! I guess I wasn't very clear. Yes, I did mean that the sum of counts across all samples is 10 or more. I could use a more stringent criterion, but I thought this was not important, so I just used what was written in the tutorial.

I did convert the NA values in padj to 1, but not the NA values in pvalue. I also just realized that 1541 of the 76015 pvalue entries are NA, and 14353 of the 76015 padj entries are NA.
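A minimal sketch of how the NA counts above could be tallied, and how the raw p-values could be plotted without recoding NA to 1. The data frame `res` is a made-up stand-in that only mimics the `baseMean`, `pvalue`, and `padj` column names of a DESeq2 results table. As far as I understand the DESeq2 documentation, padj alone is NA when a row is removed by independent filtering, while both pvalue and padj are NA for all-zero rows and for rows flagged as count outliers; treat that as background to check, not as a diagnosis of this dataset.

```r
# Toy stand-in for a DESeq2 results table (all values are made up)
res <- data.frame(
  baseMean = c(0, 2.1, 150, 800, 35, 1.2),
  pvalue   = c(NA, 0.40, 0.001, NA, 0.03, 0.75),
  padj     = c(NA, NA,   0.004, NA, 0.06, NA)
)

# How many rows lack a raw or an adjusted p-value
n_na_pvalue <- sum(is.na(res$pvalue))
n_na_padj   <- sum(is.na(res$padj))

# Rows that have a raw p-value but no adjusted p-value
n_padj_only <- sum(!is.na(res$pvalue) & is.na(res$padj))

# Histogram of raw p-values, dropping NA instead of converting it to 1.
# plot = FALSE returns the bin counts; remove it to draw the histogram.
pvals <- res$pvalue[!is.na(res$pvalue)]
h <- hist(pvals, breaks = seq(0, 1, by = 0.05), plot = FALSE)

print(c(na_pvalue = n_na_pvalue, na_padj = n_na_padj, padj_only = n_padj_only))
```

Applied to the numbers reported in this thread, the same subtraction would give 14353 - 1541 = 12812 rows with a raw p-value but no padj.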


It's hard for me to guess what's going on. I haven't seen a p-value histogram like that, and yes, it indicates something is wrong in the pipeline. I would guess that something about the count matrix is very different from typical RNA-seq. I would do some more QC plots of the log counts, such as PCA, boxplots, and scatterplots.
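The QC plots suggested above could be sketched in base R roughly as follows. The count matrix here is simulated and purely hypothetical; with real data one would start from `counts(dds)`. The `log2(count + 1)` transform is a simple assumption for illustration, and DESeq2's own `vst()` and `plotPCA()` functions would be the more usual route.

```r
set.seed(1)

# Simulated count matrix: 200 features x 6 samples (hypothetical data)
cts <- matrix(rnbinom(200 * 6, mu = 50, size = 2), nrow = 200,
              dimnames = list(NULL, paste0("s", 1:6)))

# Simple log transform with a pseudocount of 1
log_cts <- log2(cts + 1)

# PCA with samples as observations (hence the transpose)
pca <- prcomp(t(log_cts))
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# PCA plot: samples from the same condition would be expected to group together
plot(pca$x[, 1], pca$x[, 2],
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"),
     main = "PCA of log2(count + 1)")
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)

# Per-sample boxplots: look for samples whose distribution is shifted or odd
boxplot(log_cts, ylab = "log2(count + 1)", main = "Per-sample log counts")

# Scatterplot of two samples: replicates would be expected near the diagonal
plot(log_cts[, 1], log_cts[, 2], xlab = "s1", ylab = "s2",
     main = "s1 vs s2, log2(count + 1)")
abline(0, 1, lty = 2)
```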
