Search
Question: DESeq2 dispersion and p-value histograms
0
gravatar for rrcutler
21 days ago by
rrcutler30
rrcutler30 wrote:

After reading some post that recommend checking p-value distributions after DE testing, I went back and looked at some data where the result of DE testing was 0 significantly DE genes (3 replicates, 3 conditions, 3 times). Note that this was using an experiment where all samples were loaded into DESeq2 so that additional samples could be used for dispersion estimation. It turns out those p-value distributions were all hill shaped like so

 

This is not expected in a DE test the resulted in 0 genes, as the pvalues should be evenly distributed across all p-values in this type of result. Instead this seems to reflect an overestimated dispersion where many genes were assigned high pvalues because of their high dispersion values when their dispersion value should truly be lower or not gene-wise (group-wise). 

Thus, I attempted to correct this using the recommended fdrtool and got a better looking p-value distribution (I think). When looking at DE, I strikingly also now had the result of over 1000 significant DE genes for the same comparison that just had 0.

To further see how dispersion might be affecting this, I decided to do the same test, but only loading into DESeq2 the samples which I wanted to compare (not the whole experiment). I had done this in the past when I was not correcting for the p-value distribution as above and got similar results to loading all samples into DESeq2.

The p-value distribution and correction looked similar to the histograms above, however, the new DE result contained less than 200 genes as compared to 1000 genes when loading all samples! When looking at a PCA plot, it would not appear that these samples need to be run separately. 

My So my question is: Am I right by correcting p-values by examining p-value distributions? Why am I getting these huge differences when I do the correction, but am loading my samples into DESeq2 differently?

 

ADD COMMENTlink modified 18 days ago by Bernd Klaus470 • written 21 days ago by rrcutler30
2
gravatar for Bernd Klaus
18 days ago by
Bernd Klaus470
Germany
Bernd Klaus470 wrote:

Hi Ronald,

as Mike said, there is always some "detective work" involved on determining why a p-value distribution doesn't look like it "should". 

 

In my workflow, I validated the correction by looking at the overlap of the results of the original analysis with my corrected list, where I find 105 out of the 116 genes at an FDR of 0.1. So I can see that basically some power has been gained by reestimating the null model.

The main problem is that a hill shaped p-value histogram can also arise when there is a batch effect overlapping the experimental groups, se e.g. the following simulated example, where group 2 and sample 5:10 of group 1 are contained in a different batch):

library(tidyverse)
library(genefilter)

n <- 10000
m <- 20
x <- matrix(rnorm(n*m), nrow = n, ncol = m)
fac <- factor(c(rep(0, 10), rep(1, 10)))
rt1 <- rowttests(x, fac)

qplot(rt1$p.value, fill = I("tan3"))

x[, 6:15] <- x[, 6:15]+5
rt2 <- rowttests(x, fac)

qplot(rt2$p.value, fill = I("coral3"))

So I have the following suggestions (You can try any subset of them):

1.) Consider Pre-filtering your features as there are two peaks in the histogram near 0.8 that probably correspond to features with low counts that then always get a highly similar p-value

2.) Run DESeq2 on each time point separately and check the estimated dispersion prior variance via

attr(dispersionFunction(dds), "dispPriorVar")

Are these very different between the timepoints? If yes, this might be similar to the "different dispersion within the experimental groups " case described in the FAQ of the vignette.

However, note that this assumes that the actual dispersions trends are similar so you also compare two separate time points' dispersion values directly. They can be accessed via rowData(dds)$dispersion

And you could e.g. look at a scatterplot of log10(rowData(dds)$dispersion) in time 1 vs time 2

 

3.) If you let SVA estimate a surrogate variable and run DESeq2 adding it in your design, does it fix the problem?

(http://bioconductor.org/help/workflows/rnaseqGene/#removing-hidden-batch-effects)

4.) When running LRTs on hypothesis you are interested in (as suggested by Mike), what do the p-value histograms look like?

5.) Plot the raw data for some hits after using the p-value correction, do their time courses show sufficient change given the data variability? (the function plotCounts might be useful)

If the variance of the null distribution is too low there will be many significant genes with small effect sizes.

ADD COMMENTlink modified 11 days ago • written 18 days ago by Bernd Klaus470

Hi Bernd,

Thank you for the detailed reply, I have gone ahead and followed your recommendations.

2) Between time points, the estimated dispersion prior variance only differs by 0.02. However, when only loading samples relevant to a pairwise comparison of interest, there are instances where the estimated dispersion prior differs as much as 0.21. Since the biggest differences in dispersion prior variance appear to be variable when grouped by comparisons, this leads me to think that there is indeed very different dispersion within groups. 

4) When running LRTs, loading only samples of interest, I get similar results to when I use the original Wald test without pvalue distribution adjustment (many comparisons have 0 DE, pvalue distribution are right skewed). However, when these pvalue distributions are adjusted for, they result in U-shaped distributions, and 4-5 thousand differentially expressed genes (Wald test with pvalue distr adjustment results in at most 1100 genes). 

Why is there such behavior when using different test and adjusting the pvalue distribution? I am not interested in using the LRT to compare multiple levels at once or see which genes change over time, so I am only limited to use it when loading my samples of interest to compare and thus compare the full to reduced (intercept) model. 

Because of the difference in estimated dispersion prior when loading only samples of interest, I believe it would be best to stick with the Wald test and only loading samples of interest. 

ADD REPLYlink written 12 days ago by rrcutler30

Does the first histogram use prefiltering? e.g. genes with 3 or more samples with a normalized count >= 10. The spikes may be just from low counts. We had this in a DESeq2 supplementary figure.

ADD REPLYlink modified 11 days ago • written 11 days ago by Michael Love14k

You can also try to compare the dispersion estimates for the different time points directly as the prior comparison assumes that the trend fit is similar (I edited my answer accordingly). They can be accessed via rowData(dds)$dispersion. And you could e.g. look at a scatterplot of log10(rowData(dds)$dispersion) in time 1 vs time 2.

fdrtool is estimating the null model variance from the data, so it can also underestimate it, which commonly leads to the bathtub shaped histogram. It might be confused by the spikes, so try pre-filtering as suggested by Mike first.

ADD REPLYlink written 11 days ago by Bernd Klaus470
0
gravatar for Michael Love
21 days ago by
Michael Love14k
United States
Michael Love14k wrote:

What are the colors in the PCA plot?

Remember the PCA is just a 2 dimensional representation of a much higher dimensional space. You can't necessarily tell from a PCA plot if some subset of the genes has lower dispersion for your additional samples. Adding samples with lower dispersion for a subset of genes would lower the dispersion estimate across all samples, and could easily move some genes from one side of the significance threshold to the other (in addition, remember that the adjusted p-values are interdependent, so being among lower p-values "helps" as far as the adjustment for a given p-value). So I wouldn't be surprised that the number of DE genes changes after adding a third batch of samples. It's hard for me to say in a general way whether users should add "additional" samples if they are focused on a pair of conditions. It's too experiment-specific for me to make a general recommendation, so I just have the FAQ in the vignette as rough guidance on when it should not be done.

For me, it's too difficult to diagnose over the support forum when adjustment of p-values based on the histogram is potentially going around other issues that should be dealt with more directly (was it an LFC threshold test, in which case we do not expect a uniform distribution for the null p-values? are there unmodeled batch variables that would be better dealt with using a method like sva or RUV?), such that I don't include it as part of the recommended pipeline. Bernd can offer more guidance here perhaps.

ADD COMMENTlink modified 21 days ago • written 21 days ago by Michael Love14k

Hi Michael,

Thanks for the response. The colors in the PCA plot represent replicates of a condition. Each cluster is representative of a different time point. I have an equal interest in all the comparisons using all samples, so is loading all in fact appropriate here? Are there any test we can do to determine if it is or not?

From what I can gather, the reason I am getting very different results when either including or not including all samples for an experiment and then correcting the pvalues using the fdrtool is because dispersion values differ in each cluster (time point), so adding more samples from a different cluster(where global gene expression levels differ), is changing this estimate, which may not reflect the true dispersion. Given this, I am thinking it would be more appropriate to load only the sample from the same time point (although this does not allow to control for interactions interactions between condition and time).

The test used was the default Wald test, so I'm assuming that what is expected is a uniform distribution of p-values. There are no batch effects that I am aware of (I would imagine I could see that in the PCA), though I have made an attempt to detect any unknown effect. 

As always, I appreciate your help.

-R

ADD REPLYlink written 18 days ago by rrcutler30
1

re: "The test used was the default Wald test"

It sounds like you may be more interested in an LRT, given that you have different conditions and different time points. Maybe we can start from the beginning and I can help figure out an appropriate full and reduced design, and then you can see the distribution of p-values for the LRT, or Wald tests for coefficients within the design with interaction terms.

ADD REPLYlink written 18 days ago by Michael Love14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 116 users visited in the last hour