Search
Question: t-statistic from various limma-approaches does not follow the theoretical distribution
0
19 days ago by
brynedal0
brynedal0 wrote:

Dear BioC,

I'm testing various approaches to deal with our RNA seq data that has variability in sample quality, single outlying counts etc. I love limma and all the methods that are implemented within this package. Yet, when trying different approaches I've noticed that the moderated t-statistic does not always follow the distribution that I expect. Here I am illustrating this using the Smchd1 experiment, that was used in the voomWithQualityWeights paper from 2015.

I am assuming that the degrees of freedom for the moderated t-statistic can be found in the MArrayLM object, in the df.total slot.

I'm doing four analyses:

1. using voom
2. using voom and robust regression
3. using voomWithQUalityWeights
4. using voomWithQUalityWeights and blocking

For each analysis the moderated t-statistics does not follow the theoretical distribution, especially when using weights from robust regression or voomWithQualityWeights. Consequently the number of significant genes increases. Is the body of the moderated t-statistic not suppose to follow the theoretical t-distribution? To me it looks as if the theoretical t-distribution is not the correct null-distribution. I've seen very similar results using my own data, where there are a few hundred samples. There using only voom gives very few significant genes, but using voomWithQualityWeights or lmFit(method='robust') given many many more.

#Loading and setting up data.
library(edgeR)
library(limma)
con <- url("http://bioinf.wehi.edu.au/voomWithQualityWeights/smchd1/x.rda")
fullanno = x$genes sel = rowSums(cpm(x$counts)>0.5)>=3
x = x[sel,]
selanno = x$genes x$genes = x$genes[,c(1,3)] des = model.matrix(~x$samples$group) colnames(des)[2] = "Smchd1nullvsWt" x = calcNormFactors(x, method="TMM") genotype = x$samples$group  # Analysis with voom only on full data set v = voom(x, design=des) vfit = lmFit(v) vfit = eBayes(vfit) options(digits=3) top = topTable(vfit,coef=2,number=Inf,sort.by="P") sum(top$adj.P.Val<0.05)
#This gives 12 significant genes.

# Analysis with voom on full data set, using lmFit(method='robust')
#v = voom(x, design=des)
vfit2 = lmFit(v, method='robust',maxit=250)
vfit2 = eBayes(vfit2)
options(digits=3)
top = topTable(vfit2,coef=2,number=Inf,sort.by="P")
sum(top$adj.P.Val<0.05) #This gives 701 significant genes. # Analysis with combined voom and sample quality weights vwts = voomWithQualityWeights(x, design=des, normalization="none", plot=TRUE) vfit3 = lmFit(vwts) vfit3 = eBayes(vfit3) top3 = topTable(vfit3,coef=2,number=Inf,sort.by="P") sum(top3$adj.P.Val<0.05)
# this gives 1478 significant genes.
# Analysis with voomWithQualityWeights and block weights
Z = cbind(c(1,0,0,0,0,0,-1),c(0,1,1,1,1,1,-5))
vwts2 = voomWithQualityWeights(x, design=des, normalization="none", var.design=Z, plot=TRUE)
vfit4 = lmFit(vwts2)
vfit4 = eBayes(vfit4)
top4 <- topTable(vfit4,coef=2,number=Inf,sort.by="P")
sum(top4$adj.P.Val<0.05) # this gives 488 significant genes. # plotting moderated t-statistics against theoretical t-statistics: par(mfrow=c(2,2)) qqt(vfit$t[,"Smchd1nullvsWt"],df=vfit$df.total,main='voom') abline(0,1) qqt(vfit2$t[,"Smchd1nullvsWt"],df=vfit2$df.total,main='voom and lmFit(method="robust")') abline(0,1) qqt(vfit3$t[,"Smchd1nullvsWt"],df=vfit3$df.total,main='voomWithQualityWeights') abline(0,1) qqt(vfit4$t[,"Smchd1nullvsWt"],df=vfit4$df.total,main='voomWithQualityWeights and blocking') abline(0,1) sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.5 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale: [1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] tmod_0.36 edgeR_3.22.5 limma_3.36.3 ADD COMMENTlink modified 11 days ago • written 19 days ago by brynedal0 2 18 days ago by Aaron Lun21k Cambridge, United Kingdom Aaron Lun21k wrote: The observed t-statistics would only follow the t-distribution if the null hypothesis was true for all genes. Clearly this is not the case in this data set. Let's make up a data set with similar characteristics but no DEGs: set.seed(0) x <- estimateDisp(x, des) ab <- 2 ^ x$AveLogCPM # technically a bit too high, but whatever.
sim.y <- matrix(rnbinom(nrow(des) * length(ab), mu=ab, size=1/x$tagwise.dispersion), ncol=nrow(des)) sim.v <- voom(sim.y, des, plot=TRUE) sim.fit <- lmFit(sim.v) sim.fit <- eBayes(sim.fit) sim.top <- topTable(sim.fit, coef=2, n=Inf) qqt(sim.top$t, median(sim.fit\$df.total))
abline(0, 1, col="red")

Looks pretty good to me.

In other words, if you have lots of DE genes, then your observed t-statistics will not follow the t-distribution. This really depends on your biological system - and on the power of your experiments. The null hypothesis of "no DE" is a convenient fantasy that is generally not true upon perturbation of real biological systems. Everything is linked to everything else such that you'll get small changes to all genes when you knock down a gene or change the environment or whatever. Even so, testing against this null is still useful because we usually don't have enough power to detect such small changes, so we don't get overwhelmed with rejections.

However, when you ramp up the power, you will get more and more power to detect DE. This manifests as greater deviation from the expected t-distribution under the null. As power increases to infinity, we will eventually detect every gene as being significantly DE upon perturbation - which is probably correct, but also useless for prioritizing genes for further study. If you have such an overpowered experiment, a more appropriate approach would be to test against a log-fold change threshold, see ?treat for further details.

Hi Aaron,

Of course only genes that are not affected should follow the theoretical null-distribution, that is why we use the theoretical as our null. I agree that if we compare cells in complete distress with healthy cells then almost all genes should be affected. The Smchd1 experiment is a perturbation experiment where a fair number of genes might be expected to be affected (I can not judge that). But the t-statistics I show above do not follow the theoretical line even slightly, but seem to follow another distribution. The resulting p-value distribution does not have a distinct enrichment at mall p-values, but a downward slope from 0 to 1. For me these are indicators of statistical modelling gone wrong. The Smchd1 experiment has a sample size of 7, I would not call it over-powered. Hence, I would expect the statistics to follow the theoretical null-distributions for the majority of genes. That is not the case here.  Do you think these distributions are resonable?

In a real world setting we are happy to see a few genes that deviate from the theoretical distribution. The data we are analyzing for example are not from a perturbation experiment, but samples from patients taken before they start a treatment. Hence, there will be a lot of variance, almost all of which will be related to things we are not the slightest bit interested in. We are looking for biomarkers of future response. Using ordinary voom we get a t-statistic that follows the theoretical one (and a uniform p-value distribution). But when using robust regression or voomWithQualityWeights we get a t-statistics that simply cross the theoretical distribution at zero, just like in the plot above (and consequently p-value distributions that have a downward slope from 0 to 1). This is not resonable.

Oh, ye of little faith. Perhaps my example would have been more demonstrative with some DE genes:

# Mostly non-DE genes with t-distributed t-statistics.
# 10% DE with extreme t-statistics in each direction.
set.seed(0)
all.t <- rt(10000, df=8)
all.t[1:1000] <- -5
all.t[1001:2000] <- 5

library(limma)
qqt(all.t, df=8)
abline(0, 1, col="red")

This yields strong deviations from the diagonal, even for those observations where the t-statistic was truly sampled from a t-distribution. This is an inevitable consequence of having a decently large proportion of DE genes with extreme t-statistics in your QQ plot. The theoretical quantiles from the non-DE genes are effectively compressed into a smaller range as the DE genes occupy the edges of the theoretical distribution. This manifests as a deviation from the line on the QQ plot for all data points - which is entirely correct, as all.t is not t-distributed!

As for your second comment: the differences in results with and without quality weights reflects the differences in power when the limma analysis (correctly) ignores low-quality samples. This results in lower p-values for the DE genes and the effect described above in the QQ plot. You seem to be concerned that using quality weights results in so many more DE genes, but I would say that this reflects the heterogeneity of data quality across your samples rather than any problem with voom or limma. If your samples were consistently high quality, the inclusion of quality weights would have no major effect.

I'm not particularly concerned about the method="robust" analysis, that's not something we recommend anyway. And it's unclear to me what you were intending with the var.design=Z call.

In your simulation we see a nice deviance from the diagonal for the significant genes, just as I would expect.

What I instead observe in our data (and to some extent also in the two lower plots in the original post) is that there really is no deviance from the straight line of t-statistics. Below is our data for example. 36 genes are deemed significant when using voomWithQualityWeights (marked i red in the qq-plot).

Here all t-statistics fall on the same straight line. No deviance from this line for any genes.

Actually, It kind of looks like all t-statistics are multiplied by some constant. In this case, dividing all t-stats with 1.225 gives something that seem to follow the theoretical distribution quite well:

This behavior of the t-statistic is very unexpected for me. I expect that the majority of genes should follow the theoretical null-distribution, just as they do in your recent simulation. I expect that significant genes would deviate from this theoretical distribution and show more extreme values.

(Regarding the var.design=Z call, I simply followed the code in the voomWithQualityWeights paper from 2015.)

Not unexpected, and not a problem. The exact shape of the QQ plot will depend on the distribution of effect sizes and the power of your experimental design. For example, here's a more "straight-line" QQ plot:

set.seed(0)
all.t <- rt(10000, df=8)
all.t[1:3000] <- 2*all.t[1:3000] # DE genes.

library(limma)
qqt(all.t, df=8)
abline(0, 1, col="red")

Some tinkering with proportion and distribution of t-statistics for DE genes will give you something closer to what you're observing in your data set - I'm not going to bother to do that though.