Dear BioC,

I'm testing various approaches to deal with our RNA seq data that has variability in sample quality, single outlying counts etc. I love limma and all the methods that are implemented within this package. Yet, when trying different approaches I've noticed that the moderated t-statistic does not always follow the distribution that I expect. Here I am illustrating this using the Smchd1 experiment, that was used in the voomWithQualityWeights paper from 2015.

I am assuming that the degrees of freedom for the moderated t-statistic can be found in the MArrayLM object, in the df.total slot.

I'm doing four analyses:

- using voom
- using voom and robust regression
- using voomWithQUalityWeights
- using voomWithQUalityWeights and blocking

For each analysis the moderated t-statistics does not follow the theoretical distribution, especially when using weights from robust regression or voomWithQualityWeights. Consequently the number of significant genes increases. Is the body of the moderated t-statistic not suppose to follow the theoretical t-distribution? To me it looks as if the theoretical t-distribution is not the correct null-distribution. I've seen very similar results using my own data, where there are a few hundred samples. There using only voom gives very few significant genes, but using voomWithQualityWeights or lmFit(method='robust') given many many more.

`#Loading and setting up data.`

`library(edgeR) library(limma) con <- url("http://bioinf.wehi.edu.au/voomWithQualityWeights/smchd1/x.rda") load(con) fullanno = x$genes sel = rowSums(cpm(x$counts)>0.5)>=3 x = x[sel,] selanno = x$genes x$genes = x$genes[,c(1,3)] des = model.matrix(~x$samples$group) colnames(des)[2] = "Smchd1nullvsWt" x = calcNormFactors(x, method="TMM") genotype = x$samples$group`

`# Analysis with voom only on full data set`

`v = voom(x, design=des) vfit = lmFit(v) vfit = eBayes(vfit) options(digits=3) top = topTable(vfit,coef=2,number=Inf,sort.by="P") sum(top$adj.P.Val<0.05)`

`#This gives 12 significant genes.`

`# Analysis with voom on full data set, using lmFit(method='robust') #v = voom(x, design=des) vfit2 = lmFit(v, method='robust',maxit=250) vfit2 = eBayes(vfit2) options(digits=3) top = topTable(vfit2,coef=2,number=Inf,sort.by="P") sum(top$adj.P.Val<0.05)`

`#This gives 701 significant genes.`

`# Analysis with combined voom and sample quality weights`

`vwts = voomWithQualityWeights(x, design=des, normalization="none", plot=TRUE) vfit3 = lmFit(vwts) vfit3 = eBayes(vfit3) top3 = topTable(vfit3,coef=2,number=Inf,sort.by="P") sum(top3$adj.P.Val<0.05)`

# this gives 1478 significant genes.

`# Analysis with`

voomWithQualityWeights`and block weights`

`Z = cbind(c(1,0,0,0,0,0,-1),c(0,1,1,1,1,1,-5)) vwts2 = voomWithQualityWeights(x, design=des, normalization="none", var.design=Z, plot=TRUE) vfit4 = lmFit(vwts2) vfit4 = eBayes(vfit4) top4 <- topTable(vfit4,coef=2,number=Inf,sort.by="P") sum(top4$adj.P.Val<0.05)`

# this gives 488 significant genes.`# plotting moderated t-statistics against theoretical t-statistics:`

`par(mfrow=c(2,2)) qqt(vfit$t[,"Smchd1nullvsWt"],df=vfit$df.total,main='voom') abline(0,1) qqt(vfit2$t[,"Smchd1nullvsWt"],df=vfit2$df.total,main='voom and lmFit(method="robust")') abline(0,1) qqt(vfit3$t[,"Smchd1nullvsWt"],df=vfit3$df.total,main='voomWithQualityWeights') abline(0,1) qqt(vfit4$t[,"Smchd1nullvsWt"],df=vfit4$df.total,main='voomWithQualityWeights and blocking') abline(0,1)`

sessionInfo()

R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.5

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] tmod_0.36 edgeR_3.22.5 limma_3.36.3

Hi Aaron,

Of course only genes that are not affected should follow the theoretical null-distribution, that is why we use the theoretical as our null. I agree that if we compare cells in complete distress with healthy cells then almost all genes should be affected. The Smchd1 experiment is a perturbation experiment where a fair number of genes might be expected to be affected (I can not judge that). But the t-statistics I show above do not follow the theoretical line even slightly, but seem to follow another distribution. The resulting p-value distribution does not have a distinct enrichment at mall p-values, but a downward slope from 0 to 1. For me these are indicators of statistical modelling gone wrong. The Smchd1 experiment has a sample size of 7, I would not call it over-powered. Hence, I would expect the statistics to follow the theoretical null-distributions for the majority of genes. That is not the case here. Do you think these distributions are resonable?

In a real world setting we are happy to see a few genes that deviate from the theoretical distribution. The data we are analyzing for example are not from a perturbation experiment, but samples from patients taken before they start a treatment. Hence, there will be a lot of variance, almost all of which will be related to things we are not the slightest bit interested in. We are looking for biomarkers of future response. Using ordinary voom we get a t-statistic that follows the theoretical one (and a uniform p-value distribution). But when using robust regression or voomWithQualityWeights we get a t-statistics that simply cross the theoretical distribution at zero, just like in the plot above (and consequently p-value distributions that have a downward slope from 0 to 1). This is not resonable.

Oh, ye of little faith. Perhaps my example would have been more demonstrative with some DE genes:

This yields strong deviations from the diagonal, even for those observations where the t-statistic was truly sampled from a t-distribution. This is an inevitable consequence of having a decently large proportion of DE genes with extreme t-statistics in your QQ plot. The theoretical quantiles from the non-DE genes are effectively compressed into a smaller range as the DE genes occupy the edges of the theoretical distribution. This manifests as a deviation from the line on the QQ plot for all data points - which is entirely correct, as

`all.t`

is not t-distributed!As for your second comment: the differences in results with and without quality weights reflects the differences in power when the

limmaanalysis (correctly) ignores low-quality samples. This results in lower p-values for the DE genes and the effect described above in the QQ plot. You seem to be concerned that using quality weights results in so many more DE genes, but I would say that this reflects the heterogeneity of data quality across your samples rather than any problem with`voom`

orlimma. If your samples were consistently high quality, the inclusion of quality weights would have no major effect.I'm not particularly concerned about the

`method="robust"`

analysis, that's not something we recommend anyway. And it's unclear to me what you were intending with the`var.design=Z`

call.In your simulation we see a nice deviance from the diagonal for the significant genes, just as I would expect.

What I instead observe in our data (and to some extent also in the two lower plots in the original post) is that there really is no deviance from the straight line of t-statistics. Below is our data for example. 36 genes are deemed significant when using voomWithQualityWeights (marked i red in the qq-plot).

Here all t-statistics fall on the same straight line. No deviance from this line for any genes.

Actually, It kind of looks like all t-statistics are multiplied by some constant. In this case, dividing all t-stats with 1.225 gives something that seem to follow the theoretical distribution quite well:

This behavior of the t-statistic is very unexpected for me. I expect that the majority of genes should follow the theoretical null-distribution, just as they do in your recent simulation. I expect that significant genes would deviate from this theoretical distribution and show more extreme values.

(Regarding the var.design=Z call, I simply followed the code in the voomWithQualityWeights paper from 2015.)

Not unexpected, and not a problem. The exact shape of the QQ plot will depend on the distribution of effect sizes and the power of your experimental design. For example, here's a more "straight-line" QQ plot:

Some tinkering with proportion and distribution of t-statistics for DE genes will give you something closer to what you're observing in your data set - I'm not going to bother to do that though.