**0**wrote:

Dear BioC,

I'm testing various approaches to deal with our RNA seq data that has variability in sample quality, single outlying counts etc. I love limma and all the methods that are implemented within this package. Yet, when trying different approaches I've noticed that the moderated t-statistic does not always follow the distribution that I expect. Here I am illustrating this using the Smchd1 experiment, that was used in the voomWithQualityWeights paper from 2015.

I am assuming that the degrees of freedom for the moderated t-statistic can be found in the MArrayLM object, in the df.total slot.

I'm doing four analyses:

- using voom
- using voom and robust regression
- using voomWithQUalityWeights
- using voomWithQUalityWeights and blocking

For each analysis the moderated t-statistics does not follow the theoretical distribution, especially when using weights from robust regression or voomWithQualityWeights. Consequently the number of significant genes increases. Is the body of the moderated t-statistic not suppose to follow the theoretical t-distribution? To me it looks as if the theoretical t-distribution is not the correct null-distribution. I've seen very similar results using my own data, where there are a few hundred samples. There using only voom gives very few significant genes, but using voomWithQualityWeights or lmFit(method='robust') given many many more.

`#Loading and setting up data.`

`library(edgeR) library(limma) con <- url("http://bioinf.wehi.edu.au/voomWithQualityWeights/smchd1/x.rda") load(con) fullanno = x$genes sel = rowSums(cpm(x$counts)>0.5)>=3 x = x[sel,] selanno = x$genes x$genes = x$genes[,c(1,3)] des = model.matrix(~x$samples$group) colnames(des)[2] = "Smchd1nullvsWt" x = calcNormFactors(x, method="TMM") genotype = x$samples$group`

`# Analysis with voom only on full data set`

`v = voom(x, design=des) vfit = lmFit(v) vfit = eBayes(vfit) options(digits=3) top = topTable(vfit,coef=2,number=Inf,sort.by="P") sum(top$adj.P.Val<0.05)`

`#This gives 12 significant genes.`

`# Analysis with voom on full data set, using lmFit(method='robust') #v = voom(x, design=des) vfit2 = lmFit(v, method='robust',maxit=250) vfit2 = eBayes(vfit2) options(digits=3) top = topTable(vfit2,coef=2,number=Inf,sort.by="P") sum(top$adj.P.Val<0.05)`

`#This gives 701 significant genes.`

`# Analysis with combined voom and sample quality weights`

`vwts = voomWithQualityWeights(x, design=des, normalization="none", plot=TRUE) vfit3 = lmFit(vwts) vfit3 = eBayes(vfit3) top3 = topTable(vfit3,coef=2,number=Inf,sort.by="P") sum(top3$adj.P.Val<0.05)`

# this gives 1478 significant genes.

`# Analysis with`

voomWithQualityWeights`and block weights`

`Z = cbind(c(1,0,0,0,0,0,-1),c(0,1,1,1,1,1,-5)) vwts2 = voomWithQualityWeights(x, design=des, normalization="none", var.design=Z, plot=TRUE) vfit4 = lmFit(vwts2) vfit4 = eBayes(vfit4) top4 <- topTable(vfit4,coef=2,number=Inf,sort.by="P") sum(top4$adj.P.Val<0.05)`

# this gives 488 significant genes.`# plotting moderated t-statistics against theoretical t-statistics:`

`par(mfrow=c(2,2)) qqt(vfit$t[,"Smchd1nullvsWt"],df=vfit$df.total,main='voom') abline(0,1) qqt(vfit2$t[,"Smchd1nullvsWt"],df=vfit2$df.total,main='voom and lmFit(method="robust")') abline(0,1) qqt(vfit3$t[,"Smchd1nullvsWt"],df=vfit3$df.total,main='voomWithQualityWeights') abline(0,1) qqt(vfit4$t[,"Smchd1nullvsWt"],df=vfit4$df.total,main='voomWithQualityWeights and blocking') abline(0,1)`

sessionInfo()

R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.5

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] tmod_0.36 edgeR_3.22.5 limma_3.36.3