Question: Limma validity for only hundreds of genes/metabolites
2.3 years ago by
iglezer0
iglezer0 wrote:

Dear all,

Thanks a lot for supporting such nice packages. I would like to know whether limma's fit functions can be used with a smaller set of genes, or with metabolites quantified by liquid chromatography. I use limma on gene lists with thousands of genes, and have never used it with fewer features. I wonder whether the moderated t-test can be used with only hundreds of features measured in small sample sets.

For instance: 2 or more groups, 4 biological replicates each, 150 genes/metabolites. Can we use empirical Bayes moderation in this situation? If yes, that would be great, since limma provides an excellent tool for dealing with unequal variances and deviations from normality in cases like that.

Thanks.

modified 2.3 years ago by Aaron Lun25k • written 2.3 years ago by iglezer0
Answer: Limma validity for only hundreds of genes/metabolites
2.3 years ago by
Scripps Research, La Jolla, CA
Ryan C. Thompson7.4k wrote:

I think I recall an instance where Gordon said that the empirical Bayes squeezing employed by limma could theoretically work with as few as 4 genes. I believe a few hundred should be fine.

limma actually works on any number of genes at all. With just one or two genes, it will do linear modelling without empirical Bayes (EB) moderation.

In my lab, we use limma routinely on PCR data with as few as half a dozen genes. limma is careful to never use more df than one would get by pooling the genewise variances, and this prevents EB from overstating what can be learned from the gene ensemble.
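
The cap on degrees of freedom described above can be checked directly. What follows is a minimal sketch, assuming simulated null data for six genes in two groups of four; the data, the seed, and the name `df.pooled` are illustrative and not from the thread. `df.residual`, `df.prior` and `df.total` are components of the fitted object returned by `eBayes`.

```r
library(limma)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
ngenes <- 6
design <- model.matrix(~c(1,1,1,1,0,0,0,0))

# Simulated null data (an assumption): 6 genes, 8 samples, equal true variances
y <- matrix(rnorm(ngenes * 8), ncol = 8)

fit <- eBayes(lmFit(y, design))

# Residual df per gene is 8 samples - 2 coefficients = 6; pooled over genes
df.pooled <- sum(fit$df.residual)

# Total df used for each moderated t-test, to compare against the pooled df
print(fit$df.prior)
print(fit$df.total)
print(df.pooled)
```

With so few genes the estimated prior df can be very large or infinite, which is where the comparison of `df.total` against the pooled df becomes informative.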

Thanks! As far as I could check, it all makes sense to use limma with these small sets.

Answer: Limma validity for only hundreds of genes/metabolites
2.3 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

To add to Ryan's answer, some testing suggests that, with 150 features, limma does okay:

library(limma)

# Two groups of four samples; all data are simulated under the null
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
p.out <- scale.out <- df.out <- list()
for (i in 1:1000) {
    # Gene-wise variances drawn from a scaled inverse chi-square distribution
    s2 <- 10/rchisq(ngenes, 10)
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)
    fit <- lmFit(y, design)
    fit <- eBayes(fit)
    p.out[[i]] <- fit$p.value[,2]
}
hist(unlist(p.out))

... which gives a uniform distribution of p-values, as expected under the null hypothesis. This result also holds if the shrinkage parameters (fit$s2.prior and fit$df.prior) are not precisely estimated, which was a pleasant surprise to me. I guess that the true values of the shrinkage parameters are not important, as long as the empirical variance distribution within each iteration is modelled well. This makes sense, as the variances are just nuisance parameters when the aim is to detect differential expression.
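
One way to see how loosely the shrinkage parameters are estimated is to record them in each iteration. This is a rough sketch along the lines of the simulation above, not the code originally used: the names `niter`, `df.est`, `s2.est` and `p.all`, the seed, and the reduced iteration count are assumptions made for illustration. Under this simulation the true prior df is 10 and the true prior variance is 1, so the spread of the estimates around those values indicates their precision.

```r
library(limma)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
niter <- 200  # fewer iterations than above, to keep the run short

df.est <- s2.est <- numeric(niter)
p.all <- vector("list", niter)

for (i in seq_len(niter)) {
    # Same null model as above: true df.prior = 10, true s2.prior = 1
    s2 <- 10 / rchisq(ngenes, 10)
    y <- matrix(rnorm(ngenes * 8, sd = sqrt(s2)), ncol = 8)
    fit <- eBayes(lmFit(y, design))
    df.est[i] <- fit$df.prior   # estimated prior degrees of freedom
    s2.est[i] <- fit$s2.prior   # estimated prior variance
    p.all[[i]] <- fit$p.value[, 2]
}

# Spread of the estimated shrinkage parameters across iterations
print(summary(df.est))
print(summary(s2.est))

# Informal check of p-value uniformity; p-values within an iteration share
# the same prior estimates, so treat this test as approximate only
print(ks.test(unlist(p.all), "punif"))
```

The estimate of `df.prior` can occasionally be infinite, which `summary` reports as `Inf` rather than failing.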