Question: Limma validity for only hundreds of genes/metabolites
7 months ago, iglezer wrote:

Dear all,

Thanks a lot for supporting such nice packages. I would like to know whether limma's model-fitting functions can be used with a smaller set of genes, or with metabolites quantified by liquid chromatography. I have used limma on lists of thousands of genes, but never on smaller feature sets. I wonder whether the moderated t-test can be used with only hundreds of features measured in small sample sets.

For instance: 2 or more groups, 4 biological replicates each, and 150 genes/metabolites. Can we use empirical Bayes moderation in this situation? If so, that would be great, since limma provides an excellent tool for coping with unequal variances and deviations from normality in cases like that.


7 months ago, Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote:

I think I recall an instance where Gordon said that the empirical Bayes squeezing employed by limma could theoretically work with as few as 4 genes. I believe a few hundred should be fine.


limma actually works on any number of genes at all. With just one or two genes, it will do linear modelling without empirical Bayes (EB) moderation.

In my lab, we use limma routinely on PCR data with as few as half a dozen genes. limma is careful to never use more df than one would get by pooling the genewise variances, and this prevents EB from overstating what can be learned from the gene ensemble.
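
As a rough illustration of the two ideas above, here is a minimal base-R sketch, not limma's own code. It assumes the usual moderation formula, in which each genewise variance is shrunk toward a prior value, and a cap on the total df at the pooled residual df. The function names `moderate_var` and `total_df` and the prior values used are hypothetical, chosen only for the example; limma estimates the prior from the data.

```r
# Shrink genewise variances toward a prior variance (weighted by df).
# s2_prior and df_prior are illustrative values, not estimates from data.
moderate_var <- function(s2, df, s2_prior, df_prior) {
    (df_prior * s2_prior + df * s2) / (df_prior + df)
}

# Total df for the moderated t-statistic, capped at the pooled residual df
# so that the ensemble never contributes more df than pooling would.
total_df <- function(df_residual, df_prior) {
    pmin(df_residual + df_prior, sum(df_residual))
}

s2 <- c(0.2, 1.0, 5.0)   # genewise residual variances for 3 genes
df <- rep(6, 3)          # residual df per gene: 8 samples - 2 coefficients

s2_post <- moderate_var(s2, df, s2_prior = 1, df_prior = 4)
print(s2_post)                           # extreme variances are pulled toward 1
print(total_df(df, df_prior = 4))        # 6 + 4 = 10 per gene
print(total_df(df, df_prior = Inf))      # capped at the pooled df, 3 * 6 = 18
```

With only three genes, even an infinite prior df cannot push the total df past 18, which is the sense in which a small ensemble limits what EB can claim to have learned.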

written 7 months ago by Gordon Smyth

Thanks! As far as I can tell, using limma with these small sets makes sense.

written 7 months ago by iglezer
7 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

To add to Ryan's answer, some testing suggests that limma does okay with 150 features:

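library(limma)

design <- model.matrix(~c(1,1,1,1,0,0,0,0))  # two groups, 4 replicates each
ngenes <- 150
p.out <- scale.out <- df.out <- list()
for (i in 1:1000) {
    s2 <- 10/rchisq(ngenes, 10)  # true genewise variances
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)  # null data, no group effect
    fit <- lmFit(y, design)
    fit <- eBayes(fit)
    p.out[[i]] <- fit$p.value[,2]
    scale.out[[i]] <- fit$s2.prior  # estimated shrinkage parameters
    df.out[[i]] <- fit$df.prior
}
hist(unlist(p.out))  # inspect the null p-value distribution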

... which gives a uniform distribution of p-values, as expected under the null hypothesis. This result also holds if the shrinkage parameters (fit$s2.prior and fit$df.prior) are not precisely estimated, which was a pleasant surprise to me. I guess that the true values of the shrinkage parameters are not important, as long as the empirical variance distribution within each iteration is modelled well. Which makes sense, as the variances are just nuisance parameters when the aim is to detect differential expression.
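
One way to put a number on "uniform" is to compare the observed fraction of null p-values below a few thresholds with the thresholds themselves. The following is a minimal sketch under the same simulation settings as above. The seed, the reduced iteration count `niter`, and the names `p.null` and `type1` are assumptions made for this example, and limma is assumed to be installed from Bioconductor.

```r
library(limma)  # Bioconductor package, assumed to be installed

set.seed(1)     # arbitrary seed so the sketch is reproducible
design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
niter <- 200    # fewer iterations than above, to keep the run short

p.null <- unlist(lapply(seq_len(niter), function(i) {
    s2 <- 10/rchisq(ngenes, 10)                        # variances differ between genes
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)  # no true group difference
    fit <- eBayes(lmFit(y, design))
    fit$p.value[,2]                                    # p-values for the group coefficient
}))

# Under the null, the fraction of p-values at or below a threshold
# should be close to the threshold itself.
thresholds <- c(0.01, 0.05, 0.1)
type1 <- sapply(thresholds, function(a) mean(p.null <= a))
print(rbind(thresholds, type1))
```

If the observed rates sit close to the nominal thresholds, the moderated t-test is holding its size with 150 features, which is what the histogram shows visually.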


Thanks Aaron.

written 7 months ago by iglezer