Search
Question: Limma validity for only hundreds of genes/metabolites
0
gravatar for iglezer
11 months ago by
iglezer0
iglezer0 wrote:

Dear all,

thanks a lot for supporting such nice packages. I would like to know if limma fit function could be used with smaller set of genes or other metabolites quantified by liquid chromatography. I use limma in genes lists with thousand of genes, and never used with smaller features. I wonder if the moderated t-test could be used with only hundreds of features measured in small sample sets.

For instance, 2  or more groups; 4 biological replicates each, 150 genes/metabolites. Can we use empirical Bayes moderation in this situation? If yes that would be great, since limma provides an excelent tool to overcome unequal variances and normality deviation in cases like that.

Tks.

ADD COMMENTlink modified 11 months ago by Aaron Lun19k • written 11 months ago by iglezer0
2
gravatar for Ryan C. Thompson
11 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.8k wrote:

I think I recall an instance where Gordon said that the empirical Bayes squeezing employed by limma could theoretically work with as few as 4 genes. I believe a few hundred should be fine.

ADD COMMENTlink written 11 months ago by Ryan C. Thompson6.8k

limma actually works on any number of genes at all. With just one or two genes, it will do linear modelling without empirical Bayes (EB) moderation.

In my lab, we use limma routinely on PCR data with as few as half a dozen genes. limma is careful to never use more df than one would get by pooling the genewise variances, and this prevents EB from overstating what can be learned from the gene ensemble.

ADD REPLYlink modified 11 months ago • written 11 months ago by Gordon Smyth33k

Thanks! As far I could check all makes sense using limma with these small sets.

ADD REPLYlink written 11 months ago by iglezer0
1
gravatar for Aaron Lun
11 months ago by
Aaron Lun19k
Cambridge, United Kingdom
Aaron Lun19k wrote:

To add to Ryan's answer; some testing suggests that, with 150 features, limma does okay:

design <- model.matrix(~c(1,1,1,1,0,0,0,0))
ngenes <- 150
p.out <- scale.out <- df.out <- list()
for (i in 1:1000) {
    s2 <- 10/rchisq(ngenes, 10)
    y <- matrix(rnorm(ngenes*8, sd=sqrt(s2)), ncol=8)
    fit <- lmFit(y, design)
    fit <- eBayes(fit)
    p.out[[i]] <- fit$p.value[,2]
}
hist(unlist(p.out))

... which gives a uniform distribution of p-values, as expected under the null hypothesis. This result also holds if the shrinkage parameters (fit$s2.prior and fit$df.prior) are not precisely estimated, which was a pleasant surprise to me. I guess that the true values of the shrinkage parameters are not important, as long as the empirical variance distribution within each iteration is modelled well. Which makes sense, as the variances are just nuisance parameters when the aim is to detect differential expression.

ADD COMMENTlink modified 11 months ago • written 11 months ago by Aaron Lun19k

Thanks Aaron.

ADD REPLYlink written 11 months ago by iglezer0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 140 users visited in the last hour