Question

Finding DEGs in a dataset with a large sample size

Shamim Sarhadi ▴ 20

Hi

I would like to know the best way to find differentially expressed genes (DEGs) in a microarray dataset with 200 samples. I used limma for this, but the Bioconductor book mentions that when sample sizes are moderate or large, say ten or more in each group, there is generally no advantage to using the Bayesian approach.

So what is the best approach, tool, or package for my work?

limma

updated 8.5 years ago by Aaron Lun ★ 28k • written 8.5 years ago by Shamim Sarhadi ▴ 20

score 4 · Accepted Answer · 2016-03-06

The real question is how many residual degrees of freedom you have in your model, rather than the number of samples. The residual d.f. determines how much information is available to estimate the variance - the more d.f., the more reliably you can estimate the variance of each gene. If you have lots of residual d.f., using empirical Bayes methods to share information between genes will not provide much benefit, because you can already estimate the variance fairly well using only the information for each gene by itself.

In your case, 200 samples is quite large, but if your model has 198 parameters, then you only have 2 residual d.f., in which case you would benefit from EB shrinkage. If your model has only 2 parameters, then you'd have 198 residual d.f., and the benefit of shrinkage would be lessened. However, it doesn't hurt to do EB shrinkage - there's just less benefit from doing so - so I would just continue using limma.
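To put rough numbers on the two scenarios above, here is a minimal Python sketch (not limma code). It assumes a full-rank design, so residual d.f. is samples minus coefficients, and it uses the usual empirical Bayes form of the moderated variance: a d.f.-weighted average of a prior variance and the gene-wise variance. The prior values `S2_PRIOR` and `DF_PRIOR` and the gene-wise variance `S2_GENE` are made up for illustration; in a real analysis limma estimates the prior from the data. All function names are hypothetical.

```python
def residual_df(n_samples, n_coefficients):
    """Residual d.f. of a full-rank linear model: samples minus coefficients."""
    if n_coefficients >= n_samples:
        raise ValueError("no residual d.f. left to estimate a variance")
    return n_samples - n_coefficients


def moderated_variance(s2_gene, df_resid, s2_prior, df_prior):
    """D.f.-weighted average of the prior variance and the gene-wise variance."""
    return (df_prior * s2_prior + df_resid * s2_gene) / (df_prior + df_resid)


def prior_weight(df_resid, df_prior):
    """Fraction of the moderated variance that comes from the prior."""
    return df_prior / (df_prior + df_resid)


# Hypothetical values, chosen only to illustrate the effect of residual d.f.
S2_PRIOR, DF_PRIOR = 1.0, 4.0   # assumed prior variance and prior d.f.
S2_GENE = 3.0                   # assumed gene-wise sample variance

for n_coef in (198, 2):
    d = residual_df(200, n_coef)
    s2_post = moderated_variance(S2_GENE, d, S2_PRIOR, DF_PRIOR)
    w = prior_weight(d, DF_PRIOR)
    print(f"{n_coef} coefficients: residual d.f. = {d}, "
          f"moderated variance = {s2_post:.3f}, prior weight = {w:.3f}")
```

Under these assumed values, with 2 residual d.f. the gene-wise variance of 3.0 is pulled to about 1.67, since the prior carries two thirds of the weight. With 198 residual d.f. it stays at about 2.96, since the prior carries only about 2% of the weight. The shrinkage is still applied in the second case; it just changes very little.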