Question: How to use limma to do regularization for Ballgown
26 days ago, zoukai3412085 wrote:

Hi,

The Nat Prot paper about Ballgown says the following:

Note that Ballgown’s statistical test is a standard linear model-based comparison. For small sample sizes (n < 4 per group), it is often better to perform regularization. This can be done using the limma package in Bioconductor.

 

How can this be done? First, do "regularization" and "normalization" have the same meaning? I'm not deep into statistics. And second, how should the regularization step be carried out during the Ballgown procedure described in this paper? I am a noob and in a hurry; who can help me and provide detailed commands? Thank you very much!

26 days ago, Aaron Lun (Cambridge, United Kingdom) wrote:

First, do "regularization" and "normalization" have the same meaning?

No. Normalization refers to the removal of uninteresting technical biases prior to further analyses. For differential analyses, this usually refers to removal of biases between samples (e.g., differences in sequencing depth, composition biases), to ensure that comparisons between samples are not confounded. "Regularization", in this context, presumably refers to the empirical Bayes shrinkage of variance estimates within limma, which reduces the uncertainty of the estimates in the presence of limited replicates. The two are separate procedures, but both are necessary in DE analyses.
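To make the shrinkage idea concrete, here is a minimal sketch using limma's `squeezeVar()`, which is the function that moderates the variances inside `eBayes()`. The data are simulated and all object names (`y`, `s2`, `sq`, `manual`) are made up for illustration; it only shows that, when the estimated prior degrees of freedom are finite, each moderated variance is a weighted average of the gene's own sample variance and a common prior variance.

```r
library(limma)

set.seed(1)
n_genes <- 1000
n_reps  <- 3   # small group size, as in the question

# Simulated log-expression values with gene-specific true variances
# (illustrative data only, not Ballgown output)
true_var <- 0.1 * 4 / rchisq(n_genes, df = 4)
y <- matrix(rnorm(n_genes * n_reps, sd = sqrt(true_var)), nrow = n_genes)

# Per-gene sample variances, each on n_reps - 1 degrees of freedom
s2 <- apply(y, 1, var)

# Empirical Bayes moderation of the variances
sq <- squeezeVar(s2, df = n_reps - 1)

# Assuming sq$df.prior is finite, the moderated variance is a weighted
# average of the prior variance and the sample variance
manual <- (sq$df.prior * sq$var.prior + (n_reps - 1) * s2) /
          (sq$df.prior + n_reps - 1)
all.equal(sq$var.post, manual)

# The moderated variances are less spread out than the raw ones
range(s2)
range(sq$var.post)
```

With only two residual degrees of freedom per gene, the raw variances are very noisy; borrowing information across genes is what stabilizes them.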

And second, how should the regularization step be carried out during the Ballgown procedure described in this paper?

While I'm not familiar with the Ballgown output, I assume you would get some matrix of expression values. If this is a matrix of counts for each gene or transcript, you can apply the voom-limma pipeline as described elsewhere:

https://www.bioconductor.org/help/workflows/RNAseq123/
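As a rough outline of what that workflow boils down to, the sketch below runs voom-limma on a simulated count matrix. The `counts` matrix, the two-group `group` factor and the sample names are placeholders, not Ballgown output; you would substitute your own gene- or transcript-level counts and design.

```r
library(edgeR)
library(limma)

# Placeholder inputs: a genes x samples matrix of raw counts and a
# two-group factor (simulated here purely for illustration)
set.seed(1)
counts <- matrix(rnbinom(2000 * 6, mu = 50, size = 5), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

dge  <- DGEList(counts = counts, group = group)
keep <- filterByExpr(dge, group = group)    # drop low-count features
dge  <- dge[keep, , keep.lib.sizes = FALSE]
dge  <- calcNormFactors(dge)                # normalization (TMM scaling factors)

design <- model.matrix(~ group)
v   <- voom(dge, design)                    # log-CPM values with precision weights
fit <- lmFit(v, design)
fit <- eBayes(fit)                          # empirical Bayes variance shrinkage
topTable(fit, coef = 2)                     # top features for group B vs A
```

Normalization happens in `calcNormFactors()` and the "regularization" happens in `eBayes()`, which is where the two separate steps discussed above show up in practice.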

If all that you have are CPMs, you could log-transform them and use the limma-trend approach. The same can be done if you have FPKMs, but this will be less accurate (as the mean-variance relationship will probably be distorted by gene-length normalization):

A: Differential expression of RNA-seq data using limma and voom()
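A minimal sketch of the limma-trend route, assuming you have a genes x samples matrix of FPKMs (or CPMs). The matrix here is simulated; I believe Ballgown exposes something similar through `gexpr()`, but treat that as an assumption and check its documentation. The offset of 1 and the filtering threshold are arbitrary choices for illustration.

```r
library(limma)

# Placeholder input: a genes x samples matrix of FPKM values, simulated here.
# With Ballgown this might be obtained via something like gexpr(bg)
# (assumption, not verified against your object).
set.seed(1)
fpkm  <- matrix(rexp(2000 * 6, rate = 0.1), nrow = 2000,
                dimnames = list(paste0("gene", 1:2000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

logexpr <- log2(fpkm + 1)            # offset avoids log(0); the value 1 is arbitrary
keep    <- rowMeans(logexpr) > 1     # arbitrary filter for low-abundance features

design <- model.matrix(~ group)
fit <- lmFit(logexpr[keep, ], design)
fit <- eBayes(fit, trend = TRUE)     # limma-trend: prior variance depends on abundance
topTable(fit, coef = 2)
```

Setting `trend = TRUE` lets the prior variance vary with average log-expression, which stands in for the precision weights that voom would compute from counts.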

I am a noob and in a hurry; who can help me and provide detailed commands?

Read the documentation:

http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf

Don't expect others to do your analysis for you. Software might be free but manpower isn't.


Thank you so much!

written 25 days ago by zoukai3412085

I agree with Aaron: use limma instead, as it is much better documented and has stood the test of time; it still performs very well in benchmarks. And if possible, use count data as input, as it conveys the uncertainty of the analysis better than RPKM values (which are what Ballgown requires).

written 23 days ago by k.vitting.seerup