How to use limma to do regularization for Ballgown
@zoukai3412085-14416

Hi,

In the Nature Protocols paper about Ballgown, the authors state the following:

Note that Ballgown’s statistical test is a standard linear model-based comparison. For small sample sizes (n < 4 per group), it is often better to perform regularization. This can be done using the limma package in Bioconductor.

 

How can this be done? First, do "regularization" and "normalization" have the same meaning? I'm not well versed in statistics. Second, how should the regularization step be carried out within the Ballgown procedure described in this paper? I am a beginner and in a hurry; could someone help me and provide detailed commands? Thank you very much!

limma rna-seq
Aaron Lun
@alun

First, do "regularization" and "normalization" have the same meaning?

No. Normalization refers to the removal of uninteresting technical biases prior to further analyses. For differential analyses, this usually refers to removal of biases between samples (e.g., differences in sequencing depth, composition biases), to ensure that comparisons between samples are not confounded. "Regularization", in this context, presumably refers to the empirical Bayes shrinkage of variance estimates within limma, which reduces the uncertainty of the estimates in the presence of limited replicates. The two are separate procedures, but both are necessary in DE analyses.
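To make the shrinkage concrete, here is a sketch of the moderated variance that limma's empirical Bayes step computes for each gene. The symbols are my own notation, not something from the question: s_g^2 is the ordinary residual variance for gene g with d_g residual degrees of freedom, and s_0^2 and d_0 are the prior variance and prior degrees of freedom estimated from all genes together.

```latex
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g \sqrt{v_{gj}}}
\quad \text{with } d_0 + d_g \text{ degrees of freedom}
```

Here \hat{\beta}_{gj} is the estimated coefficient and v_{gj} its unscaled variance from the linear model fit. With few replicates d_g is small, so each gene's variance is pulled strongly towards s_0^2, and the test gains d_0 extra degrees of freedom; this is why the shrinkage matters most for small sample sizes.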

Second, how should the regularization step be carried out within the Ballgown procedure described in this paper?

While I'm not familiar with the Ballgown output, I assume you would get some matrix of expression values. If this is a matrix of counts for each gene or transcript, you can apply the voom-limma pipeline as described elsewhere:

https://www.bioconductor.org/help/workflows/RNAseq123/
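A minimal sketch of that pipeline, under stated assumptions: the `counts` matrix and the two-group `group` factor below are simulated placeholders, not Ballgown output, and you would substitute your own gene-by-sample (or transcript-by-sample) count matrix and experimental design.

```r
library(limma)
library(edgeR)

# Placeholder data (assumption): 1000 genes, 6 samples, 3 per group.
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Build the count object and drop features with too few reads.
dge <- DGEList(counts = counts)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]

# Normalization: scaling factors (TMM by default) for composition biases.
dge <- calcNormFactors(dge)

# voom converts counts to log-CPM and estimates precision weights
# from the mean-variance trend.
v <- voom(dge, design)

# Linear model fit, then empirical Bayes shrinkage of the variances
# (the "regularization" step).
fit <- lmFit(v, design)
fit <- eBayes(fit)

# Results for the group B versus group A coefficient.
res <- topTable(fit, coef = 2, number = Inf)
head(res)
```

Note that normalization (`calcNormFactors`) and regularization (`eBayes`) appear as two separate steps here, which matches the distinction made above.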

If all that you have are CPMs, you could log-transform them and use the limma-trend approach. The same can be done if you have FPKMs, but this will be less accurate (as the mean-variance relationship will probably be distorted by gene-length normalization):

A: Differential expression of RNA-seq data using limma and voom()
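A minimal sketch of the limma-trend approach, again with assumptions flagged: `expr` is a simulated placeholder for a matrix of CPMs or FPKMs, and the offset of 1 added before taking logs is an arbitrary choice of mine to avoid log(0), not a recommendation from the documentation.

```r
library(limma)

# Placeholder data (assumption): 1000 genes, 6 samples, 3 per group,
# standing in for a CPM or FPKM matrix.
set.seed(1)
expr <- matrix(rgamma(1000 * 6, shape = 2, rate = 0.05), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Log-transform; the offset of 1 is an assumption to avoid log2(0).
logexpr <- log2(expr + 1)

# Fit the linear model, then shrink the variances towards a
# mean-dependent trend rather than a single global value.
fit <- lmFit(logexpr, design)
fit <- eBayes(fit, trend = TRUE)

# Results for the group B versus group A coefficient.
res <- topTable(fit, coef = 2, number = Inf)
head(res)
```

The only difference from a plain limma analysis is `trend = TRUE`, which lets the prior variance depend on the average log-expression of each gene.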

I am a beginner and in a hurry; could someone help me and provide detailed commands?

Read the documentation:

http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf

Don't expect others to do your analysis for you. Software might be free but manpower isn't.

Thank you so much!

I agree with Aaron: use limma instead, as it is much better documented and has stood the test of time; it still performs very well in benchmarks. If possible, use count data as input, since counts convey the uncertainty of the measurements better than the RPKM values that Ballgown requires.
