Question

How to use limma to do regularization for Ballgown

zoukai3412085 • 0

Hi,

The Nature Protocols paper about Ballgown says the following:

Note that Ballgown’s statistical test is a standard linear model-based comparison. For small sample sizes (n < 4 per group), it is often better to perform regularization. This can be done using the limma package in Bioconductor.

How can this be done? First, do "regularization" and "normalization" have the same meaning? I'm not well versed in statistics. And second, how should the regularization step be carried out during the Ballgown procedure described in this paper? I am a noob and in a hurry; can anyone help me and provide detailed commands? Thank you very much!

limma rna-seq • 2.1k views

updated 7.7 years ago by Aaron Lun ★ 28k • written 7.7 years ago by zoukai3412085 • 0

score 3 · Answer 1 · 2017-11-18

First, do "regularization" and "normalization" have the same meaning?

No. Normalization refers to the removal of uninteresting technical biases prior to further analyses. For differential analyses, this usually refers to removal of biases between samples (e.g., differences in sequencing depth, composition biases), to ensure that comparisons between samples are not confounded. "Regularization", in this context, presumably refers to the empirical Bayes shrinkage of variance estimates within limma, where each gene's variance estimate is pulled toward a common or trended value estimated across all genes. This reduces the uncertainty of the estimates in the presence of limited replicates. The two are separate procedures, but both are necessary in DE analyses.
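To make the idea of shrinkage concrete, here is a minimal sketch of the weighted average behind the moderated variance, written in plain Python/NumPy rather than with limma itself. The function name `moderated_variance` and all the numbers are hypothetical. In limma the prior variance and prior degrees of freedom are estimated from all genes together; here they are simply assumed to be known.

```python
import numpy as np


def moderated_variance(s2, df_residual, s2_prior, df_prior):
    """Shrink gene-wise variances toward a prior variance.

    The result is a weighted average of each gene's own variance (s2)
    and the prior variance (s2_prior), weighted by their degrees of freedom.
    ASSUMPTION: s2_prior and df_prior are supplied by the caller; limma
    estimates them from the data, which this sketch does not do.
    """
    s2 = np.asarray(s2, dtype=float)
    return (df_prior * s2_prior + df_residual * s2) / (df_prior + df_residual)


# Hypothetical gene-wise variances from two groups of three samples,
# i.e. 6 samples - 2 group means = 4 residual degrees of freedom.
s2 = np.array([0.01, 0.25, 4.0])
s2_post = moderated_variance(s2, df_residual=4, s2_prior=0.25, df_prior=4.0)
print(s2_post)
```

With so few residual degrees of freedom, very small or very large gene-wise variances are mostly noise. Pulling them toward the prior value keeps a gene from looking significant only because its variance happened to be underestimated.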

And second, how should the regularization step be carried out during the Ballgown procedure described in this paper?

While I'm not familiar with the Ballgown output, I assume you would get some matrix of expression values. If this is a matrix of counts for each gene or transcript, you can apply the voom-limma pipeline as described elsewhere:

https://www.bioconductor.org/help/workflows/RNAseq123/

If all that you have are CPMs, you could log-transform them and use the limma-trend approach. The same can be done if you have FPKMs, but this will be less accurate (as the mean-variance relationship will probably be distorted by gene-length normalization):

A: Differential expression of RNA-seq data using limma and voom()
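As a rough illustration of the log-transformation step only, the sketch below converts a small matrix of CPM values to the log2 scale. It is plain Python/NumPy, not limma code. The helper `log2_transform` and the offset of 1 are assumptions made for illustration, and the offset is there only to avoid taking the log of zero.

```python
import numpy as np


def log2_transform(values, offset=1.0):
    """Return log2(values + offset) for a matrix of CPMs (or FPKMs).

    ASSUMPTION: the offset of 1.0 is an illustrative choice, not a
    recommendation taken from limma or Ballgown.
    """
    values = np.asarray(values, dtype=float)
    if np.any(values < 0):
        raise ValueError("expression values must be non-negative")
    return np.log2(values + offset)


# Hypothetical CPM matrix: rows are genes, columns are samples.
cpm = np.array([[0.0, 1.0, 3.0],
                [7.0, 15.0, 31.0]])
log_cpm = log2_transform(cpm)
print(log_cpm)
```

In R, a matrix like this is what you would pass to `lmFit()`, followed by `eBayes()` with `trend=TRUE` for the limma-trend approach.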

I am a noob and in a hurry; can anyone help me and provide detailed commands?

Read the documentation:

http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf

Don't expect others to do your analysis for you. Software might be free but manpower isn't.