limma regularization on Ballgown
rodri2006 ▴ 10
@rodri2006-13159
Last seen 6.9 years ago

Hi,

The Nature Protocols paper about Ballgown says:

Note that Ballgown’s statistical test is a standard linear model-based comparison. For small sample sizes (n < 4 per group), it is often better to perform regularization. This can be done using the limma (ref. 33) package in Bioconductor.

So, how can this be done? First, does "regularization" mean a normalization or something more complicated (I'm not deep into statistics)? And second, how can the new 'regularized' data be included in the ballgown object?

When I try this:

bg = ballgown(dataDir = "ballgown", samplePattern = "mRNA", pData=pheno_data, meas="all")
texpr(bg) <- normalizeBetweenArrays(texpr(bg))

I get an error because I cannot modify the transcript matrix. Also, as I said, I don't know whether normalizeBetweenArrays is the proper regularization; I guess not.

Thanks!

ballgown limma • 1.9k views

Hi! I'm trying to do the same. Did you manage to solve it? Thanks! :D

Alyssa Frazee ▴ 210
@alyssa-frazee-6710
Last seen 3.5 years ago
San Francisco, CA, USA

The sentence in the paper about performing regularization for small sample sizes is basically a suggestion to use the expression data from ballgown as the input to limma. limma needs an expression matrix, so if you would like to use its methods, use ballgown to extract that matrix (with texpr, or something similar) and do everything from there on with the limma package. There is no "texpr<-" replacement method in ballgown: texpr is the core of the ballgown object, so you cannot overwrite it. If you would like regularized data, store it as a separate object and pass that object to the appropriate limma functions.
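
A minimal sketch of that workflow, under stated assumptions rather than as the definitive recipe. The expression matrix is simulated here so the snippet runs on its own; with real data it would come from texpr(bg, meas = "FPKM") and the sample table from pData(bg). The column name group, its two levels, and the transcript and sample names are hypothetical. The eBayes step is where limma moderates the per-transcript variances, which is the "regularization" the paper refers to; trend = TRUE and the log2(x + 1) transform are choices made for this sketch, not something the paper prescribes.

```r
library(limma)

set.seed(1)

# Stand-in for: fpkm <- texpr(bg, meas = "FPKM")
# (simulated transcripts-by-samples matrix; names are hypothetical)
n_tx <- 200
fpkm <- matrix(rexp(n_tx * 6, rate = 0.1), nrow = n_tx,
               dimnames = list(paste0("tx", seq_len(n_tx)),
                               paste0("sample", 1:6)))

# Stand-in for: pheno <- pData(bg)
# (assumes a two-level column called `group`, three samples per level)
pheno <- data.frame(group = factor(rep(c("control", "treated"), each = 3)))

# Drop transcripts with no signal and work on the log scale.
# The result is a separate object; the ballgown object is left untouched.
keep <- rowSums(fpkm) > 0
log_fpkm <- log2(fpkm[keep, , drop = FALSE] + 1)

# Ordinary linear model per transcript, then empirical Bayes moderation
design <- model.matrix(~ group, data = pheno)
fit <- lmFit(log_fpkm, design)
fit <- eBayes(fit, trend = TRUE)

# One row per transcript, with moderated t-statistics and adjusted p-values
results <- topTable(fit, coef = "grouptreated", number = Inf)
head(results)
```

Note that normalizeBetweenArrays is a between-sample normalization, not the moderation step; if it is wanted, it would be applied to the separate log_fpkm matrix before lmFit.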


Hi Alyssa,

I am a bit confused about how to go from StringTie's estimated expression per transcript to limma via ballgown. I am working with triplicates and trying to do a differential expression analysis at the transcript level, so, as the ballgown documentation advises, I will use limma.

The limma vignette says:

"In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million (logCPM)"

I am reading the tables of transcript abundances created by StringTie into a ballgown object with the meas="all" option.

Now, if I want to prepare the data as input for limma, how can I get the appropriate "raw" read counts per transcript? The texpr function can output FPKM or coverage, which is not what limma expects, as far as I understand.

Could you guide me?

Thank you!
