Question

How does normalisation affect the outcome of the limma t-test-like test (using the eBayes() function)?

Regeroka • 0

@regeroka-20875

Hi all,

I am using the limma t-test on RNA-seq data to compare the expression profiles between 2 conditions. I get different results with log-expression values than with z-score normalised log-expression values.

Do you know why that could be, i.e. how scaling could affect a t-test? Should I use the scaled or unscaled log-expression data?

Thank you so much for your help!

limma • 1.4k views

Written 4.9 years ago by Regeroka • updated 4.9 years ago by Gordon Smyth

score 2 · Answer 1 · 2019-05-27

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

Yes, you should input unscaled log-expression values to limma, not z-scores. One of the major purposes of limma is to gain statistical power by modelling the variances. If you divide out the variances, then you prevent limma from knowing what the true variances are. Another disadvantage of z-scores is that they bias the log-fold-change estimates.
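
To make the two points above concrete, here is a minimal numeric sketch. It is not limma itself: it uses Python with NumPy purely for illustration, the expression values are made up, and the helper names `zscore_rows` and `group_difference` are hypothetical. It only shows what per-gene z-scoring does to the variances and to the between-condition differences that a method like limma would otherwise see.

```python
import numpy as np

# Hypothetical log2-expression values, 2 genes x 6 samples (made up for illustration).
# Samples 1-3 belong to condition A, samples 4-6 to condition B.
log_expr = np.array([
    [5.0, 5.2, 4.8, 6.0, 6.2, 5.8],  # gene 1: tight replicates
    [5.0, 7.0, 3.0, 6.0, 8.0, 4.0],  # gene 2: noisy replicates
])
is_b = np.array([False, False, False, True, True, True])


def zscore_rows(x):
    """Centre each gene (row) and divide by its sample standard deviation."""
    centred = x - x.mean(axis=1, keepdims=True)
    return centred / x.std(axis=1, ddof=1, keepdims=True)


def group_difference(x, is_b):
    """Mean of condition B minus mean of condition A, per gene."""
    return x[:, is_b].mean(axis=1) - x[:, ~is_b].mean(axis=1)


z = zscore_rows(log_expr)

var_raw = log_expr.var(axis=1, ddof=1)  # differs between the two genes
var_z = z.var(axis=1, ddof=1)           # 1 for every gene by construction

lfc_raw = group_difference(log_expr, is_b)  # log2 fold changes, both equal to 1
lfc_z = group_difference(z, is_b)           # in per-gene SD units, no longer equal

print("variance  raw:", var_raw, " z-scored:", var_z)
print("B minus A raw:", lfc_raw, " z-scored:", lfc_z)
```

In this toy case both genes have the same log2 fold change but very different variances. After z-scoring, every gene has variance 1, so the information about which genes are more or less variable is gone, and the B-minus-A differences are no longer log-fold-changes: they are inflated for the tight gene and shrunk for the noisy one. In a real analysis the unscaled matrix would go to limma's lmFit() and eBayes() directly.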

Reply • Regeroka • 4.9 years ago

Thank you for your answer! How about using gene signatures (e.g. taking all genes related to a certain pathway, averaging their expression, and repeating this for multiple pathways)? For that, the expression has to be scaled so that the individual genes are comparable. Would this also cause an issue?

Reply • Gordon Smyth • 4.9 years ago

limma provides the roast function for analysing pathways, and the expression values do not need to be scaled.

Simply averaging expression for each pathway seems too simple to me. If you do choose to analyse data in that way, then yes, it does complicate the variance modelling.

Reply • Regeroka • 4.9 years ago

Sorry, my previous comment was not well put. What I meant by (or rather instead of) pathways is that the genes whose expression is to be averaged are co-expressed, or we have reason to believe so. So you would end up with a matrix of gene signatures (e.g. for pathways or biological processes) rather than a matrix of gene expression values. I think the setting is almost the same, but could I ask whether you would still recommend roast() for that?

I already understood your question and, yes, I recommend roast for analysing co-regulated sets of genes.

If you want to ask any more questions about gene signatures, then please post a new question. I have answered your original question.