Question

Does voom quantile normalization change the DE genes when the sample distributions are similar (no big differences in quantiles)?

manuelsokolov • 0

Portugal

I am working with RNA count data, where log(counts + 1) looks like this:

enter image description here

Sample 1.28.1 is clearly an outlier, so it was removed before normalization.

Running Voom <- voom(RNA_data, design, plot = TRUE) gave the results below. Note that group A has 50 patients and groups B and C have 25 each, so the comparison is 50 versus 50.

enter image description here

After adding quantile normalization, i.e. Voom <- voom(RNA_data, design, plot = TRUE, normalize.method = "quantile"), the results changed to this:

enter image description here

I have read several posts on this topic, including that normalize.method = "quantile" is used as standard in the original voom paper. But is it supposed to have such an impact on the results? Given the distribution of the expression values, is this expected?

limmaGUI normalization limma voom Normalization • 1.0k views

Updated 9 months ago by Gordon Smyth • written 10 months ago by manuelsokolov

score 0 · Answer 1 · 2023-07-01

Gordon Smyth 50k

WEHI, Melbourne, Australia

Sorry, I don't follow what your question is about. Your question is apparently supposed to show a series of plots, but the plots aren't visible. The code used to make the plots also isn't included in your question, so we don't know what the plots were intended to show.

I will make these points:

You cannot learn anything from a plot of log(counts+1). RNA-seq counts have to be, at the very least, normalized by library size to have any meaning.
Quantile is not the default normalization method in either edgeR or voom.
The original voom paper used both TMM and quantile normalization.
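
To illustrate the first point, here is a minimal base-R sketch on simulated counts. The library sizes, the sample names, and the 0.5 offset are assumptions chosen for illustration; none of them come from this thread. Two samples share the same expression profile but differ fourfold in sequencing depth. On the log2(counts + 1) scale the deeper sample looks shifted, while a simple log2 counts-per-million scaling removes the shift. In a real analysis edgeR::cpm would be used rather than this hand-rolled formula.

```r
set.seed(1)
n_genes <- 2000

# Shared relative expression profile (proportions sum to 1)
prop <- rgamma(n_genes, shape = 0.5)
prop <- prop / sum(prop)

# Same biology, different sequencing depth (hypothetical library sizes)
depth <- c(shallow = 5e6, deep = 2e7)
counts <- sapply(depth, function(d) rpois(n_genes, lambda = d * prop))

# Unnormalized log counts: the deeper library looks higher for most genes
log_raw <- log2(counts + 1)

# Simple log2 counts per million, with a small offset to avoid log(0)
lib_size <- colSums(counts)
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1) * 1e6))

# Median per-gene difference between the two samples on each scale
median_gap_raw <- median(log_raw[, "deep"] - log_raw[, "shallow"])
median_gap_cpm <- median(log_cpm[, "deep"] - log_cpm[, "shallow"])
print(c(raw = median_gap_raw, cpm = median_gap_cpm))
```

The gap on the raw log scale reflects sequencing depth alone, which is why a sample with low log(counts+1) values is not necessarily an expression outlier.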

10 months ago by Gordon Smyth

I am sorry, I have edited the question so that the images are visible; they are the main point of my question. The log(counts+1) plot was only used to remove outliers before applying voom, so that voom would not normalize the data with outliers included. In both cases I filtered out the genes with counts equal to 0 and applied edgeR::filterByExpr.

10 months ago by manuelsokolov

It would be better to use plotMDS to assess outlier samples rather than a plot of log(counts+1). A sample with lower counts might simply have a lower library size but might not be an outlier in terms of expression.

Anyway, yes, normalization is supposed to make a difference. That's why we recommend it! The DE results before normalization look extremely unbalanced and are unlikely to be reliable.

BTW, as I have said before on this forum, I do not recommend logFC cutoffs when assessing DE genes. I understand that it is common practice in the literature, but that doesn't make it good. Making an MD plot (plotMD) will give a better idea of the relationship between logFC and expression level.
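
As a rough sketch of the two diagnostic plots mentioned above, the following uses simulated negative binomial counts rather than the poster's data. The group sizes, gene count, and simulation parameters are assumptions for illustration only. It applies plotMDS to a TMM-normalized DGEList and plotMD to the fitted model.

```r
library(edgeR)  # also attaches limma

set.seed(42)
n_genes <- 1000
group <- factor(rep(c("A", "B"), each = 4))

# Simulated counts (hypothetical data, no true differential expression)
counts <- matrix(rnbinom(n_genes * 8, mu = 200, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:8)))

design <- model.matrix(~ group)
dge <- DGEList(counts = counts, group = group)

# Filter low-expression genes, then TMM-normalize (the default method)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# MDS plot on the DGEList; distances are computed on the logCPM scale
plotMDS(dge, labels = colnames(dge), col = as.integer(group))

# voom, linear model, and empirical Bayes moderation
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))

# Mean-difference plot for the B vs A coefficient (logFC vs average expression)
plotMD(fit, column = 2)
```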

10 months ago • updated 9 months ago by Gordon Smyth

One final question regarding this thread. Indeed, plotMDS gave different results from log(counts + 1), showing that the outlier wasn't really an outlier.

I ran plotMDS directly on the rna_counts data without normalizing, and this was the result:

enter image description here

Should the group from 1.18 to 1.36 (five samples) be considered outliers? And should they be removed before TMM + voom with quantile normalization?

However, the plot after TMM normalization of rna_data and before voom is this one:

enter image description here

Or, going by this plot, is the only outlier the one considered before, 1.28.1? And should plotMDS therefore be applied after TMM normalization to discover outliers?

Thank you once again!

9 months ago by manuelsokolov

A voom workflow that you can follow is shown here: RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR

All analysis including plotMDS should be done after normalization.
plotMDS is applied either to the DGEList object or to logCPM values rather than to a numeric matrix of counts. (The first MDS plot above is on the wrong scale, suggesting that it has been applied to raw counts without any conversion to log-expression values. The difference between your two MDS plots is not just due to TMM normalization.)
Outliers should not be removed unless you have a causal explanation for them (otherwise you're cherry-picking the data).
I suggest you use sample weights. Any outliers will then be automatically downweighted in the analysis so there is no need to agonize about whether to remove them. Sample weights are implemented by voomWithQualityWeights or (more easily) by edgeR::voomLmFit with sample.weights=TRUE.
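
The two sample-weight options named above could look roughly like this. It is a sketch on simulated data, not the poster's: the counts, the group sizes, and the deliberately noisy tenth sample are assumptions made for illustration.

```r
library(edgeR)  # also attaches limma

set.seed(7)
n_genes <- 1000
group <- factor(rep(c("A", "B"), each = 5))

# Simulated counts (hypothetical data)
counts <- matrix(rnbinom(n_genes * 10, mu = 200, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:10)))
# Make the last sample much noisier to mimic a low-quality library
counts[, 10] <- rnbinom(n_genes, mu = 200, size = 0.5)

design <- model.matrix(~ group)
dge <- calcNormFactors(DGEList(counts = counts, group = group))

# Option 1: limma's voomWithQualityWeights, then the usual lmFit/eBayes
vq <- voomWithQualityWeights(dge, design)
fit1 <- eBayes(lmFit(vq, design))

# Option 2: edgeR's voomLmFit combines voom, sample weights, and lmFit
fit2 <- eBayes(voomLmFit(dge, design, sample.weights = TRUE))

# Estimated sample weights; a noisy sample would be expected to score lower
print(round(vq$targets$sample.weights, 2))
print(topTable(fit2, coef = 2))
```

With either option every sample stays in the analysis, so no decision about removing a suspected outlier is needed.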

9 months ago by Gordon Smyth