Question

Advice on normalization with metagenomics data

David (@david-3335)

Hello,

I'm using the metagenomeSeq package to normalize my 16S data for one experiment with several samples. Below are the log2 boxplots of the data after normalization (I normalized at the genus level). It looks like the normalization has worked pretty well. I also checked the data with qqnorm (second graph below), and in the Q-Q plot the data do not look normal at all. I still see a lot of values close to zero (but > 0); I assume these are basically singletons.
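To see why a Q-Q plot of log-scale counts looks like this, here is a minimal Python sketch of the points R's `qqnorm()` computes: standard normal quantiles at R's `ppoints(n)` plotting positions, paired with the sorted data. The counts and the `log2(count + 1)` transform are assumptions made up for illustration (most taxa with zero or one read, a few abundant ones). Every zero lands on the same value, and so does every singleton, so the lower end of the plot is a set of flat runs rather than a straight line.

```python
import math
import random
from statistics import NormalDist


def ppoints(n):
    """Plotting positions as in R's ppoints(): a = 3/8 for n <= 10, else 1/2."""
    a = 3 / 8 if n <= 10 else 0.5
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def qq_points(values):
    """(theoretical, sample) pairs equivalent to R's qqnorm(), without plotting."""
    sample = sorted(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf(p) for p in ppoints(len(sample))]
    return list(zip(theoretical, sample))


# Hypothetical data: 60 zeros, 25 singletons, 15 more abundant taxa.
random.seed(1)
counts = [0] * 60 + [1] * 25 + [random.randint(2, 500) for _ in range(15)]
log_values = [math.log2(c + 1) for c in counts]

pairs = qq_points(log_values)
# All zeros map to log2(0 + 1) = 0, giving a flat run at the low end of the plot.
flat_run = sum(1 for _, s in pairs if s == 0.0)
print(flat_run)  # → 60
```

No amount of between-sample normalization removes these ties, which is why the Q-Q plot and the boxplots can tell different stories.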

I'm just wondering whether this is what you would expect from metagenomics data, and whether I can apply tests that assume normality (such as ANOVA) to compare my groups, or whether I should stick to the methods metagenomeSeq suggests for group comparisons. How can I check that my data have been properly normalized? Thanks for your advice.

metagenome normalization metagenomics

8.0 years ago by David

score 0 · Answer 1 · 2016-05-06

Are you thinking that 'normalize' should make your data normally distributed? If so, that's not the case. All a normalization is intended to do is remove as much technical variability between samples as possible, so you can then compare between samples without picking up uninteresting things about how the data were processed.
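metagenomeSeq's default normalization is cumulative sum scaling (CSS): each sample's counts are divided by the sum of its counts up to a chosen quantile, which is meant to correct for sequencing depth without letting a few dominant taxa drive the factor. The Python sketch below is a simplified illustration of that idea, not the package's code. The quantile `p` is fixed at 0.5 here (metagenomeSeq estimates it from the data), the scaling constant of 1000 is chosen for illustration, and it assumes every sample has at least one non-zero count.

```python
import math


def quantile_type7(sorted_vals, p):
    """Linear-interpolation quantile (the default 'type 7' in R's quantile())."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def css_factors(samples, p=0.5):
    """For each sample, sum the counts at or below the p-th quantile
    of that sample's non-zero counts (assumes at least one non-zero count)."""
    factors = []
    for counts in samples:
        positive = sorted(c for c in counts if c > 0)
        q = quantile_type7(positive, p)
        factors.append(sum(c for c in counts if c <= q))
    return factors


def css_log_normalize(samples, p=0.5, scale=1000):
    """log2(count / factor * scale + 1), applied sample by sample."""
    factors = css_factors(samples, p)
    return [
        [math.log2(c / f * scale + 1) for c in counts]
        for counts, f in zip(samples, factors)
    ]


# Hypothetical samples: B is A sequenced exactly twice as deeply.
sample_a = [0, 1, 3, 5, 10, 200]
sample_b = [2 * c for c in sample_a]
print(css_factors([sample_a, sample_b]))  # → [9, 18]
```

Because B's factor is exactly twice A's, the two samples end up with identical normalized values: the depth difference (technical) is removed, but the shape of each sample's distribution, zeros included, is left untouched.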

In other words, the Q-Q plot that you show is to be expected. Count data are not normally distributed, and ecological count data tend to be zero-inflated (meaning you get lots of zeros, which may indicate that the species in question wasn't there, or maybe that it was there, but you just didn't count it). The statistics that metagenomeSeq uses are intended to work correctly, given those limitations of the data, whereas a 'regular' linear model is not.
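The two kinds of zero described above can be simulated directly. This Python sketch uses a zero-inflated Poisson purely as an illustration (it is not the model metagenomeSeq fits; its `fitZig` uses a zero-inflated Gaussian on log-scale normalized counts). The absence probability, the mean, and the sample size are made-up assumptions. A taxon is absent with probability `p_absent` (a structural zero); otherwise its count is Poisson and can still be zero (present but not counted). The observed share of zeros comes out well above what a plain Poisson with the same overall mean would predict.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def zero_inflated_counts(n, p_absent, lam, seed=0):
    """With probability p_absent: structural zero (taxon truly absent).
    Otherwise: Poisson(lam) count, which can still be 0 (present but missed)."""
    rng = random.Random(seed)
    return [0 if rng.random() < p_absent else poisson(lam, rng) for _ in range(n)]


n, p_absent, lam = 10_000, 0.5, 2.0
counts = zero_inflated_counts(n, p_absent, lam)

observed_zero = counts.count(0) / n
expected_zero = p_absent + (1 - p_absent) * math.exp(-lam)   # about 0.568
same_mean_poisson_zero = math.exp(-(1 - p_absent) * lam)     # about 0.368
print(f"observed {observed_zero:.3f}, zero-inflated model {expected_zero:.3f}, "
      f"plain Poisson with same mean {same_mean_poisson_zero:.3f}")
```

The excess zeros are exactly what a normal-theory model such as ANOVA has no way to represent.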

score 0 · Answer 2 · 2016-05-06

Thanks James,

Thanks so much for the clarification. I think I understand what zero-inflated means now; I was nearly there but needed some clarification. I guess I should move forward with methods that don't assume normality, starting with the ones metagenomeSeq provides.

How do you know whether the normalization has worked properly?
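One informal check is the same idea as the log2 boxplots in the question: after normalization, the per-sample distributions should line up, so the spread of per-sample medians should shrink. The Python sketch below computes that spread before and after a normalization step. The two samples are hypothetical, and the normalization used is plain total-sum scaling (counts per 1000 reads), a simple stand-in chosen for illustration; it is not metagenomeSeq's CSS. Lining up the medians only shows that sample-level scaling has been removed; it says nothing about whether the values are normally distributed.

```python
import math
from statistics import median


def median_spread(samples):
    """Range of per-sample medians of log2(x + 1); near 0 means the boxes line up."""
    medians = [median(math.log2(x + 1) for x in s) for s in samples]
    return max(medians) - min(medians)


def total_sum_scale(samples, scale=1000):
    """Stand-in normalization for this sketch (NOT CSS): counts per `scale` reads."""
    return [[x / sum(s) * scale for x in s] for s in samples]


raw = [
    [0, 1, 4, 8, 20, 150],    # hypothetical shallow sample
    [0, 3, 12, 24, 60, 450],  # hypothetical sample sequenced 3x deeper
]
print("raw median spread:", median_spread(raw))
print("normalized median spread:", median_spread(total_sum_scale(raw)))
```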