Question

Choosing between normalization techniques for OTU counts

apfelbapfel ▴ 30

@apfelbapfel-15149

Hello

After noticing some spurious results in my data set, I would like to ask this community for help choosing the right normalization approach for my data.

I have two groups, patients and healthy controls, for which microbiota OTUs were measured from biopsies. Quality filtering was performed with the LotuS pipeline, using the SDM software with default parameters adapted to the 454 sequencing platform. High-quality and mid-quality sequences were mapped to count the occurrences of each OTU per sample, and clustering was done with UPARSE. The OTU sequences were then taxonomically assigned using the Greengenes database [34] (3.8, August 2013) and the RDP II database [35] (release version 11).

Now I want to correlate these data with host mRNA expression, preferably using Spearman's rank correlation.
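A small sketch of why the normalization choice still matters for a rank-based method: Spearman's rho is the Pearson correlation of the ranks, so it ignores a monotone transform applied to one variable as a whole, but depth normalization divides each sample by a different number and can therefore reorder a single OTU's values across samples. The counts, library sizes and expression values below are made up, and the helper names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up numbers: one OTU and one host gene across four samples.
otu_counts = [10, 20, 30, 40]
library_sizes = [1000, 4000, 1500, 10000]
expression = [1.0, 2.0, 3.0, 4.0]
relative = [c / n for c, n in zip(otu_counts, library_sizes)]

print(spearman(otu_counts, expression))  # raw counts
print(spearman(relative, expression))    # after dividing by library size
```

With these toy numbers the raw counts rank perfectly with expression, while the depth-normalized values do not, so the correlation you report depends on which normalization you pick.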

The default procedure in my lab is to normalize for sequencing depth by calculating ratios (relative abundances), but I don't think ratios are the ideal way to test my hypothesis, so I'm looking into better alternatives. Also, I have quite a number of OTU columns that are either all zero or have very low variance, so simply calculating ratios might amplify noise disproportionately.
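A minimal sketch of the lab's ratio approach plus a simple filter for the all-zero and low-variance columns, assuming a samples x OTUs table. The function names and the thresholds (`min_prevalence`, `min_variance`) are hypothetical choices for illustration, not recommended values.

```python
import numpy as np


def relative_abundance(counts):
    """Divide every sample (row) by its total read count.
    Assumes no sample has zero reads in total."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def filter_otus(table, min_prevalence=0.5, min_variance=0.0):
    """Keep OTU columns that are non-zero in at least `min_prevalence` of the
    samples and whose variance across samples exceeds `min_variance`.
    All-zero columns have zero variance, so they are always dropped."""
    table = np.asarray(table, dtype=float)
    prevalence = (table > 0).mean(axis=0)  # fraction of samples where the OTU is seen
    keep = (prevalence >= min_prevalence) & (table.var(axis=0) > min_variance)
    return table[:, keep], keep


# Made-up table: 3 samples (rows) x 4 OTUs (columns).
counts = [[10, 0, 90, 0],
          [30, 0, 60, 10],
          [5, 0, 95, 0]]
rel = relative_abundance(counts)  # computed on the full table, before filtering
filtered, keep = filter_otus(rel, min_prevalence=0.5)
```

Here the ratios are computed before filtering so each sample's denominator still reflects its full read count; filtering first and then dividing would be a different (also defensible) choice.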

Of all the options out there, I think DESeq2, TMM, cumulative sum scaling, or simply subsampling by number of reads (multiplying all entries by (#reads in smallest sample)/(#reads in this sample)) would be best.
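Strictly speaking, the multiplication described in parentheses is a deterministic rescaling, while "rarefying" usually means randomly drawing a fixed number of reads from each sample without replacement. The sketch below contrasts the two on a made-up table; the function names are hypothetical, and the random draw uses NumPy's `Generator.multivariate_hypergeometric`.

```python
import numpy as np


def scale_to_min_depth(counts):
    """Deterministic rescaling: multiply each sample (row) by
    (reads in smallest sample) / (reads in this sample)."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)
    return counts * depth.min() / depth


def rarefy(counts, depth, seed=0):
    """Random subsampling ('rarefying'): draw `depth` reads per sample
    without replacement, keeping integer counts."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    out = np.empty_like(counts)
    for i, row in enumerate(counts):
        out[i] = rng.multivariate_hypergeometric(row, depth)
    return out


# Made-up table: sample 2 has ten times the depth of sample 1.
counts = [[50, 30, 20],
          [500, 300, 200]]
scaled = scale_to_min_depth(counts)
rarefied = rarefy(counts, depth=100)
```

The rescaled values are generally non-integer, which count-based tools such as DESeq2 do not accept, whereas rarefying keeps integers but discards reads and adds sampling noise that changes with the random seed.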

The thing is that we have a rather small number of observations (around 30 per group), given the difficulty of obtaining these samples, so I'm a bit hesitant to use DESeq2.
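One concrete issue with sparse OTU tables in DESeq2 is its default size-factor step: the geometric mean of an OTU across samples is zero as soon as one sample has a zero count, and such OTUs cannot be used. DESeq2's `estimateSizeFactors` offers `type = "poscounts"` for this situation. The sketch below is a simplified re-implementation of that idea for illustration only, not DESeq2's code; the function name and the samples x OTUs orientation are assumptions (DESeq2 itself uses OTUs x samples).

```python
import numpy as np


def poscounts_size_factors(counts):
    """Median-of-ratios size factors with a zero-tolerant geometric mean.
    counts: samples x OTUs. Returns one size factor per sample.
    Assumes every sample shares at least one non-zero OTU with the others."""
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[0]
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # -inf where a count is zero
    # Modified geometric mean per OTU: n-th root of the product of the
    # non-zero counts (zeros add nothing to the sum but still count in n).
    log_geo = np.where(counts > 0, log_counts, 0.0).sum(axis=0) / n_samples
    otu_seen = (counts > 0).any(axis=0)  # all-zero OTUs are unusable
    size_factors = np.empty(n_samples)
    for i in range(n_samples):
        usable = otu_seen & (counts[i] > 0)
        size_factors[i] = np.exp(np.median(log_counts[i, usable] - log_geo[usable]))
    # Rescale so the size factors have a geometric mean of 1.
    return size_factors / np.exp(np.mean(np.log(size_factors)))


# Made-up table: sample 2 has twice the counts of sample 1 for the shared OTUs;
# OTU 3 is zero everywhere and OTU 4 is zero in sample 1.
counts = [[10, 20, 0, 0],
          [20, 40, 0, 8]]
sf = poscounts_size_factors(counts)
```

Dividing each sample's counts by its size factor gives depth-normalized values; with this toy table the second sample's factor comes out twice the first despite the zeros.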

Any input regarding this question would be highly appreciated, thanks in advance!

deseq2 normalization R edger

apfelbapfel ▴ 30 · 6.1 years ago

I don’t work with OTU analysis myself, so I’ll leave it to others to recommend analyses.

Michael Love 41k · 6.1 years ago

score 0 · Answer 1 · 2018-03-05

You should have a look at the metagenomeSeq package, which provides a normalization procedure for this kind of data (cumulative sum scaling, CSS).
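A simplified sketch of the cumulative-sum-scaling idea, assuming a samples x OTUs table: each sample is divided by the sum of its counts up to a chosen quantile of its non-zero counts, so a few dominant OTUs do not drive the normalization factor. In metagenomeSeq this is done by `cumNorm()`, which picks the quantile from the data; here the quantile is fixed at a hypothetical 0.5, and the function name and `scale` parameter are illustrative.

```python
import numpy as np


def css_normalize(counts, quantile=0.5, scale=1000.0):
    """Simplified cumulative sum scaling (CSS). counts: samples x OTUs.
    The factor for a sample is the sum of its counts that are <= the chosen
    quantile of that sample's non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    normalized = np.zeros_like(counts)
    for i, row in enumerate(counts):
        threshold = np.quantile(row[row > 0], quantile)
        factor = row[row <= threshold].sum()  # zeros add nothing to the sum
        normalized[i] = row / factor * scale
    return normalized


# Made-up table: sample 2 is sample 1 doubled; the last OTU dominates.
counts = [[0, 1, 2, 3, 100],
          [0, 2, 4, 6, 200]]
css = css_normalize(counts)
```

Because the factor only uses the lower part of each sample's count distribution, the two toy samples end up identical after normalization even though one OTU makes up most of the reads.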

There are a couple of papers worth mentioning:

"Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible": on rarefying, i.e. subsampling each 16S sample to a common number of reads
"Proportionality: A Valid Alternative to Correlation for Relative Data": on analysing relative data such as OTU percentages or proportions

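The proportionality paper works with log-ratios of relative data. A minimal sketch of the centred log-ratio (clr) transform and of the rho statistic used in follow-up work on proportionality (for example in the propr R package), rho = 1 - var(x - y) / (var(x) + var(y)) for two clr-transformed OTUs x and y. The pseudocount used for zeros is an assumption, not part of the clr definition, and the data are made up.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of each sample (row) of a samples x OTUs table.
    The pseudocount for zeros is an assumption, not part of the clr definition."""
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)


def rho(x, y):
    """Proportionality rho between two clr-transformed OTUs (vectors over samples).
    Equals 1 when x and y are exactly proportional across samples."""
    return 1.0 - np.var(x - y) / (np.var(x) + np.var(y))


# Made-up table: 3 samples x 3 OTUs; OTU 2 is always twice OTU 1.
counts = [[10, 20, 5],
          [30, 60, 5],
          [50, 100, 40]]
z = clr(counts, pseudocount=0.0)  # no zeros here, so no pseudocount is needed
print(rho(z[:, 0], z[:, 1]))
```

Since clr values are unchanged by multiplying a whole sample by a constant, this route sidesteps the choice of a depth-normalization factor, at the cost of having to handle zeros.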
Also take into account how deeply you sequenced (was it targeted 16S sequencing?), and make sure you haven't undersampled before drawing conclusions.
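One common way to check for undersampling is a rarefaction curve: the expected number of distinct OTUs seen when drawing n reads without replacement from a sample with N reads is the sum over OTUs of 1 - C(N - N_i, n) / C(N, n). If the curve is still rising steeply at the actual depth, more OTUs probably remain unobserved. A sketch with made-up counts; the function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_richness(otu_counts, depth):
    """Expected number of distinct OTUs observed when `depth` reads are drawn
    without replacement from one sample (classic rarefaction formula)."""
    total = sum(otu_counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's read count")
    richness = 0.0
    for n_i in otu_counts:
        if n_i == 0:
            continue
        if total - n_i < depth:
            richness += 1.0  # every subsample of this size must contain the OTU
        else:
            # 1 - P(none of this OTU's n_i reads is drawn)
            richness += 1.0 - exp(log_comb(total - n_i, depth) - log_comb(total, depth))
    return richness


# Made-up sample: evaluate the rarefaction curve at a few depths.
sample = [500, 200, 100, 50, 20, 10, 5, 2, 1, 1]
curve = [(d, expected_richness(sample, d)) for d in (10, 50, 100, 500, sum(sample))]
for depth, richness in curve:
    print(depth, round(richness, 2))
```

Computing the coefficients on the log scale with `lgamma` avoids the very large integers that exact binomial coefficients would produce at realistic read depths.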

score 0 · Answer 2 · 2018-03-06

apfelbapfel ▴ 30

@apfelbapfel-15149

Thanks for the reply. Since I was on a very tight deadline I went with DESeq2 this time, but I will definitely look into your suggestions next time!

apfelbapfel ▴ 30 · 6.1 years ago