Limit on significant genes for TMM normalization
Question from bilcodygm:

I have analysed a dataset in which control samples were compared with treated samples. It is a small pilot study that served to compare two technologies (NanoString and EdgeSeq). The data are essentially RNA-seq counts. On one hand I used TMM normalization with edgeR's quasi-likelihood testing, and on the other hand quantile normalization with limma testing, to look for significant genes in the contrast 'treated - control'. As far as I can judge, these are the usual pipelines for this kind of data.
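
For concreteness, a minimal R sketch of how such a two-pipeline analysis is commonly set up is shown below. The object names 'counts' and 'group' are assumptions, and the limma arm is read as voom with quantile normalization, which the post does not spell out.

library(edgeR)
library(limma)

# 'counts': a genes x samples matrix of raw counts (here, the 460-gene panel)
# 'group':  a factor with levels "control" (reference) and "treated"
design <- model.matrix(~group)

## Pipeline 1: TMM normalization + edgeR quasi-likelihood testing
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y, method = "TMM")   # TMM scaling factors
y <- estimateDisp(y, design)
fit.ql <- glmQLFit(y, design)
qlf <- glmQLFTest(fit.ql, coef = 2)       # coef 2 = 'treated - control'
topTags(qlf)

## Pipeline 2: quantile normalization + limma testing (via voom)
v <- voom(counts, design, normalize.method = "quantile")
fit.lm <- lmFit(v, design)
fit.lm <- eBayes(fit.lm)
topTable(fit.lm, coef = 2)                # coef 2 = 'treated - control'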

The assumption behind TMM is that the majority of genes are not differentially expressed. Quantile normalization does not strictly make that assumption, but it does assume that samples have identical (or very similar) data distributions and that global differences between them are due to technical variation.

What I find is that roughly 200-300 of the 460 genes in the panel (depending on which of the two pipelines I use, see above) are significantly changing in 'treated - control'. That seems like quite a lot to me.
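
For reference, these counts, split into up- and down-regulated genes, can be tallied from either fit; a minimal sketch, assuming the fitted objects qlf and fit.lm from the sketch above:

# Counts of up / down / non-significant genes at FDR < 0.05
summary(decideTests(qlf))      # edgeR quasi-likelihood results
summary(decideTests(fit.lm))   # limma results (see the column for coefficient 2)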

Can I still use TMM normalization? In other words, is it robust enough to accommodate as few as 25% non-changing genes? Is there a sensible limit on the proportion of genes that should not change significantly?

Then of course I start wondering about the quantile normalization as well, because the control data distribution may differ from the treated data distribution, although I do not see that in the boxplots. So I would say quantile normalization is fine to use.

Many thanks for your help and advice on this!

Tags: normalization, quantile normalization, tmm, edger
Answer from Gordon Smyth (WEHI, Melbourne, Australia):

TMM and quantile normalization are not substantially different regarding the number of DE genes that they can tolerate.

Neither method is intended to tolerate 75% DE genes. You already know this, as you have correctly stated that TMM assumes the majority of genes to be non-DE.

However, normalizing is still better than not normalizing, even with 75% DE genes. And the results might be good enough if up- and down-regulated genes are more or less equally represented.

Quantile is the more robust of the two, as it normalizes for nonlinear effects as well as simple scaling. But TMM is a bit better at tolerating DE changes that are unbalanced, predominantly in one direction.

When you compare two different technologies, I'd expect all the genes to be DE. So the aim should be to quantify the size of the differences rather than to test formally for DE.
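
One way to quantify differences rather than test, sketched here under the same assumptions as the pipeline sketch above (this is not stated in the answer itself), is to report estimated log2-fold-changes with confidence intervals instead of p-values:

# limma: log2-fold-changes with 95% confidence intervals
topTable(fit.lm, coef = 2, number = Inf, confint = TRUE)

# edgeR: shrunken log2-fold-changes via predFC()
logFC <- predFC(y, design, prior.count = 2)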
