Removing genes before RNA-seq normalization
aec

Dear all,

Removing genes manually before RNA-seq normalization is not good practice, right? For example, we would like to investigate osteoblast expression in bone samples, but we know that there is some contamination from muscle. Is it correct to remove the 'muscle' genes before normalization? My understanding is that this should not be done, because TMM normalization corrects for library size and compositional biases. Imagine that some bone samples are more contaminated than others, and one has extremely high expression of muscle genes. If we compare two conditions and remove the contaminating transcripts before normalization, we would obtain untrustworthy results, right?

Another example would be removing all non-coding genes beforehand if we want to study only protein-coding genes. Does the same apply?


WEHI, Melbourne, Australia

It is up to you to determine what "universe" of genes you want to consider and, unless you remove most of the genome, it doesn't cause any problems for TMM or edgeR.

You can consider protein-coding genes only if you want, or only somatic chromosomes, or only messenger RNA, or only microRNAs, whatever is biologically appropriate. You just have to describe what you did when you publish.

If you can unambiguously identify "contaminating" genes, then you can remove them as well. Again, you have to explain what you did and why.

The only thing you can't do is to remove genes and recompute library sizes after applying TMM normalization.
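As a rough illustration of that order of operations (filter first, then compute library sizes and normalization factors from the genes that remain), here is a minimal Python sketch. It is not edgeR and not the real TMM algorithm: `simple_trimmed_factors` is a hypothetical, simplified stand-in (unweighted, trimmed on log-ratios only, factors not rescaled), and the gene symbols and counts are made up for the example.

```python
import math

def filter_genes(counts, remove):
    """Drop unwanted genes; counts maps gene ID -> list of per-sample counts."""
    return {g: row for g, row in counts.items() if g not in remove}

def library_sizes(counts):
    """Per-sample column sums over whatever genes are present."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]

def simple_trimmed_factors(counts, ref=0, trim=0.3):
    """Toy scaling factors: trimmed mean of log2 proportion ratios vs. a
    reference sample. A simplified stand-in for TMM, not edgeR's method."""
    libs = library_sizes(counts)
    factors = []
    for j in range(len(libs)):
        # Log-ratios of within-sample proportions, for genes seen in both samples
        m = sorted(
            math.log2((row[j] / libs[j]) / (row[ref] / libs[ref]))
            for row in counts.values()
            if row[j] > 0 and row[ref] > 0
        )
        if not m:
            factors.append(1.0)
            continue
        k = int(len(m) * trim)          # number of values trimmed from each tail
        kept = m[k:len(m) - k]
        factors.append(2 ** (sum(kept) / len(kept)))
    return factors

# Made-up counts: five "bone" genes and one "muscle" gene, two samples.
# Sample 2 is heavily contaminated with muscle.
counts = {
    "Bglap":  [100, 100],
    "Col1a1": [200, 200],
    "Runx2":  [50, 50],
    "Sp7":    [40, 40],
    "Alpl":   [110, 110],
    "Acta1":  [10, 5000],
}

# Step 1: choose the gene universe. Step 2: only then compute sizes and factors.
filtered = filter_genes(counts, remove={"Acta1"})
libs = library_sizes(filtered)
factors = simple_trimmed_factors(filtered)
unfiltered_factors = simple_trimmed_factors(counts)

print("library sizes after filtering:", libs)
print("factors after filtering:", factors)
print("factors without filtering:", unfiltered_factors)
```

In this toy data the two routes agree on the bone genes: after filtering, the samples look identical, and without filtering the factor for the contaminated sample falls well below 1 to offset its inflated library size. What the sketch never does is compute factors on the full gene set and then recompute library sizes on the filtered one, which is the combination to avoid.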

Every one of my own published papers explains which genes were removed before normalization. To take just the most recent example (Vrahnas et al., Nature Communications 2019), we said:

"Immunoglobulin gene segments, ribosomal genes, predicted and pseudo genes, sex-linked genes (Y chromosome and Xist), and obsolete Entrez Gene IDs were filtered out."

Thanks Gordon, so I was wrong. I thought one could not modify the RNA composition of the sample bioinformatically before normalization.


