Question: removing genes before RNA-seq normalization
aec wrote:

Dear all,

Removing genes manually before RNA-seq normalization is not good practice, right? For example, we would like to investigate osteoblast expression in bone samples, but we know that there is some contamination from muscle. Is it correct to remove the 'muscle' genes before normalization? My understanding is that this should not be done, because TMM normalization corrects for library size and compositional biases. Imagine that some bone samples are more contaminated than others, and one has extremely high expression of muscle genes. If we compare two different conditions and remove the contaminating transcripts before normalization, we would obtain untrustworthy results, right?

Another example would be removing all non-coding genes beforehand if we want to study only protein-coding genes. Does the same apply?

Thanks,

modified 4 weeks ago by Gordon Smyth • written 4 weeks ago by aec
Answer: removing genes before RNA-seq normalization
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth wrote:

It is up to you to determine what "universe" of genes you want to consider and, unless you remove most of the genome, it doesn't cause any problems for TMM or edgeR.

You can consider protein coding genes only if you want, or only somatic chromosomes, or only messenger RNA, or only microRNAs, whatever is biologically appropriate. You just have to describe what you did when you publish.

If you can unambiguously identify "contaminating" genes, then you can remove them as well. Again, you have to explain what you did and why.

The only thing you can't do is to remove genes and recompute library sizes after applying TMM normalization.
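To make the ordering concrete, here is a minimal sketch in plain Python. It is not edgeR code and it does not implement TMM. The gene names, counts, and the `contaminants` list are made up for illustration, and the helper functions are hypothetical. It only shows the step that has to come first: drop the unwanted genes from the raw counts, then compute library sizes from what remains, so that any normalization applied afterwards sees the filtered "universe".

```python
# Toy raw count matrix: gene -> counts in three samples (made-up numbers).
counts = {
    "Runx2":  [120, 150, 90],
    "Sp7":    [80, 95, 60],
    "Col1a1": [900, 1100, 700],
    "Acta1":  [50, 4000, 10],   # illustrative muscle "contaminant"
    "Myh1":   [30, 2500, 5],    # illustrative muscle "contaminant"
}

# Assumed list of genes to exclude; in practice this must be justified and reported.
contaminants = {"Acta1", "Myh1"}


def filter_genes(counts, remove):
    """Return a new count table without the genes listed in `remove`."""
    return {gene: row for gene, row in counts.items() if gene not in remove}


def library_sizes(counts):
    """Sum the counts of every gene in the table, per sample."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def cpm(counts, lib_sizes):
    """Counts per million, using the supplied library sizes."""
    return {
        gene: [1e6 * c / n for c, n in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


# Step 1: filter the raw counts.
filtered = filter_genes(counts, contaminants)

# Step 2: recompute library sizes from the retained genes only.
lib_sizes = library_sizes(filtered)

# Step 3: anything downstream (here plain CPM, standing in for a real
# normalization method) is computed on the filtered data.
cpm_filtered = cpm(filtered, lib_sizes)

print("library sizes before filtering:", library_sizes(counts))
print("library sizes after filtering: ", lib_sizes)
```

In this toy data the second sample is dominated by the two muscle genes, so its library size changes a lot once they are removed. In edgeR itself, the analogous step is, to my understanding, subsetting the DGEList with `keep.lib.sizes=FALSE` before calling `calcNormFactors`.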

Every one of my own published papers explains which genes were removed before normalization. Just to take the most recent (Vrahnas et al., Nature Communications 2019), we said:

"Immunoglobulin gene segments, ribosomal genes, predicted and pseudo genes, sex-linked genes (Y chromosome and Xist), and obsolete Entrez Gene IDs were filtered out."

modified 29 days ago • written 4 weeks ago by Gordon Smyth

Thanks Gordon, so I was wrong. I thought one could not modify the RNA composition of the sample bioinformatically before normalization.

written 28 days ago by aec