Unexpected gene polymorphism using Salmon-tximeta-DESeq2
Ray • 0

We're analyzing RNAseq data with a pipeline consisting of Salmon, tximeta, and DESeq2.

We have a multi-factorial experimental design, and the experiment was performed on cell lines.

One thing that surprised us is that in the results output, we observe many gene polymorphisms.

For example, for the gene NLRP2 we observed multiple entries associated with different Ensembl IDs: ENSG00000022556, ENSG00000275082, ENSG00000275843, etc.

[Image: entries of NLRP2 from one particular RNAseq experiment result]

My question is: how do we interpret data like this, and how should we deal with this kind of situation? Can we add or average the different entries associated with the same gene?

tximeta DESeq2 • 41 views

This is a consequence of the transcriptome you used for quantification. I recommend that people working with human data use the GENCODE reference transcripts, because GENCODE does not duplicate genes on haplotype chromosomes (which Ensembl does in its transcript FASTA files). Look at the chromosome listed for the entries other than the first: they are shown as "Chromosome CHR_HSCHR19...", which is a haplotype of chr19.
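To check how many rows in an existing results table come from haplotype sequences, one option is to split the rows by sequence name. The sketch below is only an illustration in Python, not part of the Salmon/tximeta/DESeq2 pipeline. It assumes that alternate sequences can be recognised by a `CHR_` prefix in the sequence name, as in the "CHR_HSCHR19..." entries mentioned above. The Ensembl IDs are the ones from the question; the scaffold names are made-up placeholders, the primary entry is assumed to be on chromosome 19, and the function names are hypothetical.

```python
# Hypothetical rows: (gene symbol, Ensembl gene ID, sequence name).
# The gene IDs are taken from the question; the two "CHR_..." names are
# made-up placeholders, not real Ensembl scaffold names.
rows = [
    ("NLRP2", "ENSG00000022556", "19"),
    ("NLRP2", "ENSG00000275082", "CHR_HSCHR19_PLACEHOLDER_A"),
    ("NLRP2", "ENSG00000275843", "CHR_HSCHR19_PLACEHOLDER_B"),
]


def is_alternate_sequence(seq_name):
    """Assumption: haplotype/alternate sequences are named with a 'CHR_' prefix."""
    return seq_name.startswith("CHR_")


def split_primary_and_alternate(rows):
    """Return (primary_rows, alternate_rows) based on the sequence name."""
    primary = [r for r in rows if not is_alternate_sequence(r[2])]
    alternate = [r for r in rows if is_alternate_sequence(r[2])]
    return primary, alternate


primary, alternate = split_primary_and_alternate(rows)
print("primary assembly:", [gene_id for _, gene_id, _ in primary])
print("haplotype copies:", [gene_id for _, gene_id, _ in alternate])
```

This only helps with inspecting results you already have. Re-quantifying against a transcriptome without the duplicated haplotype genes is still the cleaner fix.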

Another reason is that GENCODE provides a single file, while for Ensembl you need to combine the cDNA and ncRNA files to produce a transcriptome.
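If you do stay with Ensembl, combining the two files amounts to concatenating the FASTA records. Here is a minimal Python sketch, assuming both inputs are gzip-compressed FASTA files. The function name `combine_fasta` and the file names in the usage comment are hypothetical, so substitute the files you actually downloaded.

```python
import gzip


def combine_fasta(input_paths, output_path):
    """Concatenate gzip-compressed FASTA files into one gzip-compressed file.

    A newline is added after any input that does not end with one, so the
    last sequence line of one file cannot run into the first header of the next.
    """
    with gzip.open(output_path, "wb") as out:
        for path in input_paths:
            ends_with_newline = True
            with gzip.open(path, "rb") as src:
                # Copy in 1 MiB chunks so large transcriptomes are not held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    ends_with_newline = chunk.endswith(b"\n")
            if not ends_with_newline:
                out.write(b"\n")


# Example usage (hypothetical file names):
# combine_fasta(["ensembl.cdna.fa.gz", "ensembl.ncrna.fa.gz"], "ensembl.combined.fa.gz")
```

Note that this step does nothing about the duplicated genes on haplotype sequences. It only produces the single transcriptome file that GENCODE already provides.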

