Unexpected gene polymorphism using Salmon-tximeta-DESeq2
Ray • 0
@Ray-24558

We're analyzing RNA-seq data with a pipeline consisting of Salmon, tximeta, and DESeq2.

We have a multi-factorial experimental design, and the experiment was performed on cell lines.

One thing that surprised us is that the results contain what look like many gene polymorphisms.

For example, for the gene NLRP2 we observed multiple entries associated with different Ensembl IDs: ENSG00000022556, ENSG00000275082, ENSG00000275843, etc.

How should we interpret data like this, and how should we deal with this kind of situation? Can we add or average the different entries associated with the same gene?

tximeta DESeq2
@mikelove

This is a consequence of the transcriptome you used for quantification. I recommend that people working with human data use the GENCODE reference transcripts, because GENCODE does not duplicate genes on haplotype chromosomes (which Ensembl does in its transcript FASTA files). Look at the chromosome listed for the genes other than the first: they are listed as "Chromosome CHR_HSCHR19...", which is a haplotype of chr19.

Another reason is that GENCODE provides a single file, while for Ensembl you need to combine the cDNA and ncRNA files to produce a transcriptome.
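
To confirm this in results that have already been produced, a rough check is to look for sequence names that start with CHR_. This is only a sketch in base R: it assumes, based on the "CHR_HSCHR19..." example above, that the haplotype scaffold names all start with CHR_, and is_haplotype_scaffold is a hypothetical helper, not a function from tximeta or DESeq2.

```r
# Hypothetical helper: TRUE for sequence names that look like Ensembl
# haplotype/alternate scaffolds. Assumption: such names start with "CHR_".
is_haplotype_scaffold <- function(seqname) {
  grepl("^CHR_", as.character(seqname))
}

# Toy input: a primary chromosome name and the scaffold name abbreviated
# exactly as it is written in the post above.
print(is_haplotype_scaffold(c("19", "CHR_HSCHR19...")))

# With a gene-level SummarizedExperiment `gse` (assumed to exist already),
# the same check could be applied to the row ranges, for example:
#   rr <- SummarizedExperiment::rowRanges(gse)
#   on_hap <- is_haplotype_scaffold(GenomeInfoDb::seqnames(rr))
#   table(on_hap)
```

Rows flagged this way are the duplicated copies. Re-quantifying against GENCODE avoids them in the first place, rather than summing or averaging them afterwards.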

Thanks so much for the clarification Michael.

I was indeed confused by the alternative scaffolds included in the Ensembl genome.

Now that you've mentioned it, I will rebuild the Salmon index with the GENCODE reference transcriptome.

Oh, and a further recommendation: when you build the Salmon index, specify --gencode, which will clean up the transcript names in the Salmon output.
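
A sketch of what that indexing call could look like when assembled from R. The FASTA file name and index directory are placeholders, and salmon_index_args is a hypothetical helper; index, -t, -i and --gencode are the only Salmon options used.

```r
# Hypothetical helper that assembles the arguments for `salmon index`.
# -t: transcript FASTA, -i: output index directory,
# --gencode: clean up GENCODE-style transcript names.
salmon_index_args <- function(transcripts_fasta, index_dir) {
  c("index", "-t", transcripts_fasta, "-i", index_dir, "--gencode")
}

# Placeholder paths; substitute the GENCODE release actually downloaded.
args <- salmon_index_args("gencode.transcripts.fa.gz", "salmon_index_gencode")

# Show the command that would be run.
cat("salmon", args, "\n")

# To actually build the index (requires Salmon on the PATH):
#   system2("salmon", args)
```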

Thank you!!

Indeed, I included the --gencode flag, following the tutorial at https://biocorecrg.github.io/RNAseq_course_2019/salmon.html :)

Right now I'm trying to extract some extra information (e.g. gene symbol, description, etc.) from the rowRanges slot. When I was using the Ensembl reference, this information was automatically added to the SummarizedExperiment object from AnnotationHub, but with the GENCODE reference it is missing.

I've tried the makeLinkedTxome() function to link a local GENCODE GTF file, but it didn't seem to work.

Now I'm reading this vignette https://biodatascience.github.io/compbio/bioc/SE.html to see if I can add the information back directly from the GENCODE GTF file. Any suggestions?

Have you tried addIds from the tximeta package?
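
A minimal sketch of such a call, assuming gse is a gene-level SummarizedExperiment produced by tximeta followed by summarizeToGene(), and that the org.Hs.eg.db annotation package is installed. add_gene_annotation is a hypothetical wrapper, and "SYMBOL" and "GENENAME" are taken here as the org.Hs.eg.db columns for gene symbol and description.

```r
# Hypothetical wrapper around tximeta::addIds() for a gene-level object.
# Each requested column is looked up in the organism annotation package
# and stored as a new column of the row metadata.
add_gene_annotation <- function(gse, columns = c("SYMBOL", "GENENAME")) {
  for (column in columns) {
    # gene = TRUE: the rows of `gse` are genes rather than transcripts
    gse <- tximeta::addIds(gse, column, gene = TRUE)
  }
  gse
}

# Usage (not run here; needs a real tximeta object):
#   gse <- add_gene_annotation(gse)
#   SummarizedExperiment::rowData(gse)[, c("SYMBOL", "GENENAME")]
```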

Just tried addIds and it worked, thanks a lot Michael!