Mapping LOC genes between species using Orthology.eg.db (or a similar tool)
@d24570fc

I'm trying to map a gene list of NCBI Gene IDs from Scomber japonicus (chub mackerel) to Danio rerio (zebrafish) so that I can run further analysis with tools that are tailored to D. rerio and not available for S. japonicus.

I noticed that while Orthology.eg.db can map most Scomber japonicus genes to Danio rerio NCBI Gene IDs, it fails to map LOC genes between the two species. For my analysis, I don't necessarily care that these genes aren't strictly orthologs, or that they are LOC genes in both species. I just want to programmatically convert my gene list of NCBI Gene IDs between the two species, capturing LOC-to-LOC conversions in addition to the ortholog mappings. Is there a way around this limitation of Orthology.eg.db for LOC genes, or is there another tool I can use, given that they are not strictly orthologs?

Example using a gene list of NCBI Gene IDs:

genelist <- c(
  "128378700", "128369139", "128358931", "128359788", "128380930",
  "128362230", "128368409", "128375657", "128355604", "128369136"
)

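library(Orthology.eg.db)

# Map S. japonicus Gene IDs (keytype) to D. rerio Gene IDs (columns)
mappings <- AnnotationDbi::select(
  Orthology.eg.db,
  keys = genelist,
  columns = "Danio.rerio",
  keytype = "Scomber.japonicus"
)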

This is a screenshot of the results from my example, with some annotation added:

[image: screenshot_of_results]

Thank you for taking the time to answer and sorry if this is a dumb question!

@james-w-macdonald-5106

It's not a dumb question.

The LOC genes are things that NCBI thinks might be genes but isn't sure about. They are sure enough to assign an NCBI Gene ID, but since the gene is uncharacterized, they don't give it a name and instead just put LOC in front of the Gene ID. The Orthology.eg.db package maps between the NCBI Gene IDs for the two species, though, so if there is a known orthologous gene in S. japonicus, you should get the mapping.
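To make that naming convention concrete, here is a minimal base-R sketch. It assumes the placeholder symbol is simply `LOC` followed by the Gene ID, as described above. The helper names `loc_symbol` and `is_loc_symbol` are made up for illustration and are not part of Orthology.eg.db or AnnotationDbi, and the two IDs are taken from the question only as sample input (whether they are actually LOC genes is not checked here).

```r
# Hypothetical helpers for illustration; not part of any package.

# Build the placeholder symbol for an uncharacterized gene:
# "LOC" followed by the NCBI Gene ID.
loc_symbol <- function(gene_id) paste0("LOC", gene_id)

# TRUE when a symbol is just "LOC" followed by digits.
is_loc_symbol <- function(symbol) grepl("^LOC[0-9]+$", symbol)

ids  <- c("128378700", "128369139")  # sample IDs from the question
syms <- loc_symbol(ids)

# "tp53" stands in for any named (characterized) gene symbol.
flags <- is_loc_symbol(c(syms, "tp53"))
print(flags)
```

Because the symbol is derived from the Gene ID, a LOC symbol carries no information beyond the ID itself, which is why it cannot be matched across species by name.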

But a mapping is unlikely for the LOC genes, because of how the orthologs are identified. In this case, transcripts from Scomber japonicus were sequenced using PacBio and then assembled into full-length transcripts using HiFiasm. The transcripts are then identified by comparing them to a model organism (maybe D. rerio? I don't know). But usually a transcript is only identified when it is similar to an existing, known transcript, so if there were a transcript in S. japonicus that is very similar to an uncharacterized LOC gene in D. rerio, it wouldn't be annotated because it's not informative.
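One way to see which IDs are affected is to split the result of `select()` into mapped and unmapped rows. The sketch below is only an illustration: it assumes the result is a data frame with `Scomber.japonicus` and `Danio.rerio` columns and that keys without an ortholog come back as `NA`. The helper `split_by_mapping` is hypothetical, and the `Danio.rerio` values in the toy input are placeholders, not real Gene IDs.

```r
# Hypothetical helper: split a select() result into mapped and unmapped rows.
# Assumes unmapped keys have NA in the target column.
split_by_mapping <- function(mappings, target = "Danio.rerio") {
  has_map <- !is.na(mappings[[target]])
  list(
    mapped   = mappings[has_map, , drop = FALSE],
    unmapped = mappings[!has_map, , drop = FALSE]
  )
}

# Toy input with the assumed column layout. "ID_A" and "ID_B" are
# placeholders standing in for D. rerio Gene IDs, not real results.
toy <- data.frame(
  Scomber.japonicus = c("128378700", "128369139", "128358931"),
  Danio.rerio       = c("ID_A", NA, "ID_B"),
  stringsAsFactors  = FALSE
)

res <- split_by_mapping(toy)
print(res$unmapped$Scomber.japonicus)
```

Applied to the `mappings` object from the question, the `unmapped` element would list the S. japonicus IDs for which the package has no D. rerio ortholog.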
