Question

Struggling to convert a large list of non-model genes into human orthologs, any suggestions?

ronin (Estonia)

I am working with Perca fluviatilis and have a list of many thousands of genes that I would like to convert into their human orthologs. The list looks something like this:

PFLUV_G00277780
PFLUV_G00269580
PFLUV_G00217690
PFLUV_G00013790
PFLUV_G00218480
PFLUV_G00127550
PFLUV_G00171730
PFLUV_G00002260
PFLUV_G00161260
PFLUV_G00274260

Is there any resource available that can convert these into human genes? For a few dozen it is easy to just search the gene name on NCBI, but I am dealing with thousands, so I cannot do this manually. Thanks in advance for any suggestions or tips you might have.

Tags: annotation

Written by ronin; last updated by James W. MacDonald

score 1 · Answer 1 · 2024-06-27

Normally I would suggest the Orthology.eg.db package, which maps genes between two species. Unfortunately, what you have are what NCBI calls 'LocusTags' rather than the usual NCBI Gene IDs, and I don't know of an easy way to map LocusTags to Gene IDs. There is a hard way to do it, using NCBI's Entrez Direct (EDirect) command-line utilities (esearch, efetch, xtract), which are very powerful but IMO not at all intuitive. Once you have installed EDirect, you can craft a super-obvious query that pulls the Gene ID and the aliases (which include the LocusTag) for every Perca fluviatilis (taxonomy ID 8168) gene record:

esearch -db gene -query "txid8168 [orgn]"  | 
efetch -format docsum | 
xtract -set Set -rec Rec -pattern DocumentSummary -block DocumentSummary \
-pkg Common -wrp ID -element Id -wrp Locus -element OtherAliases | 
xtract -pattern Rec -def "-" -element ID Locus > tmp.txt

I mean, obviously. You will then have a tab-separated file with the Gene ID in the first column and the aliases in the second:

$ head tmp.txt
120556064   PFLUV_G00038130
120561572   PFLUV_G00088970, TNF-a
120555793   PFLUV_G00011040
120549074   PFLUV_G00233340, HIF-1a
120548044   PFLUV_G00223160, ALOX5
120547785   PFLUV_G00224060
120566625   PFLUV_G00121520
120560661   PFLUV_G00075320, PTGES2
120553616   PFLUV_G00260310, LTAH4

You can use this file to map your LocusTags to NCBI Gene IDs, and then use the Orthology.eg.db package to map those to human NCBI Gene IDs.
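The LocusTag lookup itself is not spelled out in the answer. Because the second column of tmp.txt can hold more than the LocusTag (for example "PFLUV_G00088970, TNF-a"), the tag has to be split out of the alias list before matching. Below is a minimal sketch in POSIX sh/awk, assuming tmp.txt is tab-separated as written by xtract and that every LocusTag matches the PFLUV_G-plus-digits pattern seen in the question. The function name map_locus_tags and the file names my_genes.txt and locus2gene.tsv are hypothetical.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): map Perca fluviatilis LocusTags to NCBI
# Gene IDs using the two-column file written by the EDirect pipeline.
# Usage: map_locus_tags ALIASES_FILE TAGS_FILE
#   ALIASES_FILE: tab-separated, column 1 = Gene ID, column 2 = OtherAliases
#   TAGS_FILE:    one LocusTag per line (e.g. PFLUV_G00277780)
# Output: LocusTag <tab> Gene ID, in the order of TAGS_FILE, with "NA" when
# no gene record lists that LocusTag among its aliases.
map_locus_tags() {
  awk -F'\t' '
    # First file: remember the Gene ID for every PFLUV_G... alias.
    # If a tag appeared under several records, the last one read would win.
    NR == FNR {
      n = split($2, aliases, ", *")
      for (i = 1; i <= n; i++)
        if (aliases[i] ~ /^PFLUV_G[0-9]+$/) gene[aliases[i]] = $1
      next
    }
    # Second file: strip stray spaces/carriage returns, skip blank lines,
    # then print the tag with its Gene ID (or NA if it was not found).
    {
      tag = $1
      gsub(/[ \r]/, "", tag)
      if (tag == "") next
      print tag "\t" ((tag in gene) ? gene[tag] : "NA")
    }
  ' "$1" "$2"
}
```

For example, map_locus_tags tmp.txt my_genes.txt > locus2gene.tsv writes one row per input tag. The non-NA values in its second column are Perca fluviatilis Gene IDs that could be passed to select() with keytype "Perca.fluviatilis" in place of z[1:20,1]. Keeping the input order and writing NA explicitly makes it easy to see which tags had no Gene record at all, as opposed to which had no human ortholog.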

> library(Orthology.eg.db)
> z <- read.table("tmp.txt", header = FALSE, sep = "\t")
> head(z)
         V1                      V2
1 120556064         PFLUV_G00038130
2 120561572  PFLUV_G00088970, TNF-a
3 120555793         PFLUV_G00011040
4 120549074 PFLUV_G00233340, HIF-1a
5 120548044  PFLUV_G00223160, ALOX5
6 120547785         PFLUV_G00224060
> select(Orthology.eg.db, as.character(z[1:20,1]), "Homo.sapiens","Perca.fluviatilis")
   Perca.fluviatilis Homo.sapiens
1          120556064         3291
2          120561572           NA
3          120555793         4306
4          120549074         3091
5          120548044          240
6          120547785        60481
7          120566625           NA
8          120560661        80142
9          120553616         4048
10          22976156           NA
11          22976155           NA
12          22976154           NA
13          22976153           NA
14          22976152           NA
15          22976151           NA
16          22976150           NA
17          22976149           NA
18          22976148           NA
19          22976147           NA
20          22976146           NA