I am trying to run a comparative analysis between rhesus and human genes, but I seem to get no genes that are common to both species! What am I doing wrong? My code and screen output:





txdb_hg38 <- TxDb.Hsapiens.UCSC.hg38.knownGene
txdb_rh8 <- TxDb.Mmulatta.UCSC.rheMac8.refGene

## All rhesus entrez ids

grgenes <- genes(txdb_rh8)
allrefs <- grgenes$gene_id 
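# NOTE: the annotation object was missing from this call as posted; org.Mmu.eg.db is assumed here
monkey_entrez <- sort(mapIds(org.Mmu.eg.db, keys=allrefs, column="ENTREZID", keytype="REFSEQ", multiVals="filter"))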

# homo sapiens entrez ids
grgenes <- genes(txdb_hg38)
human_entrez <- unique(grgenes$gene_id)

> length(monkey_entrez)
[1] 6371
> length(human_entrez)
[1] 24183
> length(intersect(monkey_entrez,human_entrez))
[1] 0
> head(monkey_entrez)
NM_001098400 NM_001105170 NM_001104552 NM_001105171 NM_001105172 NM_001105173 
 "100049578"  "100125558"  "100125559"  "100125560"  "100125562"  "100125563" 
> head(human_entrez)
[1] "1"         "10"        "100"       "1000"      "100009613" "100009676"




Also, how can I get mapIds to return all matches (and not just 1:1)? I tried all the options for multiVals, but they all return the same number of entrez genes.
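You seem to think that Entrez Gene IDs are species-agnostic. They are not: a given Gene ID refers to a gene in one particular species, so a rhesus gene and its human ortholog carry different IDs and a direct intersection comes back empty.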
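
To make that concrete, here is a minimal base R sketch. The first two vectors are copied from the output in the question; the `ortho` table is a PLACEHOLDER whose pairings are invented purely for illustration (they are not real orthologs). The point is only that the IDs have to be translated through some ortholog table before an intersection means anything.

```r
# IDs copied from the head() output in the question
monkey_ids <- c("100049578", "100125558", "100125559")
human_ids  <- c("1", "10", "100")

# Raw IDs never overlap, because each ID belongs to exactly one species
n_raw_overlap <- length(intersect(monkey_ids, human_ids))

# PLACEHOLDER ortholog table: these pairings are made up for illustration only
ortho <- data.frame(
  rhesus = c("100049578", "100125558"),
  human  = c("10", "9999999"),
  stringsAsFactors = FALSE
)

# Translate rhesus IDs to human IDs, dropping genes with no entry in the table
translated <- ortho$human[match(monkey_ids, ortho$rhesus)]
translated <- translated[!is.na(translated)]

# Only now is the comparison like-for-like
common <- intersect(translated, human_ids)
```

With a real ortholog table in place of `ortho`, `common` would hold the human Gene IDs that have a rhesus counterpart in your list.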

If you want to map between species, you can use the biomaRt package, particularly the getLDS function.
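
A rough sketch of what that could look like, not a tested recipe: the wrapper name `rhesus_to_human_entrez` is made up here, and the dataset names (`mmulatta_gene_ensembl`, `hsapiens_gene_ensembl`) and the attribute name `entrezgene_id` are assumptions about the current Ensembl mart (older releases called the attribute `entrezgene`; `listDatasets()` and `listAttributes()` will tell you what your mart actually offers). It needs the biomaRt package and network access.

```r
# Sketch only: requires biomaRt (Bioconductor) and a connection to Ensembl.
# Dataset and attribute names below are assumptions; verify them with
# biomaRt::listDatasets() and biomaRt::listAttributes().
rhesus_to_human_entrez <- function(rhesus_ids) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Please install the biomaRt package from Bioconductor first")
  }
  rhesus <- biomaRt::useMart("ensembl", dataset = "mmulatta_gene_ensembl")
  human  <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")

  # getLDS links the two datasets via Ensembl homology and returns a
  # data frame with the rhesus attribute(s) first, then the human one(s)
  biomaRt::getLDS(
    attributes  = "entrezgene_id",
    filters     = "entrezgene_id",
    values      = rhesus_ids,
    mart        = rhesus,
    attributesL = "entrezgene_id",
    martL       = human
  )
}

# Example call (needs network access), using IDs from the question:
# orthologs  <- rhesus_to_human_entrez(c("100049578", "100125558"))
# human_side <- unique(orthologs[[2]])
```

The second column of the result can then be intersected with `human_entrez`. Expect one-to-many rows where a rhesus gene has several human homologs.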

