I have a list of ~31,000 mouse transcripts with their Ensembl transcript IDs that I'm trying to annotate using AnnotationDbi and the org.Mm.eg.db database. I'm using R v3.3.2, and the object containing the IDs is called "temp". My command is:
mapIds(org.Mm.eg.db, keys=row.names(temp), keytype="ENSEMBLTRANS", column="SYMBOL", multiVals="first")
Only ~8500 get annotated with a gene name/symbol while the rest get "NA". If I search some of the NAs on Ensembl they match to transcripts/genes correctly. Some examples:
Transcript ID from "temp" | Result from mapIds | Link to Ensembl record |
ENSMUST00000000001 | Gnai3 | Ensembl |
ENSMUST00000000028 | NA | Ensembl |
ENSMUST00000000049 | NA | Ensembl |
ENSMUST00000000058 | Cav2 | Ensembl |
You can see that even the 2 NAs have Ensembl transcript records so why are they not getting annotated by AnnotationDbi?
The command also outputs this, which I'm not sure is relevant or something to worry about:
'select()' returned 1:many mapping between keys and columns
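On that message: it is informational, not an error. It says that at least one key matched more than one value, and with multiVals="first" only the first match per key is kept. A toy sketch in base R of what that collapsing amounts to; the IDs and symbols here are made up for illustration and do not come from org.Mm.eg.db:

```r
# Toy long-format mapping, standing in for what select() would return.
# Keys and symbols are invented for illustration only.
map <- data.frame(
  key    = c("tx1", "tx2", "tx2", "tx3"),
  symbol = c("GeneA", "GeneB", "GeneB2", NA),
  stringsAsFactors = FALSE
)
keys <- c("tx1", "tx2", "tx3", "tx4")

# match() returns the position of the first hit per key, which mimics
# multiVals = "first"; keys with no hit at all come back as NA.
first_hit <- map$symbol[match(keys, map$key)]
names(first_hit) <- keys
first_hit

# Keys that occur on more than one row are the "1:many" cases.
multi <- names(which(table(map$key) > 1))
multi
```

So the message by itself does not explain the NAs; those come from keys that have no entry in the database being queried.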
Thank you, it's working much better now, but it's still missing about 10% of them. Actually, it matches everything up to ENSMUST00000195885 and returns NA for all subsequent transcript IDs. Here are the 10 around ENSMUST00000195885:
Command is:
Also one of the results is now blank: ENSMUST00000077235
When using org.Mm.eg.db it correctly finds Dhrsx (Ensembl link).
If you want more recent transcripts, you need to use a more recent version of the Ensembl database. The version that Johannes provides is based on Ensembl V79 (hence the v79 in the name), which is rather old. Biomart is based on the current version:
And if we check an archived version 79
The reason for the missing entry might be that in Ensembl version 79 the transcript/gene was not annotated yet to that symbol. Locally I have EnsDb.Mmusculus.v87 and there it is annotated to DHRSX.
You beat me by 3 minutes James ;)
Mike, if you need the new EnsDb just drop me a line.
cheers, jo
Hi Johannes,
Any chance you can make EnsDb.Mmusculus.v87 available through the Bioconductor annotation pages? Or any other way? For my data set I (also) would like to make use of the latest annotation info available. :)
Thanks,
Guido
Actually, with the current development version it would be possible to get EnsDb databases for all species from Ensembl 87 from AnnotationHub.
As said, that's in the development version of BioC (version 3.5), so not yet officially available.
In the meantime you can download the corresponding SQLite file from https://cloud.scientificnet.org/index.php/s/q4vZQ1pq96Hl6sq - but beware - the download will be slow. You can then load the corresponding EnsDb by calling the EnsDb function with the file name of the SQLite file (full path) as its argument.
Thanks! Will go for the 1st option!
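For anyone reading later, the two routes Johannes describes could look roughly like the sketch below. The function names are hypothetical helpers, the file path is a placeholder, and the AnnotationHub search terms are an assumption about what will match, so inspect the query result before relying on the first hit. "TXID" and "GENENAME" are the ensembldb key type and column for transcript IDs and gene names.

```r
# Hypothetical helpers; function names and file paths are placeholders.

# Option 1: fetch an EnsDb from AnnotationHub. Needs network access and a
# Bioconductor version whose AnnotationHub provides EnsDb resources.
fetch_ensdb_from_hub <- function(species = "Mus musculus", version = "87") {
  ah <- AnnotationHub::AnnotationHub()
  hits <- AnnotationHub::query(ah, c("EnsDb", species, version))
  if (length(hits) == 0) stop("no matching EnsDb found in AnnotationHub")
  # Several records may match; this simply takes the first one.
  hits[[names(hits)[1]]]
}

# Option 2: open a downloaded SQLite file directly (full path).
load_ensdb_from_sqlite <- function(sqlite_path) {
  if (!file.exists(sqlite_path)) stop("SQLite file not found: ", sqlite_path)
  ensembldb::EnsDb(sqlite_path)
}

# Map transcript IDs to gene names using either EnsDb object.
tx_to_gene_name <- function(edb, tx_ids) {
  AnnotationDbi::mapIds(edb, keys = tx_ids, keytype = "TXID",
                        column = "GENENAME", multiVals = "first")
}

# Example usage (not run here):
# edb <- load_ensdb_from_sqlite("/full/path/to/downloaded.sqlite")
# tx_to_gene_name(edb, c("ENSMUST00000000028", "ENSMUST00000077235"))
```

Whichever route you take, it is worth checking which Ensembl release the resulting object is built on before comparing results against the Ensembl website.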