Missing ENTREZID-ENSEMBL matching in the Rattus.norvegicus 1.3.1 Bioconductor annotation package
jambroise83 ▴ 10
@jambroise83-15255

When I run the following code to look up the symbol and Entrez ID for the gene ENSRNOG00000048449, I get no result:

library(Rattus.norvegicus)
ENSEMBL <- keys(Rattus.norvegicus,keytype = 'ENSEMBL')
annotation <- select(Rattus.norvegicus,keytype='ENSEMBL',keys=ENSEMBL,columns = c('SYMBOL','ENTREZID'))
annotation[annotation$ENSEMBL=='ENSRNOG00000048449',]

 

On the Ensembl website, ENSRNOG00000048449 corresponds to Itgb3.

Is it possible that the Rattus.norvegicus Bioconductor annotation package is incomplete?

Many Thanks

Jérôme Ambroise

 

rattus norvegicus • 715 views
@james-w-macdonald-5106

It's not missing. You are trying to map between NCBI and EBI/EMBL identifiers for a gene on which the two do not agree. If we get the latest EnsDb object from AnnotationHub and query it for the gene's chromosomal position and corresponding Entrez Gene ID, we get this:

> library(AnnotationHub)
> hub <- AnnotationHub()
> ens <- hub[["AH57792"]]
> select(ens, 'ENSRNOG00000048449', c('GENENAME','ENTREZID','GENESEQSTART','GENESEQEND'), 'GENEID')
              GENEID GENENAME ENTREZID GENESEQSTART GENESEQEND
1 ENSRNOG00000048449    Itgb3       NA     92667869   92783410

And now if we use the Rattus.norvegicus package, which is based on UCSC mappings:

> select(Rattus.norvegicus, "Itgb3", c("ENTREZID","ENSEMBL","CDSSTART","CDSEND"), "SYMBOL")
'select()' returned 1:many mapping between keys and columns
   SYMBOL ENTREZID ENSEMBL CDSSTART   CDSEND
1   Itgb3    29302    <NA> 92424555 92424630
2   Itgb3    29302    <NA> 92452529 92452726
3   Itgb3    29302    <NA> 92453746 92453998
4   Itgb3    29302    <NA> 92457298 92457460
5   Itgb3    29302    <NA> 92458061 92458222
6   Itgb3    29302    <NA> 92460315 92460410
7   Itgb3    29302    <NA> 92461004 92461093
8   Itgb3    29302    <NA> 92462134 92462268
9   Itgb3    29302    <NA> 92463523 92463953
10  Itgb3    29302    <NA> 92533717 92533836
11  Itgb3    29302    <NA> 92536763 92536929
12  Itgb3    29302    <NA> 92538791 92538856

So the two annotation groups agree that there is a gene called Itgb3, but they don't agree on where it is in the genome, so they don't have a mapping between their ID and that of the other annotation group. This is not uncommon, which is why I generally recommend that people stick with either NCBI/UCSC annotations or EBI/EMBL. There is very little profit in mixing and matching between the two.
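
To make the disagreement concrete, here is a minimal base-R sketch that compares the two sets of coordinates printed above. It assumes that both outputs refer to the same chromosome and the same genome assembly, which this thread does not state, and it compares an Ensembl gene span against an NCBI/UCSC coding-exon span, which excludes UTRs. The variable names are only illustrative.

```r
# Coordinates copied from the two select() outputs in this thread.
# Assumption: both refer to the same chromosome and genome assembly.
ens_start <- 92667869   # GENESEQSTART from the EnsDb query
ens_end   <- 92783410   # GENESEQEND from the EnsDb query

# CDSSTART / CDSEND rows returned by Rattus.norvegicus for Itgb3
cds_start <- c(92424555, 92452529, 92453746, 92457298, 92458061, 92460315,
               92461004, 92462134, 92463523, 92533717, 92536763, 92538791)
cds_end   <- c(92424630, 92452726, 92453998, 92457460, 92458222, 92460410,
               92461093, 92462268, 92463953, 92533836, 92536929, 92538856)

# Overall span covered by the NCBI/UCSC coding exons
ncbi_start <- min(cds_start)
ncbi_end   <- max(cds_end)

# Two closed intervals overlap when each one starts before the other ends
overlaps <- ens_start <= ncbi_end && ncbi_start <= ens_end

# Distance between the end of the NCBI/UCSC span and the Ensembl gene start
gap <- ens_start - ncbi_end

print(overlaps)
print(gap)
```

Under those assumptions the two spans do not overlap at all, which is consistent with neither resource carrying a cross-reference to the other's ID for this gene.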
