Is org.Mm.eg.db updated to Mus musculus Ensembl release 99?
kavator
@kavator-22955

Hi folks, I'm using org.Mm.eg.db. I previously did my pseudoalignment with ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz, but it seems that org.Mm.eg.db has not been updated to that Ensembl release for mouse?

I tried both

res$symbol <- mapIds(org.Mm.eg.db,
                     keys=ens.str,
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")



res$symbol <- mapIds(org.Mm.eg.db,
                     keys=ens.str,
                     column="SYMBOL",
                     keytype="MGI",
                     multiVals="first")

Both gave me:

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'MGI'. Please use the keys method to see a listing of valid arguments

My count data uses Ensembl gene IDs with the ENSMUSG prefix.
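The error message suggests checking the keys method. A minimal diagnostic sketch, assuming ens.str is a character vector of ENSMUSG gene IDs (the six example IDs are the ones reported by head(ens.str) later in this thread), lists the keytypes of org.Mm.eg.db and counts how many input IDs are valid ENSEMBL keys before calling mapIds():

```r
library(org.Mm.eg.db)

# Example Ensembl gene IDs (as reported by head(ens.str) in this thread)
ens.str <- c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000037",
             "ENSMUSG00000000049", "ENSMUSG00000000056", "ENSMUSG00000000058")

# List the identifier types org.Mm.eg.db accepts as keytype
print(keytypes(org.Mm.eg.db))

# ENSMUSG... gene IDs correspond to the "ENSEMBL" keytype, not "MGI"
valid.ens <- keys(org.Mm.eg.db, keytype = "ENSEMBL")
n.found <- sum(ens.str %in% valid.ens)
cat(n.found, "of", length(ens.str), "IDs are valid ENSEMBL keys\n")

# Map only if at least one key is valid; IDs without a mapping come back as NA
if (n.found > 0) {
  symbols <- mapIds(org.Mm.eg.db, keys = ens.str, column = "SYMBOL",
                    keytype = "ENSEMBL", multiVals = "first")
  print(symbols)
}
```

Because org.Mm.eg.db is built around NCBI (Entrez) gene IDs, not every gene from a given Ensembl release is necessarily present in it, so a partial match is possible.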

org.Mm.eg.db AnnotationDbi
Johannes Rainer
@johannes-rainer-6987
Italy

What do your IDs look like? Can you provide the output of head(ens.str)?

Note that if you're using Ensembl annotation, you can also use the ensembldb EnsDb annotation resources. These contain all gene, transcript, exon and protein annotations provided by Ensembl. You can get them (for any species and any Ensembl release) from AnnotationHub. To get the one you want (Mus musculus, Ensembl version 99):

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2019-10-29
> query(ah, "EnsDb.Mmusculus.v99")
AnnotationHub with 1 record
# snapshotDate(): 2019-10-29 
# names(): AH78811
# $dataprovider: Ensembl
# $species: Mus musculus
# $rdataclass: EnsDb
# $rdatadateadded: 2019-10-29
# $title: Ensembl 99 EnsDb for Mus musculus
# $description: Gene and protein annotations for Mus musculus based on Ensem...
# $taxonomyid: 10090
# $genome: GRCm38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("99", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
#   "Protein", "Transcript") 
# retrieve record with 'object[["AH78811"]]' 
> edb <- ah[["AH78811"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache
require(“ensembldb”)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.5
|Creation time: Wed Feb  5 00:04:44 2020
|ensembl_version: 99
|ensembl_host: localhost
|Organism: Mus musculus
|taxonomy_id: 10090
|genome_build: GRCm38
|DBSCHEMAVERSION: 2.1
| No. of genes: 56289.
| No. of transcripts: 144726.
|Protein data available.

You can then use the variable edb instead of org.Mm.eg.db. Also have a look at the ensembldb vignettes for more information (browseVignettes("ensembldb")).

cheers, jo
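Since the original question is which Ensembl release an annotation resource corresponds to, here is a minimal script version of the transcript above that also checks the release and organism of the loaded EnsDb. It assumes the AnnotationHub record AH78811 shown in the query output is still available:

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

# Same query as in the transcript above; prints the matching record(s)
print(query(ah, "EnsDb.Mmusculus.v99"))

# Load the Ensembl 99 EnsDb for Mus musculus (record ID taken from the query output)
edb <- ah[["AH78811"]]

# Check that the Ensembl release matches the GTF used for pseudoalignment (release 99)
ens.version <- ensemblVersion(edb)
cat("Ensembl release:", ens.version, "\n")
cat("Organism:", organism(edb), "\n")
if (as.integer(ens.version) != 99L) {
  warning("EnsDb release differs from the GTF release used for quantification")
}
```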


Hi Jo, thanks for the tip on the EnsDb annotation! When I run head(ens.str), the output is:

[1] "ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037"
[4] "ENSMUSG00000000049" "ENSMUSG00000000056" "ENSMUSG00000000058"

With "Mm" as the variable you called "edb":

res$symbol <- mapIds(Mm, keys=ens.str, column="SYMBOL", keytype="MGI", multiVals="first")

I still got the same error. I tried both ENSEMBL and MGI as my keytype, but the output was the same.

keytypes(Mm)
 [1] "ENTREZID"            "EXONID"              "GENEBIOTYPE"
 [4] "GENEID"              "GENENAME"            "PROTDOMID"
 [7] "PROTEINDOMAINID"     "PROTEINDOMAINSOURCE" "PROTEINID"
[10] "SEQNAME"             "SEQSTRAND"           "SYMBOL"
[13] "TXBIOTYPE"           "TXID"                "TXNAME"
[16] "UNIPROTID"

Is there an example of each of the keytypes? I tried ?TXID, but no help page shows up.
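Keytypes are plain strings, not R objects or functions, which is why ?TXID finds no help page. A minimal sketch that instead prints a few valid keys for some keytypes, assuming Mm is the EnsDb retrieved from AnnotationHub record AH78811 as earlier in the thread (the three keytypes chosen here are just examples):

```r
library(AnnotationHub)
library(ensembldb)

# Mm: the Ensembl 99 EnsDb for Mus musculus, as retrieved earlier in the thread
Mm <- AnnotationHub()[["AH78811"]]

# Print the first few valid keys for a few of the keytypes listed by keytypes(Mm)
kts <- c("GENEID", "TXID", "SYMBOL")
examples <- lapply(kts, function(kt) head(keys(Mm, keytype = kt), 3))
names(examples) <- kts
print(examples)
```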


The keytype specifies the type of your input identifiers (i.e. ens.str). In your case you have to choose keytype = "GENEID" for the EnsDb database, since your identifiers are Ensembl gene identifiers. Alternatively, you could use the code below to get all gene-related annotations from the EnsDb database. After retrieving them, you will also have to reorder the data frame to match your input identifiers (last line of the example code below).

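# Retrieve all gene annotations for the Ensembl gene IDs in ens.str
ann <- genes(Mm, filter = ~ gene_id %in% ens.str, return.type = "data.frame")
# Use the gene IDs as row names so the table can be indexed by ID
rownames(ann) <- ann$gene_id
# Reorder rows to match ens.str; IDs not found in the EnsDb become rows of NA
ann <- ann[ens.str, ]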
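Applied to the mapIds() call from the reply above, the fix is to pass keytype = "GENEID". A minimal sketch, assuming the same AnnotationHub record AH78811 and using the first three IDs from head(ens.str) as example input:

```r
library(AnnotationHub)
library(ensembldb)

# Ensembl 99 EnsDb for Mus musculus (record ID from the AnnotationHub query in this thread)
Mm <- AnnotationHub()[["AH78811"]]

ens.str <- c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000037")

# ENSMUSG... are Ensembl gene IDs, which the EnsDb exposes under keytype "GENEID"
symbols <- mapIds(Mm, keys = ens.str, column = "SYMBOL",
                  keytype = "GENEID", multiVals = "first")
print(symbols)
```

In the original workflow, the same call with the full ens.str vector would be assigned to res$symbol.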

Thanks a lot for all the tips and advice, Jo! :)
