Question

Error when annotating Ensembl transcript IDs with the chicken annotation DB (org.Gg.eg.db)

Mohamed (United Kingdom)

I have been trying to add gene symbols for the row names of my data frame H3com using the command:

library(AnnotationDbi)
library(org.Gg.eg.db)
H9comm$Gene_SYMPOL <- mapIds(org.Gg.eg.db, keys = rownames(H9comm), keytype = 'ENSEMBLTRANS', column = 'SYMBOL')

It fails with this error:

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENSEMBLTRANS'. Please use the keys method to see a listing of valid arguments.

I do not know why. I double-checked that my row names are chicken Ensembl transcript IDs, which should match the keytype. My row names look like this: [image of my data frame]

Could this be due to an incompatibility between the Bioconductor and annotation DB versions? My Bioconductor version is:

> tools:::.BioC_version_associated_with_R_version()
[1] ‘3.15’

and I installed the annotation DB package from the Bioconductor website using:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("AnnotationDbi")

Any help would be appreciated. Thanks in advance!

Tags: R, ensembldb, Annotation


Answer · 2022-10-11

James W. MacDonald (United States)

Works for me.

> select(org.Gg.eg.db, paste0("ENSGALT", sprintf("%011d", c(22, 212, 384, 424, 519, 546, 557))), "SYMBOL","ENSEMBLTRANS")
'select()' returned 1:1 mapping between keys and columns
        ENSEMBLTRANS SYMBOL
1 ENSGALT00000000022 ARID4A
2 ENSGALT00000000212   BRD2
3 ENSGALT00000000384  A2ML4
4 ENSGALT00000000424  CSRP1
5 ENSGALT00000000519  ITGB3
6 ENSGALT00000000546  RBM15
7 ENSGALT00000000557  CHMP7

So whatever you are passing in as ENSEMBLTRANS keys is not what you show in that table. Please don't post images of your data; you can just paste the output of head(rownames(H9comm)), whereas an image may or may not reflect the actual object.
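Before blaming the annotation package, it can help to confirm that the row names really have the expected format. The sketch below is a hypothetical helper, not part of AnnotationDbi, and it assumes chicken Ensembl transcript IDs look like the ones in this thread ("ENSGALT" followed by 11 digits). It only checks the format and flags two common problems (stray whitespace and version suffixes such as ".5"); it cannot tell you whether the installed org.Gg.eg.db actually contains those IDs.

```r
# Hypothetical helper (not part of AnnotationDbi): check whether IDs *look like*
# chicken Ensembl transcript IDs. Assumption: the expected form is "ENSGALT"
# followed by 11 digits, as in the IDs shown in this thread. It flags stray
# whitespace and trailing version suffixes such as ".5", which would presumably
# not match the unversioned ENSEMBLTRANS keys shown above.
check_ensgalt_ids <- function(ids) {
  ids <- as.character(ids)
  trimmed <- trimws(ids)                          # drop leading/trailing spaces
  unversioned <- sub("\\.[0-9]+$", "", trimmed)   # drop a trailing ".N" version
  data.frame(
    id           = ids,
    has_space    = ids != trimmed,
    has_version  = trimmed != unversioned,
    cleaned      = unversioned,
    valid_format = grepl("^ENSGALT[0-9]{11}$", unversioned),
    stringsAsFactors = FALSE
  )
}

# Illustrative inputs only; the last three are made-up malformed examples.
res <- check_ensgalt_ids(c("ENSGALT00000000022", " ENSGALT00000000212",
                           "ENSGALT00000000384.5", "ARID4A"))
res[!res$valid_format | res$has_space | res$has_version, ]
```

On real data you would call check_ensgalt_ids(rownames(H9comm)). To see how many cleaned IDs the installed package actually knows, something like intersect(res$cleaned, keys(org.Gg.eg.db, keytype = "ENSEMBLTRANS")) is the authoritative check, since a correct format does not rule out a broken installation.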

I would also tend to use biomaRt for this sort of mapping, because by default what you are doing is mapping your Ensembl transcript IDs to NCBI Gene IDs and then to gene symbols, and the first mapping is fraught. It's better to use Ensembl's annotation directly. However, these transcript IDs appear to be for the Red Jungle Fowl rather than the maternal broiler assembly, and I can't figure out how to specify that particular chicken assembly.



Thanks James, but this exact code worked nicely for me before. These are Ensembl transcript IDs, and I am mapping them against a valid keytype in org.Gg.eg.db, so what could be wrong?

Thanks

(reply by Mohamed)


James's code and yours are essentially identical, except for the keys being queried.

So, please check the following:

1. What is the output of head(rownames(H9comm))?
2. Does James's code, which uses select(), work for you?
3. Does your code (using mapIds()) work when using James's input? That is: mapIds(org.Gg.eg.db, keys = paste0("ENSGALT", sprintf("%011d", c(22, 212, 384, 424, 519, 546, 557))), keytype = 'ENSEMBLTRANS', column = 'SYMBOL'). If so, the issue is specific to your input, hence point 1.
4. To rule out issues related to package versions, please provide your sessionInfo().

(reply by Guido Hooiveld)


Thanks, and sorry for the misunderstanding.

1. What is the output of head(rownames(H9comm))? It produces IDs like the ones in the image above, i.e. transcript IDs such as:

> head(rownames(H3com))
[1] "ENSGALT00000000022" "ENSGALT00000000212" "ENSGALT00000000384"
[4] "ENSGALT00000000424" "ENSGALT00000000519" "ENSGALT00000000546"
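As a quick sanity check, the six IDs printed above can be compared with the keys James queried. The snippet below rebuilds his keys with paste0() and sprintf() (zero-padding each number to 11 digits) and compares them with the printed row names, which are typed in by hand here rather than read from the real object.

```r
# Rebuild James's query keys: "ENSGALT" plus the number zero-padded to 11 digits.
james_keys <- paste0("ENSGALT", sprintf("%011d", c(22, 212, 384, 424, 519, 546, 557)))

# The six row names printed by head(rownames(H3com)) above, copied by hand.
my_keys <- c("ENSGALT00000000022", "ENSGALT00000000212", "ENSGALT00000000384",
             "ENSGALT00000000424", "ENSGALT00000000519", "ENSGALT00000000546")

# TRUE means these inputs match James's, which suggests the keys themselves
# are not what makes the lookup fail.
identical(my_keys, james_keys[1:6])
# → [1] TRUE
```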

2. Does James's code, which uses select(), work for you? No, it gives me the same error:
Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENSEMBLTRANS'. Please use the keys method to see a listing of valid arguments.

3. Does your code (using mapIds()) work with James's input, i.e. mapIds(org.Gg.eg.db, keys = paste0("ENSGALT", sprintf("%011d", c(22, 212, 384, 424, 519, 546, 557))), keytype = 'ENSEMBLTRANS', column = 'SYMBOL')? No, it gives me the same error:
Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENSEMBLTRANS'. Please use the keys method to see a listing of valid arguments.

4. To rule out issues related to package versions, please provide your sessionInfo().

> sessionInfo()

R version 4.2.1 Patched (2022-08-29 r82766 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] org.Gg.eg.db_3.16.0  AnnotationDbi_1.59.1 IRanges_2.31.2
 [4] S4Vectors_0.35.1     Biobase_2.57.1       BiocManager_1.30.16
 [7] UniProt.ws_2.37.6    BiocGenerics_0.43.4  RSQLite_2.2.16
[10] edgeR_3.38.4         limma_3.52.2

loaded via a namespace (and not attached):
[1] httr_1.4.4 bit64_4.0.5
[3] jsonlite_1.8.0 AnnotationHub_3.5.2
[5] shiny_1.7.2 assertthat_0.2.1
[7] interactiveDisplayBase_1.35.1 BiocFileCache_2.5.2
[9] blob_1.2.3 GenomeInfoDbData_1.2.9
[11] yaml_2.3.5 progress_1.2.2
[13] BiocVersion_3.16.0 pillar_1.8.1
[15] lattice_0.20-45 glue_1.6.2
[17] digest_0.6.29 promises_1.2.0.1
[19] XVector_0.37.1 colorspace_2.0-3
[21] htmltools_0.5.3 httpuv_1.6.5
[23] BiocBaseUtils_0.99.12 pkgconfig_2.0.3
[25] zlibbioc_1.43.0 xtable_1.8-4
[27] scales_1.2.1 later_1.3.0
[29] tibble_3.1.8 KEGGREST_1.37.3
[31] generics_0.1.3 ggplot2_3.3.6
[33] ellipsis_0.3.2 DT_0.25
[35] cachem_1.0.6 cli_3.3.0
[37] magrittr_2.0.3 crayon_1.5.2
[39] mime_0.12 memoise_2.0.1
[41] fansi_1.0.3 PANTHER.db_1.0.11
[43] cellxgenedp_1.1.4 prettyunits_1.1.1
[45] tools_4.2.1 hms_1.1.2
[47] lifecycle_1.0.3 munsell_0.5.0
[49] locfit_1.5-9.6 Biostrings_2.65.3
[51] compiler_4.2.1 GenomeInfoDb_1.33.8
[53] rlang_1.0.6 grid_4.2.1
[55] RCurl_1.98-1.8 rstudioapi_0.14
[57] rappdirs_0.3.3 htmlwidgets_1.5.4
[59] bitops_1.0-7 gtable_0.3.1
[61] DBI_1.1.3 curl_4.3.2
[63] R6_2.5.1 dplyr_1.0.9
[65] fastmap_1.1.0 bit_4.0.4
[67] utf8_1.2.2 filelock_1.0.2
[69] Rcpp_1.0.9 vctrs_0.4.1
[71] png_0.1-7 dbplyr_2.2.1
[73] tidyselect_1.2.0

Thanks, I really appreciate any comments.

(reply by Mohamed)


It indeed looks like you are using the correct input, and since James's code doesn't run for you, this points to a problem with your Bioconductor installation.

This problem is confirmed by the output of your sessionInfo(): it shows that you have a mix of Bioconductor release and development packages installed. Such a mix often doesn't work, and it may also be the cause of your problem. An odd y digit in an x.y.z package version indicates a development-branch package, e.g. AnnotationDbi_1.59.1. You have also installed org.Gg.eg.db_3.16.0, which is likewise the development version and not the release version (org.Gg.eg.db_3.15.0). Annotation packages use the same version number as the Bioconductor release, and we are still on version 3.15; version 3.16 will be released at the end of this month.

You should fix your mixed installation by running BiocManager::valid() and following the instructions it prints.
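The version rule of thumb described above can also be applied by hand to the "other attached packages" section of sessionInfo(). The sketch below is a hypothetical helper, not part of BiocManager, and it rests on stated assumptions: the input contains Bioconductor packages only (CRAN packages follow their own numbering), and only org.*.eg.db packages are treated as annotation packages whose x.y should equal the current release, passed in as `release` (3.15 at the time of this thread). Other annotation packages, such as PANTHER.db_1.0.11 in the list above, clearly use their own numbering.

```r
# Hypothetical helper (not part of BiocManager): flag Bioconductor packages that
# do not look like release versions, based on "name_version" tokens as printed
# by sessionInfo(). Assumptions: the input holds Bioconductor packages only, and
# only org.*.eg.db packages are treated as annotation packages whose x.y should
# equal the Bioconductor release (passed in as `release`).
flag_non_release <- function(tokens, release = "3.15") {
  pkg <- sub("_.*$", "", tokens)         # R package names cannot contain "_"
  ver <- sub("^[^_]*_", "", tokens)
  parts <- strsplit(ver, ".", fixed = TRUE)
  y <- vapply(parts, function(p) as.integer(p[2]), integer(1))
  x_y <- vapply(parts, function(p) paste(p[1:2], collapse = "."), character(1))
  is_annotation <- grepl("^org\\..+\\.eg\\.db$", pkg)
  not_release <- ifelse(is_annotation,
                        x_y != release,  # e.g. 3.16.0 while the release is 3.15
                        y %% 2L == 1L)   # odd y in x.y.z: development branch
  data.frame(package = pkg, version = ver, not_release = not_release,
             stringsAsFactors = FALSE)
}

# Bioconductor packages from the "other attached packages" list above
# (BiocManager and RSQLite are CRAN packages and are left out).
attached <- c("org.Gg.eg.db_3.16.0", "AnnotationDbi_1.59.1", "IRanges_2.31.2",
              "S4Vectors_0.35.1", "Biobase_2.57.1", "UniProt.ws_2.37.6",
              "BiocGenerics_0.43.4", "edgeR_3.38.4", "limma_3.52.2")
flag_non_release(attached)
```

With this input, the first seven packages are flagged and only edgeR and limma come out as release versions. This is just a heuristic on version strings; BiocManager::valid() compares the installation against the actual repositories and is the tool to rely on.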

For your information, here is my output:

> library(org.Gg.eg.db)
> select(org.Gg.eg.db, paste0("ENSGALT", sprintf("%011d", c(22, 212, 384, 424, 519, 546, 557))), "SYMBOL","ENSEMBLTRANS")
'select()' returned 1:1 mapping between keys and columns
        ENSEMBLTRANS SYMBOL
1 ENSGALT00000000022 ARID4A
2 ENSGALT00000000212   BRD2
3 ENSGALT00000000384  A2ML4
4 ENSGALT00000000424  CSRP1
5 ENSGALT00000000519  ITGB3
6 ENSGALT00000000546  RBM15
7 ENSGALT00000000557  CHMP7
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] org.Gg.eg.db_3.15.0  AnnotationDbi_1.58.0 IRanges_2.30.1      
[4] S4Vectors_0.34.0     Biobase_2.56.0       BiocGenerics_0.42.0 
[7] BiocManager_1.30.18 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9             XVector_0.36.0         zlibbioc_1.42.0       
 [4] bit_4.0.4              R6_2.5.1               rlang_1.0.6           
 [7] fastmap_1.1.0          blob_1.2.3             httr_1.4.4            
[10] GenomeInfoDb_1.32.4    tools_4.2.1            png_0.1-7             
[13] cli_3.4.1              DBI_1.1.3              bit64_4.0.5           
[16] crayon_1.5.2           GenomeInfoDbData_1.2.8 bitops_1.0-7          
[19] vctrs_0.4.2            KEGGREST_1.36.3        RCurl_1.98-1.9        
[22] memoise_2.0.1          cachem_1.0.6           RSQLite_2.2.18        
[25] compiler_4.2.1         Biostrings_2.64.1      pkgconfig_2.0.3       
>

(reply by Guido Hooiveld)