<N/A> Gene Ontology while using the AnnotationDbi package
CUCIO199 • 0
@cucio199-23219

Hello everyone,

This is my first post and my first time using the AnnotationDbi package together with GO.db.

I am trying to retrieve the ontology category and description for a list of GO IDs. I imported a text file of GO IDs and processed them with the select function and the GO.db database. The output is very curious: the ontology and description are correct for the first 1,953 IDs, and then NA is returned for all the rest. I found this comment (https://support.bioconductor.org/p/69790/), but I think my problem is different.

Here is the code:

    yy= scan('/home/text.txt', character(), sep='\t')
    result=select(GO.db, keys=yy, columns = c("TERM",'ONTOLOGY'), keytype = "GOID")
    result=data.frame(result)
    colnames(result)=c('GO', 'TERM', 'ONTOLOGY')

Any suggestions?

Thanks in advance.

EDIT:

Here is the output:


    GO          TERM                                     ONTOLOGY
    GO:0000012  single strand break repair               BP
    GO:0000016  lactase activity                         MF
    GO:0000026  alpha-1,2-mannosyltransferase activity   MF
    GO:0000028  ribosomal small subunit assembly         BP
    GO:0000062  fatty-acyl-CoA binding                   MF
    GO:0000076  DNA replication checkpoint               BP
    GO:0000082  G1/S transition of mitotic cell cycle    BP
    GO:0000086  G2/M transition of mitotic cell cycle    BP
    GO:0000109  nucleotide-excision repair complex       CC

    GO:0110152  NA                                       NA
    GO:0140326  NA                                       NA
    GO:0140359  NA                                       NA
    GO:0150099  NA                                       NA
    GO:0000003  NA                                       NA
    GO:0000009  NA                                       NA
    GO:0000027  NA                                       NA
    GO:0000032  NA                                       NA
    GO:0000038  NA                                       NA


    > sessionInfo()
    R version 3.6.3 (2020-02-29)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 18.04.4 LTS

    Matrix products: default
    BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

    locale:
     [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8     LC_MONETARY=it_IT.UTF-8   
     [6] LC_MESSAGES=it_IT.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

    attached base packages:
    [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
    [1] stringr_1.4.0        ggplot2_3.3.0        GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2       S4Vectors_0.24.3     Biobase_2.46.0      
    [8] BiocGenerics_0.32.0 

    loaded via a namespace (and not attached):
     [1] Rcpp_1.0.4       compiler_3.6.3   pillar_1.4.3     tools_3.6.3      digest_0.6.25    bit_1.1-15.2     RSQLite_2.2.0    memoise_1.1.0    lifecycle_0.2.0 
    [10] tibble_3.0.0     gtable_0.3.0     pkgconfig_2.0.3  rlang_0.4.5      DBI_1.1.0        cli_2.0.2        rstudioapi_0.11  withr_2.1.2      vctrs_0.2.4     
    [19] bit64_0.9-7      grid_3.6.3       tidyselect_1.0.0 glue_1.3.2       R6_2.4.1         fansi_0.4.1      purrr_0.3.3      blob_1.2.1       magrittr_1.5    
    [28] scales_1.1.0     ellipsis_0.3.0   assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.6    munsell_0.5.0    crayon_1.3.4    

@james-w-macdonald-5106

You don't show the results of running sessionInfo(), and you don't make this easy for anybody to test (a self-contained example would be nice). However

> gos <- scan("clipboard","c")
Read 14 items
> gos
 [1] "GO:0005854" "GO:0005856" "GO:0005858" "GO:0005859" "GO:0005861"
 [6] "GO:0005863" "GO:0005868" "GO:0005873" "GO:0004853" "GO:0004857"
[11] "GO:0004866" "GO:0004867" "GO:0004869" "GO:0005001"
## that's just the terms you show, above
> z <- select(GO.db, gos, c("TERM","ONTOLOGY"))
'select()' returned 1:1 mapping between keys and columns
## don't need to convert a data.frame into a data.frame
> z
         GOID                                                         TERM ONTOLOGY
1  GO:0005854                       nascent polypeptide-associated complex       CC
2  GO:0005856                                                 cytoskeleton       CC
3  GO:0005858                                      axonemal dynein complex       CC
4  GO:0005859                                        muscle myosin complex       CC
5  GO:0005861                                             troponin complex       CC
6  GO:0005863                        striated muscle myosin thick filament       CC
7  GO:0005868                                   cytoplasmic dynein complex       CC
8  GO:0005873                                     plus-end kinesin complex       CC
9  GO:0004853                      uroporphyrinogen decarboxylase activity       MF
10 GO:0004857                                    enzyme inhibitor activity       MF
11 GO:0004866                             endopeptidase inhibitor activity       MF
12 GO:0004867                 serine-type endopeptidase inhibitor activity       MF
13 GO:0004869               cysteine-type endopeptidase inhibitor activity       MF
14 GO:0005001 transmembrane receptor protein tyrosine phosphatase activity       MF

To make a self-contained example, you could do (Edited)

> dput(gos)
c("GO:0005854", "GO:0005856", "GO:0005858", "GO:0005859", "GO:0005861", 
"GO:0005863", "GO:0005868", "GO:0005873", "GO:0004853", "GO:0004857", 
"GO:0004866", "GO:0004867", "GO:0004869", "GO:0005001")

which we could then copy and paste.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2      
[4] S4Vectors_0.24.3     Biobase_2.46.0       BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3      digest_0.6.23   DBI_1.1.0       RSQLite_2.2.0  
 [5] rlang_0.4.4     blob_1.2.1      vctrs_0.2.2     tools_3.6.1    
 [9] bit64_0.9-7     bit_1.1-15.1    compiler_3.6.1  pkgconfig_2.0.3
[13] memoise_1.1.0  

If you aren't running these versions or newer, you need to upgrade.


Dear James, I edited my question to add what you asked for. As you can see, all my packages and versions are up to date.

I tried the dput function on my GO IDs (as you kindly suggested), but I still have the same problem. Oddly, roughly the first 1,900 GO IDs are processed correctly, but not the rest.

I also tried splitting the list of GO IDs into two separate lists, but the IDs that had <NA> values for the GO term and GO ontology still had them. Is it possible that the database has some limitation, so that my GO IDs do not match it? When I paste the same entries into QuickGO I get a normal result.

CUCIO199 • 0
@cucio199-23219

I finally solved the issue. There was a leading space at the beginning of the affected GO IDs, which caused the problem. It was impossible to see when printing the result with the head function. Thanks to everyone who tried to help me!
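For anyone hitting the same thing, here is a minimal sketch of how such whitespace could be detected and removed before calling select. The ids vector is a made-up stand-in for the contents of the text file, and the regular expression for a well-formed GO ID ("GO:" followed by seven digits) is an assumption of this sketch, not something checked by AnnotationDbi.

```r
# Hypothetical IDs standing in for the contents of the text file;
# the last two carry the kind of leading space described above.
ids <- c("GO:0000012", "GO:0000016", " GO:0000003", " GO:0000009")

# A leading space is easy to miss in a printed data.frame,
# but a regular expression finds it.
has_ws <- grepl("^\\s|\\s$", ids)
which(has_ws)

# trimws() (base R) strips leading and trailing whitespace.
clean_ids <- trimws(ids)

# Sanity check: every cleaned key should look like "GO:" plus seven digits.
all(grepl("^GO:[0-9]{7}$", clean_ids))

# Alternatively, strip at read time; scan() applies strip.white
# to character fields when sep is specified:
# yy <- scan('/home/text.txt', character(), sep = '\t', strip.white = TRUE)
```

The cleaned vector can then be passed as the keys argument to select. Printing the character vector itself (rather than a data.frame) or using dput also shows the quotes around each element, which makes a stray space visible.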


Yes, but.

> select(GO.db, z, c("TERM","ONTOLOGY"))
'select()' returned 1:1 mapping between keys and columns
        GOID                                         TERM ONTOLOGY
1 GO:0110152                                         <NA>     <NA>
2 GO:0140326                                         <NA>     <NA>
3 GO:0140359                                         <NA>     <NA>
4 GO:0150099                                         <NA>     <NA>
5 GO:0000003                                 reproduction       BP
6 GO:0000009       alpha-1,6-mannosyltransferase activity       MF
7 GO:0000027             ribosomal large subunit assembly       BP
8 GO:0000032  cell wall mannoprotein biosynthetic process       BP
9 GO:0000038 very long-chain fatty acid metabolic process       BP

There still appear to be GO IDs that can be found at AmiGO but are not in GO.db. This may be due to the source we use to generate GO.db, which has apparently been discontinued (outdated data can still be downloaded, but it is apparently no longer being updated).
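A quick way to separate the two cases (malformed keys versus IDs that are genuinely absent from the installed database) is to compare the query against the database's own keys. The sketch below uses plain base R so it runs on its own; known_ids is a small made-up stand-in, and with GO.db attached it would presumably be replaced by keys(GO.db).

```r
# Stand-in for the GO IDs the installed database knows about.
# With GO.db attached, this would be: known_ids <- keys(GO.db)
known_ids <- c("GO:0000003", "GO:0000009", "GO:0000027")

# Query IDs taken from the output above, already stripped of whitespace.
query_ids <- c("GO:0110152", "GO:0000003", "GO:0000009")

# IDs that select() would return <NA> for because they are not in the database.
missing_ids <- setdiff(query_ids, known_ids)

# IDs that should map to a TERM and ONTOLOGY.
found_ids <- intersect(query_ids, known_ids)

missing_ids
found_ids
```

If an ID shows up in missing_ids after whitespace has been removed, the database version simply does not contain it, and no amount of cleaning the input will change that.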
