Returning gene functional annotations with gene symbols as input (using mygene Bioconductor package)
1
0
Entering edit mode
@bohdan-khomtchouk-9617
Last seen 7.2 years ago
Stanford University

I'm working with the following R code where I'm returning gene ontology terms pertinent to the biological process (BP) category:

    > source("https://bioconductor.org/biocLite.R")
    > biocLite("mygene")
    > xli <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC')
    > res <- queryMany(xli, scopes='symbol', fields=c('go'), species='rat')
    Finished
    > unlist(unlist(res$go.BP)[names(unlist(res$go.BP)) == 'term'], use.names = F)
      [1] "double-strand break repair via homologous recombination"                                                        
      [2] "double-strand break repair via homologous recombination"                                                        
      [3] "double-strand break repair via homologous recombination"                                                        
      [4] "DNA replication"                                                                                                
      [5] "DNA replication"                                                                                                
      .
      .
      .                                                           
      [378] "positive regulation of DNA biosynthetic process"                                                                
      [379] "positive regulation of response to DNA damage stimulus"                                                         
      [380] "positive regulation of response to DNA damage stimulus"                                                         
      [381] "positive regulation of ATP biosynthetic process"                                                                
      [382] "positive regulation of apoptotic signaling pathway"  

Please see: https://www.biostars.org/p/120691/.  But then when I try adding gene CARNS1 to xli:

    > xli <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC', 'CARNS1')
    > res <- queryMany(xli, scopes='symbol', fields=c('go'), species='rat')
    Finished
    > unlist(unlist(res$go.BP)[names(unlist(res$go.BP)) == 'term'], use.names = F)
    Error in rbind(deparse.level, ...) : 
      numbers of columns of arguments do not match

Anyone know why this error happens?  At first, I thought it was because of the scopes argument in queryMany(), but playing around with the parameters of http://mygene.info/doc/query_service.html#available-fields did not help clarify the issue.  

Any suggestions would be appreciated.  

P.S.  I would also welcome suggestions and/or minimal working (code) examples for using other R packages to get the job done.  In fact, just to emphasize this point, if the above seems to be an intractable problem (for whatever reason), I completely do not mind using an alternative software package.

P.P.S.  My sessionInfo() is:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] geneXtendeR_1.0.2          data.table_1.10.4          mygene_1.8.0              
 [4] GenomicFeatures_1.24.5     AnnotationDbi_1.34.4       DEGseq_1.26.0             
 [7] samr_2.0                   matrixStats_0.52.2         impute_1.46.0             
[10] qvalue_2.4.2               PoissonSeq_1.1.2           combinat_0.0-8            
[13] BiocInstaller_1.22.3       baySeq_2.6.0               perm_1.0-0.0              
[16] abind_1.4-5                EBSeq_1.12.0               testthat_1.0.2            
[19] blockmodeling_0.1.9        DESeq2_1.12.4              SummarizedExperiment_1.2.3
[22] Biobase_2.32.0             edgeR_3.14.0               limma_3.28.21             
[25] dplyr_0.7.1                gplots_3.0.1               shinyapps_0.4.1.8         
[28] shiny_1.0.3                bindrcpp_0.2               rtracklayer_1.32.2        
[31] GenomicRanges_1.24.3       GenomeInfoDb_1.8.7         IRanges_2.6.1             
[34] S4Vectors_0.10.3           BiocGenerics_0.18.0        devtools_1.13.2           

loaded via a namespace (and not attached):
 [1] backports_1.1.0         Hmisc_4.0-3             plyr_1.8.4              lazyeval_0.2.0         
 [5] BiocParallel_1.6.6      ggplot2_2.2.1           digest_0.6.12           htmltools_0.3.6        
 [9] gdata_2.18.0            magrittr_1.5            checkmate_1.8.3         memoise_1.1.0          
[13] cluster_2.0.6           Biostrings_2.40.2       annotate_1.50.1         colorspace_1.3-2       
[17] blob_1.1.0              crayon_1.3.2            RCurl_1.95-4.8          jsonlite_1.5           
[21] roxygen2_6.0.1          genefilter_1.54.2       bindr_0.1               survival_2.41-3        
[25] glue_1.1.1              gtable_0.2.0            zlibbioc_1.18.0         XVector_0.12.1         
[29] scales_0.4.1            DBI_0.7                 Rcpp_0.12.11            xtable_1.8-2           
[33] htmlTable_1.9           foreign_0.8-69          bit_1.1-12              Formula_1.2-2          
[37] sqldf_0.4-11            htmlwidgets_0.9         httr_1.2.1              RColorBrewer_1.1-2     
[41] acepack_1.4.1           pkgconfig_2.0.1         XML_3.98-1.9            nnet_7.3-12            
[45] locfit_1.5-9.1          rlang_0.1.1             reshape2_1.4.2          munsell_0.4.3          
[49] tools_3.3.3             gsubfn_0.6-6            RSQLite_2.0             stringr_1.2.0          
[53] knitr_1.16              bit64_0.9-7             caTools_1.17.1          mime_0.5               
[57] xml2_1.1.1              biomaRt_2.28.0          curl_2.7                tibble_1.3.3           
[61] geneplotter_1.50.0      stringi_1.1.5           desc_1.1.0              lattice_0.20-35        
[65] Matrix_1.2-10           commonmark_1.2          bitops_1.0-6            httpuv_1.3.5           
[69] R6_2.2.2                latticeExtra_0.6-28     KernSmooth_2.23-15      gridExtra_2.2.1        
[73] gtools_3.5.0            assertthat_0.2.0.9000   chron_2.3-50            proto_1.0.0            
[77] rprojroot_1.2           withr_1.0.2             GenomicAlignments_1.8.4 Rsamtools_1.24.0       
[81] grid_3.3.3              rpart_4.1-11            git2r_0.18.0            base64enc_0.1-3 
software error mygene gene ontology R • 1.3k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

I don't think you need something specific for this. It's just a couple of queries.

> library(org.Rn,eg.db)
> library(GO.db)
## note that all caps implies human. You want just the first letter capitalized
> gns <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC', 'CARNS1')
> gns2 <- paste0(substr(gns, 1, 1), tolower(substr(gns, 2, nchar(gns))))
> gos <- select(org.Hs.eg.db, gns2, "GO","SYMBOL")
'select()' returned 1:many mapping between keys and columns
> terms <- select(GO.db, as.character(gos[,2]), "TERM","GOID")
'select()' returned many:1 mapping between keys and columns
> head(terms)
        GOID                                                    TERM
1 GO:0000151                                ubiquitin ligase complex
2 GO:0000724 double-strand break repair via homologous recombination
3 GO:0000729                      DNA double-strand break processing
4 GO:0000731                    DNA synthesis involved in DNA repair
5 GO:0000732                                     strand displacement
6 GO:0000800                                         lateral element
ADD COMMENT

Login before adding your answer.

Traffic: 440 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6