Search
Question: Returning gene functional annotations with gene symbols as input (using mygene Bioconductor package)
0
gravatar for Bohdan Khomtchouk
4 months ago by
Stanford University
Bohdan Khomtchouk10 wrote:

I'm working with the following R code where I'm returning gene ontology terms pertinent to the biological process (BP) category:

    > source("https://bioconductor.org/biocLite.R")
    > biocLite("mygene")
    > xli <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC')
    > res <- queryMany(xli, scopes='symbol', fields=c('go'), species='rat')
    Finished
    > unlist(unlist(res$go.BP)[names(unlist(res$go.BP)) == 'term'], use.names = F)
      [1] "double-strand break repair via homologous recombination"                                                        
      [2] "double-strand break repair via homologous recombination"                                                        
      [3] "double-strand break repair via homologous recombination"                                                        
      [4] "DNA replication"                                                                                                
      [5] "DNA replication"                                                                                                
      .
      .
      .                                                           
      [378] "positive regulation of DNA biosynthetic process"                                                                
      [379] "positive regulation of response to DNA damage stimulus"                                                         
      [380] "positive regulation of response to DNA damage stimulus"                                                         
      [381] "positive regulation of ATP biosynthetic process"                                                                
      [382] "positive regulation of apoptotic signaling pathway"  

Please see: https://www.biostars.org/p/120691/.  But then when I try adding gene CARNS1 to xli:

    > xli <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC', 'CARNS1')
    > res <- queryMany(xli, scopes='symbol', fields=c('go'), species='rat')
    Finished
    > unlist(unlist(res$go.BP)[names(unlist(res$go.BP)) == 'term'], use.names = F)
    Error in rbind(deparse.level, ...) : 
      numbers of columns of arguments do not match

Anyone know why this error happens?  At first, I thought it was because of the scopes argument in queryMany(), but playing around with the parameters of http://mygene.info/doc/query_service.html#available-fields did not help clarify the issue.  

Any suggestions would be appreciated.  

P.S.  I would also welcome suggestions and/or minimal working (code) examples for using other R packages to get the job done.  In fact, just to emphasize this point, if the above seems to be an intractable problem (for whatever reason), I completely do not mind using an alternative software package.

P.P.S.  My sessionInfo() is:

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] geneXtendeR_1.0.2          data.table_1.10.4          mygene_1.8.0              
 [4] GenomicFeatures_1.24.5     AnnotationDbi_1.34.4       DEGseq_1.26.0             
 [7] samr_2.0                   matrixStats_0.52.2         impute_1.46.0             
[10] qvalue_2.4.2               PoissonSeq_1.1.2           combinat_0.0-8            
[13] BiocInstaller_1.22.3       baySeq_2.6.0               perm_1.0-0.0              
[16] abind_1.4-5                EBSeq_1.12.0               testthat_1.0.2            
[19] blockmodeling_0.1.9        DESeq2_1.12.4              SummarizedExperiment_1.2.3
[22] Biobase_2.32.0             edgeR_3.14.0               limma_3.28.21             
[25] dplyr_0.7.1                gplots_3.0.1               shinyapps_0.4.1.8         
[28] shiny_1.0.3                bindrcpp_0.2               rtracklayer_1.32.2        
[31] GenomicRanges_1.24.3       GenomeInfoDb_1.8.7         IRanges_2.6.1             
[34] S4Vectors_0.10.3           BiocGenerics_0.18.0        devtools_1.13.2           

loaded via a namespace (and not attached):
 [1] backports_1.1.0         Hmisc_4.0-3             plyr_1.8.4              lazyeval_0.2.0         
 [5] BiocParallel_1.6.6      ggplot2_2.2.1           digest_0.6.12           htmltools_0.3.6        
 [9] gdata_2.18.0            magrittr_1.5            checkmate_1.8.3         memoise_1.1.0          
[13] cluster_2.0.6           Biostrings_2.40.2       annotate_1.50.1         colorspace_1.3-2       
[17] blob_1.1.0              crayon_1.3.2            RCurl_1.95-4.8          jsonlite_1.5           
[21] roxygen2_6.0.1          genefilter_1.54.2       bindr_0.1               survival_2.41-3        
[25] glue_1.1.1              gtable_0.2.0            zlibbioc_1.18.0         XVector_0.12.1         
[29] scales_0.4.1            DBI_0.7                 Rcpp_0.12.11            xtable_1.8-2           
[33] htmlTable_1.9           foreign_0.8-69          bit_1.1-12              Formula_1.2-2          
[37] sqldf_0.4-11            htmlwidgets_0.9         httr_1.2.1              RColorBrewer_1.1-2     
[41] acepack_1.4.1           pkgconfig_2.0.1         XML_3.98-1.9            nnet_7.3-12            
[45] locfit_1.5-9.1          rlang_0.1.1             reshape2_1.4.2          munsell_0.4.3          
[49] tools_3.3.3             gsubfn_0.6-6            RSQLite_2.0             stringr_1.2.0          
[53] knitr_1.16              bit64_0.9-7             caTools_1.17.1          mime_0.5               
[57] xml2_1.1.1              biomaRt_2.28.0          curl_2.7                tibble_1.3.3           
[61] geneplotter_1.50.0      stringi_1.1.5           desc_1.1.0              lattice_0.20-35        
[65] Matrix_1.2-10           commonmark_1.2          bitops_1.0-6            httpuv_1.3.5           
[69] R6_2.2.2                latticeExtra_0.6-28     KernSmooth_2.23-15      gridExtra_2.2.1        
[73] gtools_3.5.0            assertthat_0.2.0.9000   chron_2.3-50            proto_1.0.0            
[77] rprojroot_1.2           withr_1.0.2             GenomicAlignments_1.8.4 Rsamtools_1.24.0       
[81] grid_3.3.3              rpart_4.1-11            git2r_0.18.0            base64enc_0.1-3 
ADD COMMENTlink modified 4 months ago by James W. MacDonald45k • written 4 months ago by Bohdan Khomtchouk10
1
gravatar for James W. MacDonald
4 months ago by
United States
James W. MacDonald45k wrote:

I don't think you need something specific for this. It's just a couple of queries.

> library(org.Rn,eg.db)
> library(GO.db)
## note that all caps implies human. You want just the first letter capitalized
> gns <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC', 'CARNS1')
> gns2 <- paste0(substr(gns, 1, 1), tolower(substr(gns, 2, nchar(gns))))
> gos <- select(org.Hs.eg.db, gns2, "GO","SYMBOL")
'select()' returned 1:many mapping between keys and columns
> terms <- select(GO.db, as.character(gos[,2]), "TERM","GOID")
'select()' returned many:1 mapping between keys and columns
> head(terms)
        GOID                                                    TERM
1 GO:0000151                                ubiquitin ligase complex
2 GO:0000724 double-strand break repair via homologous recombination
3 GO:0000729                      DNA double-strand break processing
4 GO:0000731                    DNA synthesis involved in DNA repair
5 GO:0000732                                     strand displacement
6 GO:0000800                                         lateral element
ADD COMMENTlink written 4 months ago by James W. MacDonald45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 124 users visited in the last hour