I'm having trouble retrieving GO terms and definitions for a list of genes using biomaRt. The problem seems to affect only some genes rather than all of them. For example,
> library("tibble")
> library("biomaRt")
> BM = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Below is a successful query.
> tibble(getBM(attributes = c("external_gene_name", "definition_1006"), filters = "external_gene_name", values = "RUNX1", mart = BM))
# A tibble: 39 x 1
   `getBM(...)`$external_ge… $definition_1006
   <chr>                     <chr>
 1 RUNX1                     Any molecular function by which a gene product int…
 2 RUNX1                     Any process that modulates the frequency, rate or …
 3 RUNX1                     A membrane-bounded organelle of eukaryotic cells i…
 4 RUNX1                     Interacting selectively and non-covalently with AT…
 5 RUNX1                     A protein or a member of a complex that interacts …
 6 RUNX1                     Any process that activates or increases the freque…
 7 RUNX1                     Organized structure of distinctive morphology and …
 8 RUNX1                     The part of the cytoplasm that does not contain or…
 9 RUNX1                     That part of the nuclear content other than the ch…
10 RUNX1                     Interacting selectively and non-covalently with an…
# … with 29 more rows
However, when I change RUNX1 to BCOR, the same query fails:
> tibble(getBM(attributes = c("external_gene_name", "definition_1006"), filters = "external_gene_name", values = "BCOR", mart = BM))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  line 22 did not have 2 elements
But I can confirm that BCOR is a valid gene symbol, because the query succeeds when I request go_id instead of definition_1006:
> tibble(getBM(attributes = c("external_gene_name", "ensembl_gene_id", "go_id"), filters = "external_gene_name", values = "BCOR", mart = BM))
# A tibble: 24 x 1
   `getBM(...)`$external_gene_name $ensembl_gene_id $go_id
   <chr>                           <chr>            <chr>
 1 BCOR                            ENSG00000183337  GO:0005515
 2 BCOR                            ENSG00000183337  GO:0005634
 3 BCOR                            ENSG00000183337  GO:0006325
 4 BCOR                            ENSG00000183337  GO:0004842
 5 BCOR                            ENSG00000183337  GO:0000122
 6 BCOR                            ENSG00000183337  GO:0003714
 7 BCOR                            ENSG00000183337  GO:0007507
 8 BCOR                            ENSG00000183337  GO:0008134
 9 BCOR                            ENSG00000183337  GO:0044212
10 BCOR                            ENSG00000183337  GO:0045892
# … with 14 more rows
It appears that definition_1006 just does not work for BCOR. This baffles me. Does anybody know what went wrong here? Thanks.
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /usr/local/R/3.5.2/lib/R/lib/libRblas.dylib
LAPACK: /usr/local/R/3.5.2/lib/R/lib/libRlapack.dylib
locale:
 en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 tibble_2.1.1 biomaRt_2.38.0
loaded via a namespace (and not attached):
 Rcpp_1.0.1 AnnotationDbi_1.44.0 magrittr_1.5
 BiocGenerics_0.28.0 hms_0.4.2 progress_1.2.0
 IRanges_2.16.0 bit_1.1-14 R6_2.4.0
 rlang_0.3.4 fansi_0.4.0 httr_1.4.0
 stringr_1.4.0 blob_1.1.1 tools_3.5.2
 parallel_3.5.2 Biobase_2.42.0 utf8_1.1.4
 cli_1.1.0 DBI_1.0.0 bit64_0.9-7
 digest_0.6.18 assertthat_0.2.1 crayon_1.3.4
 S4Vectors_0.20.1 bitops_1.0-6 curl_3.3
 RCurl_1.95-4.12 memoise_1.1.0 RSQLite_2.1.1
 stringi_1.4.3 pillar_1.3.1 compiler_3.5.2
 prettyunits_1.0.2 stats4_3.5.2 XML_3.98-1.19
 pkgconfig_2.0.2