Question: Matching GenBank accession numbers with the corresponding biomaRt results
Hello,

I am taking a large list of accession numbers from NCBI's EST and GenBank databases and using the biomaRt R package to query the Ensembl mart (human gene dataset, hsapiens_gene_ensembl) in order to retrieve the attributes entrezgene_trans_name, entrezgene, description and transcript_biotype. This all works as desired, except that I am unable to tell which results go with which accession numbers. Surely there must be a way to get the queried NCBI GenBank accession numbers in a column next to their associated results. I've searched through the user manual and many forums without finding a solution. Any help would be very much appreciated! Thank you

Below I've posted the code I am running, as well as a dump of my sessionInfo():

library(biomaRt)
# connect to the Ensembl mart first; `ensembl` must exist before useDataset() is called
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
attributesList <- listAttributes(ensembl)
# the accession itself is not among the requested attributes,
# so it does not appear as a column in the output
attributes <- c('entrezgene_trans_name', 'entrezgene', 'description', 'transcript_biotype')
information <- getBM(attributes = attributes,
                     values = ResultsFile$sacc, mart = ensembl)
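One way to keep the link between each query and its result is to pass the accessions through a filter and also request that same field as an attribute, so it comes back as a column next to the annotation. This is a minimal sketch, assuming the human Ensembl dataset exposes GenBank/INSDC accessions under the filter and attribute name `embl` (verify with `listFilters(ensembl)` and `listAttributes(ensembl)` for your Ensembl release); the helper name `annotate_accessions` is hypothetical.

```r
# Hypothetical helper: query Ensembl BioMart with the accessions as a *filter*
# and also return that field as an *attribute*, so every result row carries
# the accession it matched.
# Assumption: "embl" is the filter/attribute name for INSDC (GenBank/ENA)
# accessions in this dataset; check listFilters()/listAttributes().
annotate_accessions <- function(accessions, mart,
                                id_field = "embl",
                                extra = c("entrezgene_trans_name", "entrezgene",
                                          "description", "transcript_biotype")) {
  accessions <- unique(accessions)                # do not send duplicate values
  biomaRt::getBM(attributes = c(id_field, extra), # id field first, so it is column 1
                 filters    = id_field,           # restrict results to these accessions
                 values     = accessions,
                 mart       = mart)
}
```

Usage would look like `information <- annotate_accessions(ResultsFile$sacc, ensembl)`. Accessions without an Ensembl cross-reference simply produce no rows.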

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.38.0         plotly_4.8.0           ggplot2_3.1.0          bindrcpp_0.2.2
[5] dplyr_0.7.8            plyr_1.8.4             stringr_1.3.1          annotate_1.60.0
[9] XML_3.98-1.16          AnnotationDbi_1.44.0   IRanges_2.16.0         S4Vectors_0.20.1
[13] Biobase_2.42.0         BiocGenerics_0.28.0    data.table_1.12.0      BiocManager_1.30.4

loaded via a namespace (and not attached):
[1] progress_1.2.0      tidyselect_0.2.5    purrr_0.2.5         colorspace_1.4-0    htmltools_0.3.6
[6] viridisLite_0.3.0   yaml_2.2.0          blob_1.1.1          rlang_0.3.1         later_0.7.5
[11] pillar_1.3.1        glue_1.3.0          withr_2.1.2         DBI_1.0.0           bit64_0.9-7
[16] bindr_0.1.1         munsell_0.5.0       gtable_0.2.0        htmlwidgets_1.3     memoise_1.1.0
[21] httpuv_1.4.5.1      crosstalk_1.0.0     curl_3.3            Rcpp_1.0.0          xtable_1.8-3
[26] promises_1.0.1      scales_1.0.0        jsonlite_1.6        mime_0.6            bit_1.1-14
[31] hms_0.4.2           digest_0.6.18       stringi_1.2.4       shiny_1.2.0         grid_3.5.1
[36] tools_3.5.1         bitops_1.0-6        magrittr_1.5        lazyeval_0.2.1      RCurl_1.95-4.11
[41] tibble_2.0.1        RSQLite_2.1.1       crayon_1.3.4        tidyr_0.8.2         pkgconfig_2.0.2
[46] prettyunits_1.0.2   assertthat_0.2.0    httr_1.4.0          rstudioapi_0.9.0    R6_2.3.0
[51] compiler_3.5.1


I have also included the structure of my ResultsFile data.table:

     qseqid     sacc length slen  pident nident qstart qend       evalue
1: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA010948    251  294  86.853    218    290  520  4.12e-66
2: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA186504    351  359  85.185    299    195  516  1.11e-86
3: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA187225    420  410  86.905    365     59  455 2.28e-123
4: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA284245    405  418  86.914    352    141  516 2.99e-112
5: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA302401    289  307  86.505    250    202  473  8.74e-78
6: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA306255    342  410  87.135    298     59  382  3.95e-96
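Once the annotation table carries an accession column, it can be joined back onto these BLAST results by `sacc`. A sketch using base R `merge()` with `all.x = TRUE`, so BLAST hits without an Ensembl match are kept with `NA` annotation. The toy tables are placeholders (the `embl` column name assumes the query sketched after the code above), not real query output.

```r
# Toy stand-ins for ResultsFile (BLAST hits) and the getBM() annotation;
# the annotation values are placeholders, not real Ensembl results.
results <- data.frame(
  qseqid = rep("022f7e4d-c2b0-445d-92be-1d9c751edb51", 3),
  sacc   = c("AA010948", "AA186504", "AA187225"),
  evalue = c(4.12e-66, 1.11e-86, 2.28e-123),
  stringsAsFactors = FALSE
)
information <- data.frame(
  embl        = c("AA010948", "AA187225"),
  description = c("placeholder description 1", "placeholder description 2"),
  stringsAsFactors = FALSE
)

# Left join: keep every BLAST hit; unmatched accessions get NA annotation.
annotated <- merge(results, information,
                   by.x = "sacc", by.y = "embl", all.x = TRUE, sort = FALSE)
print(annotated)
```

If one accession maps to several Ensembl transcripts, `merge()` repeats that BLAST row once per match. The same call also works when `ResultsFile` is a data.table.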




But do note that you are trying to map from really speculative content to more accepted content, so you should expect lots of missing data.
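Since many accessions are expected to come back empty, it can help to list which ones had no hit. A small sketch with base R `setdiff()` and `%in%`, using placeholder vectors rather than real query output.

```r
# Placeholder vectors; in practice these would be unique(ResultsFile$sacc)
# and the accession column returned by getBM().
queried  <- c("AA010948", "AA186504", "AA187225", "AA284245")
returned <- c("AA010948", "AA187225", "AA187225")  # one accession may match several rows

unmatched <- setdiff(queried, returned)    # accessions with no Ensembl hit
hit_rate  <- mean(queried %in% returned)   # fraction of accessions with at least one hit
cat("Unmatched:", unmatched, "\n")
cat(sprintf("Hit rate: %.0f%%\n", 100 * hit_rate))
```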

Thank you immensely, this is a perfect solution to my issue. I am not expecting all accessions to have a hit. This approach can also be extended to other species using the other datasets available.