Hello,
I am taking a large list of accession numbers from NCBI’s EST and GenBank databases and using biomart R package to query the the Ensembl mart and the human gene Ensembl dataset in order to retrieve the attributes entrezgenetransname, entrezgene, description, and transcript_biotype. This all works as desired except for I am unable to tell what results go with which accession numbers. Surely there must be a way to have the NCBI GenBank accession numbers queried against in a column next to its associated result. I’ve searched though the user manual and many forums and I have been unsuccessful in finding a solution. Any help would be very much appreciated! Thank you
Below I've posted the code I am running as well as a dump of my sessionInfo
library(biomaRt)
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
attributesList = listAttributes(ensembl)
attributes <- c('entrezgene_trans_name','entrezgene','description','transcript_biotype')
information <- getBM(attributes=c('entrezgene_trans_name','entrezgene','description','transcript_biotype'),
values = ResultsFile$sacc,
mart = ensembl)
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.38.0 plotly_4.8.0 ggplot2_3.1.0 bindrcpp_0.2.2
[5] dplyr_0.7.8 plyr_1.8.4 stringr_1.3.1 annotate_1.60.0
[9] XML_3.98-1.16 AnnotationDbi_1.44.0 IRanges_2.16.0 S4Vectors_0.20.1
[13] Biobase_2.42.0 BiocGenerics_0.28.0 data.table_1.12.0 BiocManager_1.30.4
loaded via a namespace (and not attached):
[1] progress_1.2.0 tidyselect_0.2.5 purrr_0.2.5 colorspace_1.4-0 htmltools_0.3.6
[6] viridisLite_0.3.0 yaml_2.2.0 blob_1.1.1 rlang_0.3.1 later_0.7.5
[11] pillar_1.3.1 glue_1.3.0 withr_2.1.2 DBI_1.0.0 bit64_0.9-7
[16] bindr_0.1.1 munsell_0.5.0 gtable_0.2.0 htmlwidgets_1.3 memoise_1.1.0
[21] httpuv_1.4.5.1 crosstalk_1.0.0 curl_3.3 Rcpp_1.0.0 xtable_1.8-3
[26] promises_1.0.1 scales_1.0.0 jsonlite_1.6 mime_0.6 bit_1.1-14
[31] hms_0.4.2 digest_0.6.18 stringi_1.2.4 shiny_1.2.0 grid_3.5.1
[36] tools_3.5.1 bitops_1.0-6 magrittr_1.5 lazyeval_0.2.1 RCurl_1.95-4.11
[41] tibble_2.0.1 RSQLite_2.1.1 crayon_1.3.4 tidyr_0.8.2 pkgconfig_2.0.2
[46] prettyunits_1.0.2 assertthat_0.2.0 httr_1.4.0 rstudioapi_0.9.0 R6_2.3.0
[51] compiler_3.5.1
I have also included the structure of my ResultsFile dataTable
qseqid sacc length slen pident nident qstart qend evalue
1: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA010948 251 294 86.853 218 290 520 4.12e-66
2: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA186504 351 359 85.185 299 195 516 1.11e-86
3: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA187225 420 410 86.905 365 59 455 2.28e-123
4: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA284245 405 418 86.914 352 141 516 2.99e-112
5: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA302401 289 307 86.505 250 202 473 8.74e-78
6: 022f7e4d-c2b0-445d-92be-1d9c751edb51 AA306255 342 410 87.135 298 59 382 3.95e-96

