biomaRt not retrieving complete list of functional consequences associated with list of SNPs
Miguel • 0
@e6964e3c
France

Hello,

I've been trying to use biomaRt to query Ensembl's BioMart and extract genomic and functional information about a list of SNPs. Here's an example of the kind of query I'm running, and its results.

library(biomaRt)
library(dplyr)  # provides %>%, arrange() and relocate()

snpMart <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
snpList <- c("rs7349186", "rs3927683", "rs1697421", "rs12041233", "rs112751018", "rs2819336")
snp_annot <- getBM(attributes = c("refsnp_id", "consequence_type_tv", "chr_name", "chrom_start", "chrom_end"),
                    filters = "snp_filter",
                    values = snpList,
                    mart = snpMart) %>%
  arrange(chr_name, chrom_start) %>%
  relocate(consequence_type_tv, .before = chr_name)

snp_annot
    refsnp_id consequence_type_tv chr_name chrom_start chrom_end
1   rs7349186    missense_variant        1    20644627  20644627
2   rs3927683                            1    20796024  20796024
3   rs1697421                            1    21496799  21496799
4  rs12041233                            1    37287106  37287106
5 rs112751018                            1    39622232  39622232
6   rs2819336      intron_variant        1    43550138  43550138

The issue I'm having concerns the consequence_type_tv attribute: many SNPs come back with an empty consequence_type_tv. I assumed this was because they are intergenic variants, which is the case for many variants. So my first issue is that these variants are not labelled as "intergenic variants".

However, when checking a few variants manually on the Ensembl site, I noticed that many of the variants left unannotated by biomaRt are actually annotated with consequences other than intergenic. Take rs12041233, for example: my biomaRt query returns no functional consequence, whereas the Ensembl website reports it as an intronic variant (of an Ensembl lncRNA).
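
Since an empty consequence_type_tv does not necessarily mean "intergenic", it may help to make the gaps explicit before interpreting them. Here is a minimal sketch in base R, assuming a data frame shaped like the getBM() result; the rows are copied by hand from the output posted above so the snippet runs without querying the server, and the name unannotated is just illustrative.

```r
# Rows copied from the getBM() output shown above (no network access needed).
snp_annot <- data.frame(
  refsnp_id = c("rs7349186", "rs3927683", "rs1697421",
                "rs12041233", "rs112751018", "rs2819336"),
  consequence_type_tv = c("missense_variant", "", "", "", "", "intron_variant"),
  chr_name = 1,
  chrom_start = c(20644627, 20796024, 21496799, 37287106, 39622232, 43550138),
  stringsAsFactors = FALSE
)

# Turn empty strings into NA so that missing annotations are explicit
# instead of being silently read as "intergenic".
snp_annot$consequence_type_tv[snp_annot$consequence_type_tv == ""] <- NA

# rsIDs that still need to be checked against the Ensembl variant pages.
unannotated <- snp_annot$refsnp_id[is.na(snp_annot$consequence_type_tv)]
unannotated
```

With the six SNPs above, unannotated holds the four rsIDs whose consequence came back empty, including rs12041233.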

On top of that, I also found some inconsistencies between the variant page on Ensembl and the information retrieved with biomaRt, namely the allele frequencies reported for minor alleles, which differ between the two sources (I thought they were the same).

Am I doing something wrong? Am I missing something?

Thank you very much for your help!

sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-3 pheatmap_1.0.12    qqman_0.1.8        gprofiler2_0.2.2   UpSetR_1.4.0       eulerr_7.0.0      
 [7] DT_0.28            gwasrapidd_0.99.15 rsnps_0.5.0.0      biomaRt_2.54.1     lubridate_1.9.2    forcats_1.0.0     
[13] stringr_1.5.0      dplyr_1.1.2        purrr_1.0.1        readr_2.1.4        tidyr_1.3.0        tibble_3.2.1      
[19] ggplot2_3.4.2      tidyverse_2.0.0    data.table_1.14.8 

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           bit64_4.0.5            filelock_1.0.2         progress_1.2.2         httr_1.4.6            
 [6] GenomeInfoDb_1.34.9    tools_4.2.2            bslib_0.5.0            utf8_1.2.3             R6_2.5.1              
[11] lazyeval_0.2.2         DBI_1.1.3              BiocGenerics_0.44.0    colorspace_2.1-0       withr_2.5.0           
[16] tidyselect_1.2.0       gridExtra_2.3          prettyunits_1.1.1      bit_4.0.5              curl_5.0.1            
[21] compiler_4.2.2         cli_3.6.1              Biobase_2.58.0         xml2_1.3.5             plotly_4.10.2         
[26] labeling_0.4.2         triebeard_0.4.1        sass_0.4.7             scales_1.2.1           rappdirs_0.3.3        
[31] digest_0.6.33          rmarkdown_2.23         XVector_0.38.0         pkgconfig_2.0.3        htmltools_0.5.5       
[36] dbplyr_2.3.3           fastmap_1.1.1          htmlwidgets_1.6.2      rlang_1.1.1            rstudioapi_0.15.0     
[41] httpcode_0.3.0         RSQLite_2.3.1          shiny_1.7.4.1          farver_2.1.1           jquerylib_0.1.4       
[46] generics_0.1.3         jsonlite_1.8.7         crosstalk_1.2.0        RCurl_1.98-1.12        magrittr_2.0.3        
[51] GenomeInfoDbData_1.2.9 Rcpp_1.0.11            munsell_0.5.0          S4Vectors_0.36.2       fansi_1.0.4           
[56] lifecycle_1.0.3        stringi_1.7.12         yaml_2.3.7             MASS_7.3-60            zlibbioc_1.44.0       
[61] plyr_1.8.8             BiocFileCache_2.6.1    grid_4.2.2             blob_1.2.4             promises_1.2.0.1      
[66] ggrepel_0.9.3          crayon_1.5.2           Biostrings_2.66.0      hms_1.1.3              KEGGREST_1.38.0       
[71] polylabelr_0.2.0       knitr_1.43             pillar_1.9.0           stats4_4.2.2           crul_1.4.0            
[76] XML_3.99-0.14          glue_1.6.2             evaluate_0.21          calibrate_1.7.7        httpuv_1.6.11         
[81] urltools_1.7.3         png_0.1-8              vctrs_0.6.3            tzdb_0.4.0             polyclip_1.10-4       
[86] gtable_0.3.3           assertthat_0.2.1       cachem_1.0.8           xfun_0.39              mime_0.12             
[91] xtable_1.8-4           later_1.3.1            viridisLite_0.4.2      AnnotationDbi_1.60.2   memoise_2.0.1         
[96] IRanges_2.32.0         timechange_0.2.0       ellipsis_0.3.2
ensembl R biomaRt • 594 views
@james-w-macdonald-5106
United States

The biomaRt package doesn't query Ensembl itself; it queries the BioMart server at Ensembl. If you do your query by hand using the BioMart server, you get the same results. There might be an argument that what you get from BioMart should 100% match what you get from a directed search of the Ensembl site, but that's an issue for the folks at Ensembl.
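
For anyone wanting to cross-check individual variants outside BioMart, here is a rough sketch that goes through the Ensembl REST API instead. It is an assumption-laden illustration, not something tested in this thread: it assumes the public server at https://rest.ensembl.org, its variation/human/<rsID> endpoint, and a most_severe_consequence field in the JSON response. Both helper names are hypothetical, and the fetch needs network access plus the jsonlite package.

```r
# Hypothetical helper: build the REST URL for a single rsID.
# The server and endpoint layout are assumptions about the public Ensembl REST API.
ensembl_variation_url <- function(rsid, server = "https://rest.ensembl.org") {
  paste0(server, "/variation/human/", rsid, "?content-type=application/json")
}

# Hypothetical helper: download the record and pull out one field.
# Requires network access and the jsonlite package; the field name
# "most_severe_consequence" is assumed to be present in the response.
fetch_most_severe_consequence <- function(rsid) {
  record <- jsonlite::fromJSON(ensembl_variation_url(rsid))
  record$most_severe_consequence
}

# Building the URL is purely local and safe to run anywhere.
ensembl_variation_url("rs12041233")

# Not run here because it needs a network connection:
# fetch_most_severe_consequence("rs12041233")
```

The REST server may sit on a different release or assembly than the mart being queried, so a mismatch with BioMart would not by itself show which source is out of date.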


Thanks, James. I think you are spot on. I do realise that the BioMart server is an entity within Ensembl, and that the queries are passed on to this server directly. My issue is that I would expect the information on the BioMart server and on Ensembl to be identical, which, with the query I'm currently using, is not the case (I've tried lots of different arguments to get the information, with no luck). That's why I posted the question here: to find out whether other users have come across this, and whether my query is missing some important arguments/attributes/filters.

Regarding the folks at Ensembl... I did contact them multiple times, but got no answer! :(
