getBM to retrieve ensembl_gene_ids within a certain distance
@eemelileppaaho-12628

I've been using the getBM function to retrieve genes for a given SNP (rs ID). However, when I search for genes within a certain base-pair distance of the SNP, I do not get the same results as with the simpler query. For example:

library(biomaRt)
mart.hs <- useMart("ensembl", "hsapiens_gene_ensembl")
mart.snp <- useMart("ENSEMBL_MART_SNP", "hsapiens_snp")

snip <- "rs6440082"

snipInfo <- getBM(attributes = c("refsnp_id","ensembl_gene_stable_id","chr_name","chrom_start","chrom_end","chrom_strand"),
                  filters="snp_filter", values=snip, mart=mart.snp)

bp <- as.numeric(snipInfo$chrom_start)
region <- paste(snipInfo$chr_name, bp-5e5, bp+5e5, snipInfo$chrom_strand, sep = ":")
genes <- unlist(getBM(attributes = "ensembl_gene_id", filters = "chromosomal_region", values = region, mart = mart.hs))

The latter getBM query retrieves 8 genes, none of which is the gene retrieved by the former query. What am I doing wrong here?

biomart • 851 views
@james-w-macdonald-5106
> getBM(c("start_position","end_position","strand"), "ensembl_gene_id", "ENSG00000114127", mart.hs)
  start_position end_position strand
1      142306607    142448062     -1

> region2 <- "3:141895564:142895564:-1"
> getBM(attributes = "ensembl_gene_id", filters = "chromosomal_region", values = region2, mart = mart.hs)
   ensembl_gene_id
1  ENSG00000202125
2  ENSG00000200389
3  ENSG00000244124
4  ENSG00000252745
5  ENSG00000239641
6  ENSG00000240950
7  ENSG00000251804
8  ENSG00000244327
9  ENSG00000175054
10 ENSG00000114127
11 ENSG00000163710
12 ENSG00000114126
13 ENSG00000233597
14 ENSG00000279147
15 ENSG00000175066

I should also note that you are assuming a SNP is strand-specific. It's not. The reference and alternate alleles read differently depending on which strand you are talking about, but both strands carry either the reference or the alternate allele. Genes, on the other hand, are strand-specific, so appending the SNP's strand to the chromosomal_region value limits the results to genes on that strand.


And further to that point:

> region2 <- "3:141895564:142895564"
> getBM(attributes = "ensembl_gene_id", filters = "chromosomal_region", values = region2, mart = mart.hs)
   ensembl_gene_id
1  ENSG00000202125
2  ENSG00000200389
3  ENSG00000244124
4  ENSG00000252745
5  ENSG00000120756
6  ENSG00000239641
7  ENSG00000240950
8  ENSG00000144935
9  ENSG00000206604
10 ENSG00000251787
11 ENSG00000251804
12 ENSG00000199319
13 ENSG00000244327
14 ENSG00000175054
15 ENSG00000114127
16 ENSG00000163710
17 ENSG00000114126
18 ENSG00000242390
19 ENSG00000242479
20 ENSG00000233597
21 ENSG00000279147
22 ENSG00000069849
23 ENSG00000175066
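
Putting the two points together, here is a minimal sketch of building the window without a strand component. `make_region` and `genes_near_snp` are hypothetical helper names, not biomaRt functions, and the marts are assumed to be set up as in the question. Formatting the coordinates with `scientific = FALSE` is an added precaution, since `paste()` can render round numbers such as 1e5 as "1e+05".

```r
# Hypothetical helper: build a strand-free "chr:start:end" string for the
# chromosomal_region filter. Not part of biomaRt.
make_region <- function(chr, pos, flank = 5e5) {
  start <- pmax(1, pos - flank)  # clamp at the first base of the chromosome
  end <- pos + flank
  paste(chr,
        format(start, scientific = FALSE, trim = TRUE),
        format(end, scientific = FALSE, trim = TRUE),
        sep = ":")
}

# Hypothetical wrapper: look up the SNP position, then query genes on both
# strands within +/- flank bp. Assumes mart_snp and mart_gene were created
# with useMart() as in the question.
genes_near_snp <- function(snp_id, mart_snp, mart_gene, flank = 5e5) {
  snp <- biomaRt::getBM(attributes = c("refsnp_id", "chr_name", "chrom_start"),
                        filters = "snp_filter", values = snp_id, mart = mart_snp)
  region <- make_region(snp$chr_name, as.numeric(snp$chrom_start), flank)
  biomaRt::getBM(attributes = c("ensembl_gene_id", "start_position",
                                "end_position", "strand"),
                 filters = "chromosomal_region", values = region,
                 mart = mart_gene)
}

# Position implied by the window used in this thread:
make_region("3", 142395564)
# "3:141895564:142895564"
```

Something like `genes_near_snp("rs6440082", mart.snp, mart.hs)` would then return genes on either strand; returning the `strand` column makes it easy to filter afterwards if a single strand is really wanted.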