Getting rs ids using chr coordinates | Biomart
Mahan • 0
@mahan-23989
Taiwan

Hello everyone,

I have: CHR START END EffectAllele RefAllele columns (hg38)

I want: CHR START END EffectAllele RefAllele RS_ID

I would like to get the SNP rs IDs from the chromosome coordinates. How can I do this?

Thanks in advance.

Mike Smith ★ 5.1k
@mike-smith
EMBL Heidelberg / de.NBI

Here's an example I originally posted to BioStars. If you already have the allele information, you can drop 'allele' from the attributes.

library(biomaRt)
## Use the default ENSEMBL Variation Mart & Human dataset
snpMart = useEnsembl(biomart = "snps", 
                 dataset = "hsapiens_snp")

## Create an example set of coordinates as a dataframe
SNP_M <- data.frame(CHR = c(1,1), START = c(10020, 10039), END = c(10020, 10039))

## Combine these into the format chr:start:end
## It's important to include the end even if it's a single base, 
## otherwise it searches to the end of the chromosome
coords <- apply(SNP_M, 1, paste, collapse = ":")
coords
#> [1] "1:10020:10020" "1:10039:10039"

## Submit the query
getBM(attributes = c('refsnp_id', 'chr_name', 'chrom_start', 'chrom_end', 'allele'),
      filters = c('chromosomal_region'), 
      values = coords, 
      mart = snpMart)  
#>     refsnp_id chr_name chrom_start chrom_end allele
#> 1 rs775809821        1       10020     10021   AA/A
#> 2 rs978760828        1       10039     10039    A/C
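
As a follow-up sketch (not part of the original answer): with a real table, chromosome names such as "X" make `apply()` convert the data frame to a character matrix, which can pad the numeric columns with spaces. Building the strings column by column avoids that. The block below also shows one possible way to attach the IDs back onto the input table. The `snps` table, its allele values, and the `hits` data frame (standing in for the `getBM()` result shown above) are illustrative assumptions, and matching on chromosome plus start position is a simplification that may not suit indels.

```r
## Hypothetical input in the layout from the question (allele values are illustrative)
snps <- data.frame(
  CHR          = c("1", "1"),
  START        = c(10020L, 10039L),
  END          = c(10020L, 10039L),
  EffectAllele = c("A", "C"),
  RefAllele    = c("AA", "A"),
  stringsAsFactors = FALSE
)

## Build chr:start:end strings column-wise; as.integer() guards against
## positions being printed in scientific notation
coords <- sprintf("%s:%d:%d", snps$CHR,
                  as.integer(snps$START), as.integer(snps$END))

## Stand-in for the getBM() result above (same IDs and positions)
hits <- data.frame(
  refsnp_id   = c("rs775809821", "rs978760828"),
  chr_name    = c("1", "1"),
  chrom_start = c(10020L, 10039L),
  stringsAsFactors = FALSE
)

## Left-join the IDs onto the input, matching on chromosome and start
annotated <- merge(snps, hits,
                   by.x = c("CHR", "START"),
                   by.y = c("chr_name", "chrom_start"),
                   all.x = TRUE)
names(annotated)[names(annotated) == "refsnp_id"] <- "RS_ID"
annotated
```

Rows without a match keep `NA` in `RS_ID` because of `all.x = TRUE`, and a position that overlaps several variants will appear once per variant.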
@herve-pages-1542
Seattle, WA, United States

Hi,

Here is an alternative that uses a SNPlocs package:

library(SNPlocs.Hsapiens.dbSNP151.GRCh38)

my_ranges <- GRanges(c("1:1-10050", "2:1-10050"))

## The first call to snpsByOverlaps() takes a while. Subsequent calls are fast.
snpsByOverlaps(SNPlocs.Hsapiens.dbSNP151.GRCh38, my_ranges)
# UnstitchedGPos object with 3 positions and 2 metadata columns:
#       seqnames       pos strand |    RefSNP_id alleles_as_ambig
#          <Rle> <integer>  <Rle> |  <character>      <character>
#   [1]        1     10039      * |  rs978760828                M
#   [2]        1     10043      * | rs1008829651                W
#   [3]        2     10026      * | rs1366167113                R
#   -------
#   seqinfo: 25 sequences (1 circular) from GRCh38.p7 genome

The difference from the result returned by biomaRt is probably due to the latter being based on a different dbSNP build.

To get the ref/alt alleles (works in BioC devel only):

library(BSgenome.Hsapiens.UCSC.hg38)
genome <- BSgenome.Hsapiens.UCSC.hg38
seqlevelsStyle(genome) <- "NCBI"

snpsByOverlaps(SNPlocs.Hsapiens.dbSNP151.GRCh38, my_ranges, genome=genome)
# UnstitchedGPos object with 3 positions and 5 metadata columns:
#       seqnames       pos strand |    RefSNP_id alleles_as_ambig genome_compat
#          <Rle> <integer>  <Rle> |  <character>      <character>     <logical>
#   [1]        1     10039      * |  rs978760828                M          TRUE
#   [2]        1     10043      * | rs1008829651                W          TRUE
#   [3]        2     10026      * | rs1366167113                R          TRUE
#        ref_allele     alt_alleles
#       <character> <CharacterList>
#   [1]           A               C
#   [2]           T               A
#   [3]           A               G
#   -------
#   seqinfo: 25 sequences (1 circular) from GRCh38.p7 genome

See ?snpsByOverlaps in the BSgenome package for more information.

Best,

H.
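
A small addition for anyone starting from a table rather than hand-written ranges: the "chr:start-end" strings that `GRanges()` accepts in the example above can be built from the CHR/START/END columns. This is only a sketch. The `snps` table is hypothetical (its positions are taken from the output above), and stripping a "chr" prefix assumes the table may use UCSC-style names while the SNPlocs output above shows NCBI-style ones ("1", "2").

```r
## Hypothetical coordinate table; the first row uses a UCSC-style name on purpose
snps <- data.frame(
  CHR   = c("chr1", "2"),
  START = c(10039L, 10026L),
  END   = c(10039L, 10026L),
  stringsAsFactors = FALSE
)

## Drop any "chr" prefix so names look like the seqnames in the output above
chr <- sub("^chr", "", snps$CHR)

## Build "chr:start-end" strings
range_strings <- sprintf("%s:%d-%d", chr,
                         as.integer(snps$START), as.integer(snps$END))
range_strings

## Only attempted when GenomicRanges is installed
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  my_ranges <- GenomicRanges::GRanges(range_strings)
  ## my_ranges can then be passed to snpsByOverlaps() as shown above
}
```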
