Annotating outdated SNP identifiers
sandmann.t (@sandmannt-11014), United States

Dear Bioconductors,

I have a long list of dbSNP identifiers (e.g. rs7335199), some of which are outdated and now represented by a new identifier.

For example, the following call to Ensembl's REST API shows that variant rs7335199 is now referred to as rs3:

curl 'https://rest.ensembl.org/variant_recoder/human/rs7335199?fields=id' -H 'Content-type:application/json'

[{"input":"rs7335199","id":["rs3"]}]
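The GET call above resolves one ID per request, which is the main bottleneck for millions of IDs. Ensembl's REST API also has a POST form of variant_recoder that takes a JSON body of IDs; as far as I recall it accepts up to 200 IDs per request, but check the endpoint documentation. Below is a minimal sketch, assuming the httr and jsonlite packages and assuming the POST response has the same flat `input`/`id` layout as the GET output above (if your Ensembl release returns a different layout, `parse_recoder` needs adapting). `chunk_ids`, `parse_recoder` and `recode_batch` are hypothetical helper names, not part of any package.

```r
# Hypothetical helpers for batching Ensembl variant_recoder lookups.

# Split a character vector of IDs into batches of at most `size` IDs.
chunk_ids <- function(ids, size = 200) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Hypothetical parser: turn a parsed response (a list of records with
# `input` and `id`) into one row per (input, current ID) pair; NA if no ID.
parse_recoder <- function(records) {
  rows <- lapply(records, function(rec) {
    current <- unlist(rec$id)
    if (is.null(current)) current <- NA_character_
    data.frame(input = rec$input, current = current,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# POST one batch. Assumes httr/jsonlite are installed and that the endpoint
# accepts a body of the form {"ids": [...]} plus the `fields=id` parameter.
recode_batch <- function(ids) {
  resp <- httr::POST(
    "https://rest.ensembl.org/variant_recoder/human",
    query = list(fields = "id"),
    httr::content_type_json(),
    httr::accept_json(),
    body = as.character(jsonlite::toJSON(list(ids = ids)))
  )
  httr::stop_for_status(resp)
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  parse_recoder(jsonlite::fromJSON(txt, simplifyVector = FALSE))
}

# Usage (needs network access); the pause keeps the request rate modest
# to stay within Ensembl's rate limit:
# ids <- c("rs3", "rs7335199")
# id_map <- do.call(rbind, lapply(chunk_ids(ids), function(batch) {
#   Sys.sleep(0.1)
#   recode_batch(batch)
# }))
```

Keeping the parsing in its own function means it can be checked offline against a hand-written record such as the rs7335199 response above.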

My goal is to retrieve the genome coordinates (GRCh38) and, if available, the current variant identifier for millions of variants. So far, I have tried:

  1. Ensembl's REST service: great for small queries, but not practical for millions of variants
  2. the biomaRt Bioconductor package: works well, but the queries take a long time
  3. the SNPlocs.Hsapiens.dbSNP150.GRCh38 Bioconductor package: contains the coordinates for up-to-date variants (e.g. rs3) but not for outdated ones (e.g. rs7335199):
library(SNPlocs.Hsapiens.dbSNP150.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP150.GRCh38

snpsById(snps, c("rs3", "rs7335199"), ifnotfound = "drop")
GPos object with 1 position and 2 metadata columns:
      seqnames       pos strand |   RefSNP_id alleles_as_ambig
         <Rle> <integer>  <Rle> | <character>      <character>
  [1]       13  31872705      * |         rs3                Y
  -------
  seqinfo: 25 sequences (1 circular) from GRCh38.p7 genome
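Because SNPlocs resolves current IDs quickly and locally, one option is to look every ID up there first and send only the IDs it drops (candidates for merged or retired rsIDs) to the slower REST or biomaRt route. This is a minimal sketch; `split_by_snplocs` is a hypothetical helper, and an ID missing from SNPlocs could also simply be absent from dbSNP 150 rather than merged.

```r
# Hypothetical helper: partition input IDs by whether SNPlocs returned them.
split_by_snplocs <- function(ids, found_ids) {
  list(found = intersect(ids, found_ids),
       missing = setdiff(ids, found_ids))
}

ids <- c("rs3", "rs7335199")

# Only runs if the (large) SNPlocs annotation package is installed.
if (requireNamespace("SNPlocs.Hsapiens.dbSNP150.GRCh38", quietly = TRUE)) {
  library(SNPlocs.Hsapiens.dbSNP150.GRCh38)
  snps <- SNPlocs.Hsapiens.dbSNP150.GRCh38
  gpos <- snpsById(snps, ids, ifnotfound = "drop")
  parts <- split_by_snplocs(ids, mcols(gpos)$RefSNP_id)
  # Based on the output above, rs7335199 should end up in parts$missing;
  # only these IDs need the slower ID conversion.
  print(parts$missing)
}
```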

What is the recommended way to obtain

  • the current dbSNP identifiers and
  • their genomic coordinates for a mixed list of current and deprecated variant IDs?

Any pointers would be great!

Many thanks,

Thomas

variants variantannotation SNPlocs.Hsapiens.dbSNP150.GRCh38 biomaRt

The time taken by biomaRt queries tends to grow with the number of values you're asking for. Perhaps you could use biomaRt only for the ID conversion and then SNPlocs.Hsapiens.dbSNP150.GRCh38 for the coordinates?
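The two-step idea can be sketched as follows: given an ID map (from biomaRt or the REST calls) with one row per input ID and its current rsID, look up coordinates for the unique current IDs in SNPlocs and join them back onto the inputs. This is a minimal sketch, not a tested pipeline; the `id_map` column names `input`/`current` and the helper `attach_coordinates` are hypothetical, and the example map is hard-coded from the rs7335199/rs3 case in the question.

```r
# Hypothetical helper: left-join coordinates onto the ID map by current rsID,
# keeping input IDs without coordinates (their chr/pos become NA).
attach_coordinates <- function(id_map, coords) {
  merge(id_map, coords, by = "current", all.x = TRUE, sort = FALSE)
}

# Example map taken from the question: rs7335199 is now rs3.
id_map <- data.frame(input = c("rs3", "rs7335199"),
                     current = c("rs3", "rs3"),
                     stringsAsFactors = FALSE)

# Only runs if the (large) SNPlocs annotation package is installed.
if (requireNamespace("SNPlocs.Hsapiens.dbSNP150.GRCh38", quietly = TRUE)) {
  library(SNPlocs.Hsapiens.dbSNP150.GRCh38)
  snps <- SNPlocs.Hsapiens.dbSNP150.GRCh38
  # Query each current ID once (setdiff also de-duplicates and drops NA).
  current_ids <- setdiff(id_map$current, NA)
  gpos <- snpsById(snps, current_ids, ifnotfound = "drop")
  coords <- data.frame(current = mcols(gpos)$RefSNP_id,
                       chr = as.character(seqnames(gpos)),
                       pos = start(gpos),
                       stringsAsFactors = FALSE)
  print(attach_coordinates(id_map, coords))
}
```

Looking up only the unique current IDs keeps the SNPlocs query small when many outdated IDs collapse onto the same current rsID.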

I guess if you have 1 million SNPs, biomaRt will batch that into 2000 separate queries (500 IDs each). Even at 1 second per query that's over 30 minutes, so I can see why you want something quicker if this is more than a one-off.
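Written out, the estimate is: 1,000,000 IDs in batches of 500 gives 2000 queries, and 2000 queries at 1 s each is 2000 s, about 33 minutes. A small sketch for redoing this arithmetic with other batch sizes or per-query times; `estimate_runtime` is a hypothetical helper, and its defaults are just the assumptions from the reply.

```r
# Hypothetical back-of-the-envelope estimate for batched queries.
# Defaults follow the reply: 500 IDs per query, 1 second per query.
estimate_runtime <- function(n_ids, batch_size = 500, secs_per_query = 1) {
  n_queries <- ceiling(n_ids / batch_size)
  c(queries = n_queries, minutes = n_queries * secs_per_query / 60)
}

estimate_runtime(1e6)
```

Changing `batch_size` (e.g. to a REST POST limit) or `secs_per_query` shows how sensitive the total is to either assumption.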
