How to map Chr and bp columns to the corresponding RSID column?
@49f16e03
Brazil

Hello dear all!

I have a summary statistics file that has Chr and bp columns. However, to run the LDSC script I need the same file with an RSID column, so I need a biomaRt script that can take the Chr and bp columns and give me the corresponding RSID column. My team and I are struggling to use biomaRt. Does anyone here know how to do that? Please contact me :)

All the best,

Iago Junger

biomaRt • 1.2k views

What have you tried? Have you tried looking up a few coordinates in BioMart through the web interface?

@james-w-macdonald-5106
United States

I would tend to use one of the SNPlocs packages for this, rather than biomaRt. As a completely contrived example,

> library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

## fake GRanges - you need to use your Chr and bp columns to do this!
## also note that the chromosomes have no prepended 'chr'.

> fakeo <- GRanges(rep("1", 500), IRanges(sample(1:1e5, 500), width = 1))

## EDIT

> z <- snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, fakeo)
> z
UnstitchedGPos object with 6 positions and 2 metadata columns:
      seqnames       pos strand |   RefSNP_id alleles_as_ambig
         <Rle> <integer>  <Rle> | <character>      <character>
  [1]        1     14728      * | rs547701710                M
  [2]        1     15150      * |  rs11803681                Y
  [3]        1     17538      * | rs200046632                M
  [4]        1     63643      * | rs202004563                R
  [5]        1     66737      * | rs560785016                K
  [6]        1     69869      * | rs548049170                W
  -------
  seqinfo: 25 sequences (1 circular) from GRCh37.p13 genome

## and now you can get the RSIDs from the GPos object.

> fo <- findOverlaps(fakeo, z)
> fo
Hits object with 6 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         3           1
  [2]        95           2
  [3]       120           6
  [4]       229           5
  [5]       370           3
  [6]       465           4
  -------
  queryLength: 500 / subjectLength: 6
> mcols(fakeo)$rsid <- NA_character_
> mcols(fakeo)$rsid[queryHits(fo)] <- mcols(z)$RefSNP_id[subjectHits(fo)]
> fakeo
GRanges object with 500 ranges and 1 metadata column:
        seqnames    ranges strand |        rsid
           <Rle> <IRanges>  <Rle> | <character>
    [1]        1     94944      * |        <NA>
    [2]        1     97983      * |        <NA>
    [3]        1     14728      * | rs547701710
    [4]        1     56186      * |        <NA>
    [5]        1     53476      * |        <NA>
    ...      ...       ...    ... .         ...
  [496]        1     91756      * |        <NA>
  [497]        1     70297      * |        <NA>
  [498]        1     27187      * |        <NA>
  [499]        1     66576      * |        <NA>
  [500]        1     81208      * |        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Given that I just faked up some positions there isn't much overlap. But you get the general idea, I hope.
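For real data, the same idea can be applied to a data frame directly. The following is only a minimal sketch: the data frame `sumstats` and its values are hypothetical, the column names `Chr` and `bp` are taken from the question, and it assumes the coordinates are on GRCh37 (the build of the SNPlocs package used above). If your positions are on a different build, you would need the matching SNPlocs package instead.

```r
library(GenomicRanges)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

## Hypothetical summary statistics; replace with your own table.
sumstats <- data.frame(Chr = c("chr1", "1", "1"),
                       bp  = c(14728, 15150, 20000),
                       stringsAsFactors = FALSE)

## This SNPlocs package names chromosomes "1", "2", ... so strip any
## leading "chr". Use integer positions so that pasting them later
## never produces scientific notation (e.g. "1e+05").
chrom <- sub("^chr", "", sumstats$Chr)
bp    <- as.integer(sumstats$bp)

## One width-1 range per variant, then look up the known SNPs there.
gr   <- GRanges(chrom, IRanges(bp, width = 1))
snps <- snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, gr)

## Match rows to SNPs on a "chromosome:position" key. Rows with no
## SNP at that position get NA; if several rsIDs share a position,
## match() keeps only the first one.
key_query <- paste(chrom, bp, sep = ":")
key_snps  <- paste(as.character(seqnames(snps)), start(snps), sep = ":")
sumstats$rsid <- mcols(snps)$RefSNP_id[match(key_query, key_snps)]
```

Matching on a key rather than using findOverlaps() gives the same result for single-base positions and keeps the rsIDs in the original data frame, which is probably what you want before handing the file to LDSC.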


I edited my post to show what the 'z' object is.
