How to map Chr and bp columns to the corresponding RSID column?
@49f16e03

Hello dear all!

I have a summary statistics file with Chr and bp columns. However, to run the LDSC script I need the same file with an RSID column, so I need a biomaRt script that maps each Chr and bp pair to its corresponding RSID. My team and I are struggling to use biomaRt. Does anyone here know how to do this? Please contact me :)

All the best,

Iago Junger

biomaRt • 419 views

What have you tried? Have you tried looking up a few coordinates through the BioMart web interface?

@james-w-macdonald-5106

I would tend to use one of the SNPlocs packages for this, rather than biomaRt. As a completely contrived example,

> library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

## fake GRanges - you need to use your Chr and bp columns to do this!
## also note that the chromosomes have no prepended 'chr'.

> fakeo <- GRanges(rep("1", 500), IRanges(sample(1:1e5, 500), width = 1))

## EDIT

> z <- snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, fakeo)
> z
UnstitchedGPos object with 6 positions and 2 metadata columns:
      seqnames       pos strand |   RefSNP_id alleles_as_ambig
         <Rle> <integer>  <Rle> | <character>      <character>
  [1]        1     14728      * | rs547701710                M
  [2]        1     15150      * |  rs11803681                Y
  [3]        1     17538      * | rs200046632                M
  [4]        1     63643      * | rs202004563                R
  [5]        1     66737      * | rs560785016                K
  [6]        1     69869      * | rs548049170                W
  -------
  seqinfo: 25 sequences (1 circular) from GRCh37.p13 genome

## and now you can get the RSIDs from the GPos object.

> fo <- findOverlaps(fakeo, z)
> fo
Hits object with 6 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         3           1
  [2]        95           2
  [3]       120           6
  [4]       229           5
  [5]       370           3
  [6]       465           4
  -------
  queryLength: 500 / subjectLength: 6
> mcols(fakeo)$rsid <- NA
> mcols(fakeo)$rsid[queryHits(fo)] <- mcols(z)$RefSNP_id[subjectHits(fo)]
> fakeo
GRanges object with 500 ranges and 1 metadata column:
        seqnames    ranges strand |        rsid
           <Rle> <IRanges>  <Rle> | <character>
    [1]        1     94944      * |        <NA>
    [2]        1     97983      * |        <NA>
    [3]        1     14728      * | rs547701710
    [4]        1     56186      * |        <NA>
    [5]        1     53476      * |        <NA>
    ...      ...       ...    ... .         ...
  [496]        1     91756      * |        <NA>
  [497]        1     70297      * |        <NA>
  [498]        1     27187      * |        <NA>
  [499]        1     66576      * |        <NA>
  [500]        1     81208      * |        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


Given that I just faked up some positions there isn't much overlap. But you get the general idea, I hope.
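To apply the same idea to a real summary statistics table, something along these lines might work. This is only a sketch under assumptions: the data frame name `sumstats` and its `P` column are made up for illustration, the `Chr` and `bp` column names are taken from the question, and the coordinates are assumed to be on GRCh37 (if your file is on another assembly you would need a SNPlocs package built for that assembly instead).

```r
library(GenomicRanges)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

# Hypothetical summary statistics; in practice you would read your own file.
# The Chr and bp column names follow the question, P is only a placeholder.
sumstats <- data.frame(
  Chr = c("chr1", "chr1", "chr1"),
  bp  = c(14728L, 15150L, 20000L),
  P   = c(0.01, 0.20, 0.50),
  stringsAsFactors = FALSE
)

# This SNPlocs package names chromosomes without a 'chr' prefix ("1", not "chr1"),
# so strip the prefix if your file has one.
chrom <- sub("^chr", "", sumstats$Chr)

# One width-1 range per row, in the same order as the data frame.
gr <- GRanges(chrom, IRanges(sumstats$bp, width = 1))

# All dbSNP144 SNPs located at those positions.
snps <- snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, gr)

# Map each hit back to the row it came from; rows without a hit stay NA.
fo <- findOverlaps(gr, snps)
sumstats$rsid <- NA_character_
sumstats$rsid[queryHits(fo)] <- mcols(snps)$RefSNP_id[subjectHits(fo)]
```

Because `gr` is built row by row from the data frame, `queryHits(fo)` indexes directly into `sumstats`. Rows whose position is not in dbSNP144 keep `NA`, and if more than one rsID sits at the same position, this simple assignment keeps only the last one, so you may want to check for duplicated `queryHits(fo)` before relying on the result.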


I edited my post to show what the 'z' object is.