Question: how to find Snps that are within 1e6 from gene
gravatar for mms140130
4 months ago by
mms1401300 wrote:

I have Snp data as follows (part of the data)

    snp          chr         pos
    rs987435    chr7    78599583
    rs345783    chr15    33395779
    rs955894    chr1    189807684
    rs6088791    chr20    33907909
    rs11180435    chr12    75664046
    rs17571465    chr1    218890658
    rs2342723    chr16    5748791
    rs11992567    chr8    42951984

and gene data as follows (part of the data)

    geneName    chrom    txStart        txEnd
    A1BG       chr19    58858171    58864865
    A1BG-AS1    chr19    58863335    58866549
    A1CF       chr10    52559168    52645435
    A2M           chr12    9220303        9268825
    A2M-AS1       chr12    9217772        9220651
    A2ML1       chr12    8997599        9029377
    A2MP1       chr12    9381128        9386803
    A3GALT2        chr1    33772366    33786699
    A4GALT       chr22    43088117    43117307
    A4GNT        chr3    137842559    137851229
    AA06       chr17    31856805    31860779
    AAAS       chr12    53701239    53715412

using the pos column in snps data I would like to find the snps that are 1e6 less than the txStart 
and the snps that are 1e6 greater than the txEnd.

How can I write an R code to help me with that?

Thank you very much

ADD COMMENTlink modified 4 months ago by Vincent J. Carey, Jr.6.2k • written 4 months ago by mms1401300
gravatar for Vincent J. Carey, Jr.
4 months ago by
United States
Vincent J. Carey, Jr.6.2k wrote:

You should structure the location data you've provided as GRanges instance.

if your gene data are available in a data.frame as e.g., 

> genes
   geneName chrom   txStart     txEnd
1      A1BG chr19  58858171  58864865
2  A1BG-AS1 chr19  58863335  58866549
3      A1CF chr10  52559168  52645435
4       A2M chr12   9220303   9268825
5   A2M-AS1 chr12   9217772   9220651
6     A2ML1 chr12   8997599   9029377


ggr = GRanges(genes$chrom, IRanges(genes$txStart, genes$txEnd))

will accomplish this.  You should then define the seqinfo of your reference genome.  If you

are using hg19 and have the Homo.sapiens package,

seqinfo(ggr) = seqinfo(Homo.sapiens)[seqnames(seqinfo(ggr))]

would accomplish that.  Perform similar steps for your SNP data ... the width of the IRanges

will be 1.

You can then use flank(ggr, 1e6, both=TRUE) to define regions for exclusion based on your

gene addresses.  There is more to be done to carry out your aim but these steps and the GRanges

documentation should get you most of the way there.


ADD COMMENTlink written 4 months ago by Vincent J. Carey, Jr.6.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 287 users visited in the last hour