Get gene symbols for a set of hg19 coordinates
pcantalupo ▴ 10
@pcantalupo-8617
United States

Hello,

I have been trying all day to get the gene symbols for a large set (~7000) of hg19 coordinates. Some coordinates will not overlap a gene and some coordinates can overlap with several genes.

I've tried using biomaRt, but it seems I have to query each coordinate one at a time, which is going to take many hours to complete (and I worry about spamming biomaRt; is that possible?). I also tried to use TxDb.Hsapiens.UCSC.hg19.knownGene, but it has outdated gene_ids. Finally, org.Hs.eg.db is based on hg38 and not hg19 (I think).

I'm probably just overlooking something. How do I solve this? Is there a table of gene annotations with coordinates for hg19 that I can download? I'll just write my own script to look for overlaps if I have to.

Thank you

ATpoint ★ 4.2k
@atpoint-13662
Germany

Simply download a GTF file (for example from GENCODE) that matches the genome build and annotation version you are using in your project (here hg19), and take the gene symbols from there.

Load the GTF into R using rtracklayer::import and then use the GenomicRanges overlap functions (for example findOverlaps) to intersect your ranges with the GTF, which is a GRanges object after loading. From there you can filter as needed. Yes, some genomic sites have overlapping genes (for example one on the + and one on the - strand). There is no general answer for how to deal with this; it depends on what you want from the analysis. For code suggestions please add meaningful example data, via dput().
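
In the absence of example data, here is a minimal sketch of the approach described above. It is only an illustration under stated assumptions: the three-line GTF is a toy stand-in written to a temporary file (the gene names GENEA, GENEB, GENEC and all coordinates are invented), the query positions are assumed to be 1-based single-base coordinates, and collapsing multiple symbols with ";" is just one possible way to handle overlapping genes. For real data, point `gtf_file` at the hg19/GRCh37 GTF you downloaded and build `query` from your own table.

```r
library(rtracklayer)
library(GenomicRanges)

# Toy GTF standing in for a real hg19 annotation file.
# All gene names and coordinates are hypothetical.
gtf_lines <- c(
  'chr1\ttoy\tgene\t1000\t2000\t.\t+\t.\tgene_id "G1"; gene_name "GENEA";',
  'chr1\ttoy\tgene\t1500\t2500\t.\t-\t.\tgene_id "G2"; gene_name "GENEB";',
  'chr2\ttoy\tgene\t500\t900\t.\t+\t.\tgene_id "G3"; gene_name "GENEC";'
)
gtf_file <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, gtf_file)

# import() returns a GRanges; keep only gene-level records
gtf <- import(gtf_file)
genes <- gtf[gtf$type == "gene"]

# Hypothetical query coordinates (single positions)
query <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2"),
  ranges = IRanges(start = c(1200, 1800, 600, 5000), width = 1)
)

# Overlap all queries against all genes in one call, on either strand
hits <- findOverlaps(query, genes, ignore.strand = TRUE)

# Collect symbols per query; queries without any hit stay NA
by_query <- split(genes$gene_name[subjectHits(hits)], queryHits(hits))
symbols <- rep(NA_character_, length(query))
symbols[as.integer(names(by_query))] <-
  vapply(by_query, paste, character(1), collapse = ";")
query$symbol <- symbols

query
```

Because findOverlaps handles all query ranges in a single call, a set of ~7000 coordinates does not need to be looped over one at a time. Whether to keep every overlapping symbol, pick one, or drop ambiguous sites is a decision for your analysis.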


Now why didn't I think of that? I had tunnel vision thinking I needed to use an annotation package. Thank you ATpoint!
