revisiting genomic coordinates to gene
Andrew Yee ▴ 350
@andrew-yee-2667
Last seen 10.2 years ago
I'm interested in converting genomic coordinates to gene names, with potential use of the org.Hs.eg.db library, e.g. converting chr3:41,266,083 to CTNNB1.

I know that this topic has been addressed before, see e.g.:

https://stat.ethz.ch/pipermail/bioconductor/2009-January/025906.html (discusses use of overlap in IRanges)
https://stat.ethz.ch/pipermail/bioconductor/2009-October/030140.html

I was wondering if there have been any new solutions or new packages that address this problem since these threads.

Thanks,
Andrew
@vincent-j-carey-jr-4
Last seen 10 weeks ago
United States
There are many possible approaches and possible pitfalls. Surely the following is relevant:

    > get("CTNNB1", revmap(org.Hs.egSYMBOL))
    [1] "1499"
    > get("1499", org.Hs.egCHRLOC)
           3
    41240941
    > get("1499", org.Hs.egCHRLOCEND)
           3
    41281939

Your location lies within these limits. You could do this more systematically by defining a collection of Entrez Gene IDs and building an IRanges or GRanges instance that stores all the "gene boundary" information for these IDs. You will have to attend to signs and multiplicities, and to build versions.

The GenomicFeatures makeTranscriptDb* facilities are potentially useful when one is interested in transcribed or exonic regions specifically. In the following, tx.3 is an extract from the result of makeTranscriptDbFromUCSC("hg18"):

    > get("1499", org.Hs.egUCSCKG)
    [1] "uc003ckp.2" "uc003ckq.2" "uc003ckr.2" "uc003cks.2" "uc003ckt.1"
    [6] "uc010hia.1" "uc011azf.1" "uc011azg.1"
    > tx.3[ elementMetadata(tx.3)$tx_name %in% .Last.value, ]
    GRanges with 6 ranges and 2 elementMetadata values
        seqnames               ranges strand |     tx_id     tx_name
           <rle>            <iranges>  <rle> | <integer> <character>
    [1]     chr3 [41211405, 41255849]      + |     11545  uc010hia.1
    [2]     chr3 [41215946, 41256943]      + |     11546  uc003ckp.2
    [3]     chr3 [41215946, 41256943]      + |     11547  uc003ckq.2
    [4]     chr3 [41215946, 41256943]      + |     11548  uc003ckr.2
    [5]     chr3 [41249904, 41253941]      + |     11550  uc003cks.2
    [6]     chr3 [41252167, 41253962]      + |     11551  uc003ckt.1

    seqlengths
          chr1 chr1_random     chr10 ... chrX_random      chrY
     247249719     1663265 135374737 ...     1719168  57772954

and there are undoubtedly ways to use biomaRt to address your concern.
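The "more systematic" approach described above can be sketched in base R, without any annotation package. This is only an illustration under stated assumptions: the CTNNB1 row reuses the CHRLOC/CHRLOCEND values quoted above, while the second row (`GENE_MINUS`), the data frame `bounds`, and the helper `genes_at()` are hypothetical names made up for this example. It assumes the org.Hs.eg.db convention that positions on the antisense strand carry a leading minus sign, which is the "signs" pitfall, and it returns every matching gene rather than just the first, which is the "multiplicities" pitfall.

```r
# Gene boundaries in the style of org.Hs.egCHRLOC / org.Hs.egCHRLOCEND.
# Row 1 uses the CTNNB1 values quoted in the thread; row 2 is a
# hypothetical minus-strand gene, present only to exercise sign handling.
bounds <- data.frame(
  entrez = c("1499", NA),
  symbol = c("CTNNB1", "GENE_MINUS"),
  chr    = c("3", "3"),
  start  = c(41240941, -41260000),
  end    = c(41281939, -41300000),
  stringsAsFactors = FALSE
)

# Return the symbols of all genes whose boundaries contain `pos` on `chr`.
# Signs are stripped before comparing, so both strands are searched.
genes_at <- function(bounds, chr, pos) {
  chr <- sub("^chr", "", chr)                       # accept "chr3" or "3"
  lo  <- pmin(abs(bounds$start), abs(bounds$end))   # leftmost coordinate
  hi  <- pmax(abs(bounds$start), abs(bounds$end))   # rightmost coordinate
  bounds$symbol[bounds$chr == chr & lo <= pos & pos <= hi]
}

genes_at(bounds, "chr3", 41266083)
```

The boundaries must come from the same genome build as the query coordinate; the lookup itself cannot detect a mismatch.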
Perhaps the following is also of interest:

    > findOverlaps(IRanges(start=41266083,width=1), ranges(tx.3))
    An object of class "RangesMatching"
    Slot "matchMatrix":
         query subject
    [1,]     1    2080
    [2,]     1    2081

    Slot "DIM":
    [1]    1 3528

    > tx.3[2080:2081,]
    GRanges with 2 ranges and 2 elementMetadata values
        seqnames               ranges strand |     tx_id     tx_name
           <rle>            <iranges>  <rle> | <integer> <character>
    [1]     chr3 [41263094, 41294629]      - |     11552  uc003cku.2
    [2]     chr3 [41263094, 41978664]      - |     11553  uc003ckv.2

    seqlengths
          chr1 chr1_random     chr10 ... chrX_random      chrY
     247249719     1663265 135374737 ...     1719168  57772954

So it seems your location is in a region that is said to be transcribed. I could not find an Entrez Gene ID associated with the "known gene" tx_name values just above.

    > sessionInfo()
    R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
    Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)

    locale:
    [1] C

    attached base packages:
    [1] stats     graphics  grDevices datasets  tools     utils     methods
    [8] base

    other attached packages:
     [1] org.Hs.eg.db_2.4.1     RSQLite_0.9-1          DBI_0.2-5
     [4] AnnotationDbi_1.11.1   Biobase_2.9.0          GenomicFeatures_1.1.11
     [7] GenomicRanges_1.1.15   IRanges_1.7.32         weaver_1.15.0
    [10] codetools_0.2-2        digest_0.4.2

    loaded via a namespace (and not attached):
    [1] BSgenome_1.17.5    Biostrings_2.17.26 RCurl_1.4-2        XML_3.1-0
    [5] biomaRt_2.5.1      rtracklayer_1.9.3

On Tue, Aug 31, 2010 at 11:43 PM, Andrew Yee <yee at post.harvard.edu> wrote:
> [...]
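For readers on a current Bioconductor release, the overlap query above can be written against GRanges objects directly. The following is a minimal sketch, not the code used in this thread: it assumes the GenomicRanges package is installed, rebuilds a handful of the hg18 transcript ranges shown above by hand instead of calling makeTranscriptDbFromUCSC(), and reads the hits with subjectHits() rather than the matchMatrix slot printed above. The object names `tx`, `query` and `hits` are illustrative.

```r
library(GenomicRanges)  # attaches IRanges as a dependency

# A few of the hg18 transcript ranges printed in this thread,
# entered by hand so the example needs no database download.
tx <- GRanges(
  seqnames = "chr3",
  ranges = IRanges(
    start = c(41211405, 41215946, 41249904, 41263094, 41263094),
    end   = c(41255849, 41256943, 41253941, 41294629, 41978664)
  ),
  strand  = c("+", "+", "+", "-", "-"),
  tx_name = c("uc010hia.1", "uc003ckp.2", "uc003cks.2",
              "uc003cku.2", "uc003ckv.2")
)

# Single-base query; the strand is left unspecified ("*"),
# so transcripts on either strand can match.
query <- GRanges("chr3", IRanges(start = 41266083, width = 1))

hits <- findOverlaps(query, tx)
tx$tx_name[subjectHits(hits)]
```

On these hg18 ranges the query falls inside the two minus-strand transcripts and outside all of the transcripts listed for Entrez Gene 1499, whereas it does fall inside the CHRLOC/CHRLOCEND limits quoted earlier. That discrepancy suggests the two annotation sources may not refer to the same genome build, which is the "build versions" pitfall mentioned above.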
Thank you very much for your suggestions!

Thanks,
Andrew

On Wed, Sep 1, 2010 at 3:53 AM, Vincent Carey <stvjc@channing.harvard.edu> wrote:
> [...]