Retrieving gene name where given genomic region is included.
Boel Brynedal ▴ 200
@boel-brynedal-2091
Last seen 9.7 years ago
Dear List,

I have long lists of genomic regions (chr;start;end) where a given event has taken place. These regions can be an exon, an intronic region, or similar. Most (all) of these events have taken place within the boundaries of genes, and I would like to retrieve the gene names (Ensembl ID).

I've tried to use biomaRt:

    > getBM(attributes=c("ensembl_gene_id"), filter=c("chromosome_name","start","end"),
    +       values=list(10,17317394,17317851), mart=ensembl)
    [1] ensembl_gene_id
    <0 rows> (or 0-length row.names)

But since no whole gene lies within these boundaries, I get nothing. I've also tried asking for "ensembl_exon_id" when looking at exon events (not all of them are of that kind, however), and this generally results in a long list of exon IDs (because one exon can be part of several transcripts).

I would appreciate any ideas on how this could be done in a better way.

Thank you!

Boel
@andreia-fonseca-3796
Last seen 7.3 years ago
Hi Boel,

I didn't try your code, but you are probably not getting anything because there is no gene that starts and ends exactly at the positions you have specified in your filters.

I had to do something similar. I downloaded all the ensembl_gene_ids for the chromosomes, together with their start and end positions, and then ran a query in MySQL to find the ones overlapping my regions of interest. This is fast, but it needs two steps.

With kind regards,
Andreia

--
Andreia J. Amaral
Unidade de Imunologia Clínica
Instituto de Medicina Molecular
Universidade de Lisboa
email: andreiaamaral@fm.ul.pt andreia.fonseca@gmail.com
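The second step of the approach described above (finding the downloaded genes that overlap each region) could also be done directly in R rather than in MySQL. The following is only a minimal base-R sketch under stated assumptions: the `genes` table stands in for the downloaded Ensembl coordinates, its IDs and positions are invented for illustration, and `genes_overlapping` is a hypothetical helper, not part of any package.

```r
# Toy gene table standing in for the downloaded gene coordinates.
# IDs and positions are invented for illustration only.
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  chrom   = c("10", "10", "11"),
  start   = c(17310000L, 17400000L, 17310000L),
  end     = c(17320000L, 17450000L, 17320000L),
  stringsAsFactors = FALSE
)

# Regions of interest (chr; start; end), one row per event.
# The first row is the region from the question; the second is made up.
regions <- data.frame(
  chrom = c("10", "10"),
  start = c(17317394L, 17500000L),
  end   = c(17317851L, 17500100L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of all genes overlapping one region.
# Two closed intervals overlap when each one starts before the other ends.
genes_overlapping <- function(chrom, start, end, genes) {
  hit <- genes$chrom == chrom & genes$start <= end & genes$end >= start
  genes$gene_id[hit]
}

# Apply the helper to every region; the result is a list with one
# character vector of gene IDs per region.
hits <- Map(genes_overlapping, regions$chrom, regions$start, regions$end,
            MoreArgs = list(genes = genes))
names(hits) <- paste0(regions$chrom, ":", regions$start, "-", regions$end)
print(hits)
```

A row-by-row scan like this is fine for a few thousand regions; for much longer lists, an interval-based approach such as the IRanges one suggested elsewhere in this thread is likely to scale better.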
@vincent-j-carey-jr-4
Last seen 6 days ago
United States
This looks like a job for GenomicFeatures and IRanges. There are various approaches, but this seems to help with your specific example. Get your software into a shape like the one indicated in the sessionInfo below, and then do

    library(GenomicFeatures)
    library(GenomicFeatures.Hsapiens.UCSC.hg18)
    genes = geneHuman()
    g10 = genes[genes$chrom=="chr10",]
    dim(g10)
    g10i = RangedData(IRanges(start=g10$txStart, end=g10$txEnd), space="chr10", name=g10$name)
    g10i
    boel = IRanges(start=17317394, end=17317851)  # could be richer
    findOverlaps(boel, ranges(g10i)[["chr10"]])
    g10i[298:307,]
    kgn = unique(g10i[298:307,]$name)
    mget(kgn, revmap(org.Hs.egUCSCKG))

to find

    > get("7431", org.Hs.egGENENAME)
    [1] "vimentin"

    > sessionInfo()
    R version 2.10.1 RC (2009-12-10 r50697)
    i386-apple-darwin9.8.0

    locale:
    [1] C

    attached base packages:
    [1] stats     graphics  grDevices datasets  tools     utils     methods
    [8] base

    other attached packages:
     [1] org.Hs.eg.db_2.3.6
     [2] RSQLite_0.7-3
     [3] DBI_0.2-5
     [4] AnnotationDbi_1.7.20
     [5] Biobase_2.5.8
     [6] IRanges_1.5.16
     [7] GenomicFeatures.Hsapiens.UCSC.hg18_0.1.0
     [8] GenomicFeatures_0.1.4
     [9] rtracklayer_1.7.2
    [10] RCurl_1.2-1
    [11] bitops_1.0-4.1
    [12] weaver_1.11.1
    [13] codetools_0.2-2
    [14] digest_0.4.1

    loaded via a namespace (and not attached):
    [1] BSgenome_1.15.3   Biostrings_2.15.5 MASS_7.3-4        XML_2.6-0
    [5] annotate_1.23.4   globaltest_5.1.1  multtest_2.3.0    splines_2.10.1
    [9] survival_2.35-7   xtable_1.5-5
@steve-lianoglou-2771
Last seen 14 months ago
United States
Hi,

In addition to the GenomicFeatures package, I've also been developing a package called "GenomeAnnotations" that can handle situations like this, for work I've been doing w/ *-seq data. It's not available through the normal bioconductor/biocLite channels, however, so you'd have to be comfortable installing packages from source in order to use it (which isn't too difficult, assuming you're on linux/os x -- I don't really have any experience with windows, sorry).

A sample session that shows how you could use my package to answer this question would look like so:

    R> library(GenomeAnnotations)
    R> hg18r <- GenomeDB('hg18', 'refseq')
    R> genes <- getGenesOnChromosome(hg18r, 10, 17317394, 17317851, strictly.contained=FALSE)
    R> names(genes)
    [1] "VIM"

Which I guess is the gene you're looking for? Note that if "strictly.contained" were TRUE, you would have been given an empty list.

I have instructions on how to download and install the base GenomeAnnotations package here:
http://wiki.github.com/lianos/GenomeAnnotations/

And an appropriate annotation package for your genome/annotation-source of interest here (I have ones prebuilt for hg18 using aceview and refseq annotations, as well as hg19 w/ refseq annos):
http://wiki.github.com/lianos/GenomeAnnotations/installing-annotation-packages

There are other examples of how to use it here:
http://wiki.github.com/lianos/GenomeAnnotations/example-usage

There is some skeletal documentation for the package, available in the normal ?function way after you've installed it, but I'm working on making it better since the package is in active development. If you end up using it, feel free to ask questions and/or suggest ways you'd like to see it improved.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
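The difference between the two "strictly.contained" settings mentioned above can be illustrated with plain interval arithmetic. This is only a sketch of the assumed semantics, not the GenomeAnnotations implementation: `gene_matches` is a hypothetical helper, and the gene coordinates are made up for illustration.

```r
# Hypothetical helper: does the gene [gene_start, gene_end] match the
# query region [q_start, q_end]?
#   strictly_contained = TRUE : the whole gene must lie inside the query.
#   strictly_contained = FALSE: any overlap with the query counts.
gene_matches <- function(gene_start, gene_end, q_start, q_end,
                         strictly_contained = FALSE) {
  if (strictly_contained) {
    gene_start >= q_start & gene_end <= q_end
  } else {
    gene_start <= q_end & gene_end >= q_start
  }
}

# Toy gene spanning 17310000-17320000 (made-up coordinates), queried
# with the region from the original question.
contained   <- gene_matches(17310000, 17320000, 17317394, 17317851,
                            strictly_contained = TRUE)
overlapping <- gene_matches(17310000, 17320000, 17317394, 17317851)
print(c(contained = contained, overlapping = overlapping))
```

Under these assumptions, a short query that falls inside a longer gene fails the containment test but passes the overlap test, which is consistent with the empty result from the original query and with the empty list described for strictly.contained=TRUE.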