Gviz
Guest User ★ 13k
@guest-user-4897
Dear Bioconductor community,

I am using the Gviz package to plot genomic regions.

    library(Gviz)
    biomTrack <- BiomartGeneRegionTrack(genome = "hg19", chromosome = chr,
                                        start = 63038367, end = 64536346,
                                        name = "ENSEMBLE")

I would like to plot only those regions that are protein coding, so I tried this:

    subTrack <- biomTrack[feature(biomTrack) == "protein_coding"]

When plotting the genomic region, I noticed that, for example, the gene with gene symbol PPM1H has 5 different transcripts with the feature protein_coding. When I looked that gene up in Ensembl:

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000111110;r=12:63037762-63328817

only the first transcript has the biotype "Protein coding". The remaining transcripts are annotated as "Processed transcripts".

My question: Is it possible to extract only those transcripts that are annotated as protein coding in Ensembl? Is there a way to extract this information from the biomTrack object?

Thank you for any hint to solve my problem.

Best regards,
Fiorella

-- output of sessionInfo():

    > sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  grid  stats  graphics  grDevices  utils  datasets  methods  base

    other attached packages:
    [1] biomaRt_2.16.0  BSgenome.Hsapiens.UCSC.hg19_1.3.19  BSgenome_1.28.0
    [4] Biostrings_2.28.0  GenomicRanges_1.12.2  IRanges_1.18.1
    [7] BiocGenerics_0.6.0  Gviz_1.4.1

    loaded via a namespace (and not attached):
     [1] AnnotationDbi_1.22.5  Biobase_2.20.0  biovizBase_1.8.0  bitops_1.0-5  cluster_1.14.4
     [6] colorspace_1.2-2  DBI_0.2-7  dichromat_2.0-0  GenomicFeatures_1.12.1  Hmisc_3.10-1.1
    [11] labeling_0.1  lattice_0.20-15  munsell_0.4  plyr_1.8  RColorBrewer_1.0-5
    [16] RCurl_1.95-4.1  Rsamtools_1.12.3  RSQLite_0.11.3  rtracklayer_1.20.2  scales_0.2.3
    [21] stats4_3.0.0  stringr_0.6.2  tools_3.0.0  XML_3.96-1.1  zlibbioc_1.6.0

-- Sent via the guest posting facility at bioconductor.org.
@florianhahnenovartiscom-3784
Switzerland
Hi Fiorella,

you can do this filtering directly in the constructor function. Take a look at the 'filters' argument in the help page for the BiomartGeneRegionTrack class. Its value is forwarded directly to biomaRt's getBM function, so its documentation as well as 'listFilters' might also be instructive. Your particular case would probably boil down to something like this:

    biomTrack <- BiomartGeneRegionTrack(genome = "hg19", chromosome = chr,
                                        start = 63038367, end = 64536346,
                                        name = "ENSEMBLE",
                                        filters = list(biotype = "protein_coding"))

When creating more complex biomaRt queries, I find it quite useful to first construct something similar in the web interface and then look at the XML representation of the query (there is a button for it).

Hope that helps,
Florian

Florian Hahne
Novartis Institute For Biomedical Research
Translational Sciences / Preclinical Safety / PCS Informatics
Basel, Switzerland
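To make the 'listFilters' suggestion concrete, here is a minimal sketch of searching a filter table for biotype-related entries. It assumes only that listFilters() returns a data.frame with 'name' and 'description' columns; the table below is a hand-written stand-in with illustrative descriptions so that the example runs without a BioMart connection, and find_filters is a hypothetical helper, not part of biomaRt or Gviz.

```r
# Hand-written stand-in for the data.frame returned by biomaRt::listFilters(mart).
# The filter names come from this thread; the descriptions are illustrative only.
filter_table <- data.frame(
  name = c("chromosome_name", "biotype", "transcript_biotype", "with_protein_id"),
  description = c("Chromosome name", "Gene type", "Transcript type",
                  "With protein ID(s)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose name or description matches a pattern.
find_filters <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

biotype_filters <- find_filters(filter_table, "type")
print(biotype_filters$name)
```

In a live session one would pass listFilters(mart) instead of the toy table. Any name found this way is only a candidate for the 'filters' list; whether the BiomartGeneRegionTrack query accepts it still has to be tried.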
Hi,

On Tue, May 14, 2013 at 8:47 AM, Hahne, Florian <florian.hahne at novartis.com> wrote:
> [...]
> filters=list(biotype="protein_coding")
> [...]

It seems that the problem is that `biotype` is set at the gene level, not the transcript level. You'll see that doing this still returns all transcripts for PPM1H.

In theory, filtering by `transcript_biotype == 'protein_coding'` should do the trick, but you get an error like this:

    Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple attribute pages are not allowed

I can work around this by doing a second query and then keeping only the transcript IDs that are protein_coding. There may be something a bit more elegant than this:

    # m is a biomaRt Mart object, e.g. m <- useMart("ensembl", "hsapiens_gene_ensembl")
    check <- getBM(c('transcript_biotype', 'ensembl_transcript_id'),
                   filters = c('chromosome_name', 'start', 'end'),
                   values = list(chromosome_name = '12', start = 63038367, end = 64536346),
                   mart = m)

    # Copy the track and keep only ranges whose transcript is protein_coding
    biomPC <- biomTrack
    ranges(biomPC) <- ranges(biomTrack)[mcols(ranges(biomTrack))$transcript %in%
        subset(check, transcript_biotype == "protein_coding")$ensembl_transcript_id]

Ouch .. like I said, there's probably a more elegant way to get that, but there it is.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Department of Bioinformatics and Computational Biology
Genentech
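The selection step in the workaround above can be tried in isolation. The sketch below uses plain data frames as stand-ins: 'check' mimics the result of the second getBM() query and 'exons' mimics the per-range annotation of the track. The transcript IDs and coordinates are invented placeholders, not real Ensembl records.

```r
# Toy stand-in for the getBM() result: one row per transcript.
check <- data.frame(
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C"),
  transcript_biotype = c("protein_coding", "processed_transcript", "protein_coding"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the track's ranges: one row per exon, tagged with its transcript.
exons <- data.frame(
  transcript = c("ENST_A", "ENST_A", "ENST_B", "ENST_C"),
  start = c(100, 300, 100, 500),
  end = c(200, 400, 250, 600),
  stringsAsFactors = FALSE
)

# Step 1: collect the IDs of transcripts whose own biotype is protein_coding.
coding_ids <- check$ensembl_transcript_id[check$transcript_biotype == "protein_coding"]

# Step 2: keep only the exons that belong to one of those transcripts.
exons_pc <- exons[exons$transcript %in% coding_ids, , drop = FALSE]
print(exons_pc)
```

With a real track the same `%in%` test is applied to `mcols(ranges(biomTrack))$transcript`, as in the code above; the data frames here only make the logic visible without a BioMart connection.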
Hi Fiorella, Steve,

How about using the with_protein_id filter instead:

    mart <- useMart("ensembl", "hsapiens_gene_ensembl")
    biomTrack <- BiomartGeneRegionTrack(genome = "hg19", chromosome = "chr12",
                                        start = 63038367, end = 64536346,
                                        name = "ENSEMBLE",
                                        filters = list("with_protein_id" = TRUE),
                                        biomart = mart, showId = TRUE)

This will only show transcripts that have a protein ID and thus produce proteins.

Cheers,
Steffen