rtracklayer/GenomicRanges: How can I group GRanges by metadata attributes
Guest User ★ 13k
@guest-user-4897
Last seen 9.6 years ago
After reading in a GTF file with rtracklayer::import(), I have a GRanges object. How can I select the ranges whose gene_id (or transcript_id) match a particular value?

> library(rtracklayer)
> gtf <- import(system.file("tests", "gtf.gff", package="rtracklayer"), asRangedData=FALSE)

I can see the metadata with 'mcols(gtf)', and I can even see the gene_id with 'mcols(gtf)$group',

> mcols(gtf)$group
[1] gene_id "ENSMUSG00000033501.1"; transcript_id "ENSMUST00000040592.1"; exon_id "ENSMUSE00000310143.1";
[omitted]
3 Levels: gene_id "ENSMUSG00000033501.1"; transcript_id "ENSMUST00000040592.1"; exon_id "ENSMUSE00000310143.1"; ...

but I don't see any way to utilize the name=value strings in the group column.

Do I just have to parse the values in the group column myself?

Thanks in advance for any help.

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.20.4 GenomicRanges_1.12.5 IRanges_1.18.4 BiocGenerics_0.6.0 BiocInstaller_1.10.4

loaded via a namespace (and not attached):
[1] Biostrings_2.28.0 bitops_1.0-6 BSgenome_1.28.0 RCurl_1.95-4.1 Rsamtools_1.12.4 stats4_3.0.2
[7] tools_3.0.2 XML_3.95-0.2 zlibbioc_1.6.0

--
Sent via the guest posting facility at bioconductor.org.
@michael-lawrence-3846
Last seen 2.4 years ago
United States
There are anywhere from 3 to infinity versions of GFF. When rtracklayer cannot detect the version (from either the filename extension or the #gff-version directive), it assumes GFF version 1. To tell it otherwise, use the "version" argument. In this case, you want version="2".

But what you have is actually a special subformat of GFF2, called GTF. rtracklayer could have detected this from a "gtf" extension, or you could pass format="gtf". It doesn't do anything special with GTF, though.

Maybe what you really want is GenomicFeatures::makeTranscriptDbFromGFF, which uses rtracklayer to make a TranscriptDb from GFF data. That's generally more appropriate for representing transcript structures compared to GRanges.

Michael

On Thu, Oct 31, 2013 at 10:51 PM, chris warth [guest] <guest@bioconductor.org> wrote:
> After reading in a GTF file with rtracklayer::import(), I have a GRanges object. How can I select the ranges whose gene_id (or transcript_id) match a particular value?
> [...]
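To illustrate the format="gtf" suggestion, here is a minimal sketch rather than a definitive recipe. It assumes a reasonably recent rtracklayer is installed, in which importing a GTF file yields gene_id and transcript_id as separate metadata columns. The three-line GTF, its identifiers (G1, G2, T1, T2), and the variable names are hypothetical and exist only to keep the example self-contained.

```r
# Assumes the Bioconductor package rtracklayer is installed.
library(rtracklayer)

# Hypothetical tab-separated GTF records, written to a temp file
# so the sketch does not depend on any packaged test data.
gtf_lines <- c(
  paste("chr1", "demo", "exon", 100, 200, ".", "+", ".",
        'gene_id "G1"; transcript_id "T1";', sep = "\t"),
  paste("chr1", "demo", "exon", 300, 400, ".", "+", ".",
        'gene_id "G1"; transcript_id "T1";', sep = "\t"),
  paste("chr1", "demo", "exon", 900, 950, ".", "-", ".",
        'gene_id "G2"; transcript_id "T2";', sep = "\t")
)
path <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, path)

# State the format explicitly instead of relying on detection.
gtf <- import(path, format = "gtf")

# Select the ranges whose gene_id matches a particular value.
hits <- gtf[mcols(gtf)$gene_id == "G1"]

# Group all ranges by transcript_id; the result is a GRangesList.
by_tx <- split(gtf, mcols(gtf)$transcript_id)
```

With the attributes available as columns, selection is ordinary logical subsetting and grouping is a split() on the column of interest, so no manual parsing of the group string should be needed.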