Very high memory usage when using rtracklayer to import GFF3
@vince-s-buffalo-4618
Last seen 10.2 years ago
United States
Hi All,

I have tried to use import.gff3 from the rtracklayer package to import annotation information for the mosquito genome (gff3 file here: http://aaegypti.vectorbase.org/GetData/Downloads/). The file is only 20 MB, but memory usage has exceeded 100 GB on one of our high-memory servers. This seems like far too much just to read in the file (which takes only 3 seconds with read.delim) and convert it to RangedData objects. Has anyone experienced similar problems?

Here is my sessionInfo:

R version 2.13.0 (2011-04-13)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] biomaRt_2.8.0 rtracklayer_1.12.4 RCurl_1.5-0 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] Biostrings_2.20.0 BSgenome_1.20.0 GenomicRanges_1.4.1
[4] IRanges_1.10.0 tools_2.13.0 XML_3.2-0

--
Vince Buffalo
Statistical Programmer
Bioinformatics Core
UC Davis Genome Center
University of California, Davis
convert rtracklayer • 829 views
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Are you sure this is valid GFF3? For example, consider this attribute field:

ID=supercont1.1;molecule_type=dsDNA;GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1;translation_table=1;topology=linear;localization=chromosomal;

The string "GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1" is delimited by semicolons, but it does not conform to the key=value format. I will add a check for this to devel, so that the error is more obvious.

You can still read this file if you use the "colnames" argument. If colnames=character(), you will get just the seqnames, start and end. If you just want the strand in addition to that, specify colnames="strand", etc. By default, all columns (including attributes) are parsed.

Michael

On Fri, Jul 1, 2011 at 11:48 AM, Vince S. Buffalo <vsbuffalo@gmail.com> wrote:

> Hi All,
>
> I have tried to use import.gff3 from the rtracklayer package to import
> annotation information for the mosquito genome (gff3 file here:
> http://aaegypti.vectorbase.org/GetData/Downloads/) which is only 20 MB but
> the memory usage has exceeded 100GB on one of our high memory servers. This
> seems like far too much to just read in the file (which takes only 3 seconds
> with read.delim) and convert to RangedData objects. Has anyone experienced
> similar problems?
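To make the key=value point in the answer concrete, here is a minimal base-R sketch that flags attribute entries lacking a key. The helper name find_malformed_attributes is hypothetical and is not part of rtracklayer; it is only meant to show the kind of check described, not how the devel version implements it. The second part sketches the colnames workaround from the answer; the path "annotation.gff3" is a placeholder, and the calls run only if rtracklayer is installed and the file exists.

```r
# Hypothetical helper (not an rtracklayer function): return the entries of one
# GFF3 attribute string that are not of the form key=value.
find_malformed_attributes <- function(attr_field) {
  entries <- strsplit(attr_field, ";", fixed = TRUE)[[1]]
  entries <- entries[nzchar(entries)]   # ignore empty pieces, e.g. from ";;"
  entries[!grepl("^[^=]+=", entries)]   # keep entries with no "key=" prefix
}

# The attribute field quoted in the answer.
attr_field <- paste0(
  "ID=supercont1.1;molecule_type=dsDNA;",
  "GenBank:supercontig:AaegL1:supercont1.1:1:5856339:1;",
  "translation_table=1;topology=linear;localization=chromosomal;"
)
bad <- find_malformed_attributes(attr_field)
print(bad)

# Workaround from the answer: restrict what gets parsed via 'colnames'.
# "annotation.gff3" is a placeholder path (assumption), not the real file name.
gff_path <- "annotation.gff3"
if (requireNamespace("rtracklayer", quietly = TRUE) && file.exists(gff_path)) {
  # colnames = character(): only seqnames, start and end are kept
  ranges_only <- rtracklayer::import.gff3(gff_path, colnames = character())
  # colnames = "strand": additionally keep the strand column
  with_strand <- rtracklayer::import.gff3(gff_path, colnames = "strand")
}
```

Running the helper over the ninth column of the table returned by read.delim would be one way to see how many records carry such entries before deciding which colnames to request.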