Wig files?
1
0
Entering edit mode
Ed Siefker ▴ 230
@ed-siefker-5136
Last seen 14 months ago
United States
What do I use to handle .wig files in a fast and convenient manner? I have a bunch of .wig files from the TCGA. All I want to know is if a single position was sufficiently covered by the sequencer to call a mutation. This is represented by a '1' at the appropriate line in the file. Insufficient coverage is represented by a '0' or the lack of any lines corresponding to those coordinates. 'rtracklayer' seems to be the package people use to handle .wig files. So I started R, installed 'rtracklayer' and executed: wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) as indicated in the documentation. So far, it's been about 3 hours of maxed out CPU. What is 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep the thing in less than 1s. Is this expected behavior? Should I be going about this in a different way?
Coverage Coverage • 2.1k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States
Something is probably going wrong. Would you mind pointing me to this WIG file, if it's something public? Also, you'll find things to be a lot more efficient if you convert to BigWig with wigToBigWig() and then import that. Thanks, Michael On Thu, Jul 12, 2012 at 1:14 PM, Ed Siefker <ebs15242@gmail.com> wrote: > What do I use to handle .wig files in a fast and convenient manner? > > I have a bunch of .wig files from the TCGA. All I want to know is > if a single position was sufficiently covered by the sequencer to > call a mutation. This is represented by a '1' at the appropriate > line in the file. Insufficient coverage is represented by a '0' > or the lack of any lines corresponding to those coordinates. > > 'rtracklayer' seems to be the package people use to handle > .wig files. So I started R, installed 'rtracklayer' and executed: > wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) > as indicated in the documentation. > > So far, it's been about 3 hours of maxed out CPU. What is > 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep > the thing in less than 1s. Is this expected behavior? Should > I be going about this in a different way? > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Gladly. The Wig is available at: https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anony mous/tumor/coad/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm edu_COAD.IlluminaGA_DNASeq.Level_2.1.1.0/TCGA-A6-3807.wig.bz2 My R session looks like this: >source("http://bioconductor.org/biocLite.R") >biocLite("rtracklayer") >library(rtracklayer) >wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) That ran for about 4 hours but apparently completed. > class(wigtest) [1] "SimpleGenomicRangesList" attr(,"package") [1] "GenomicRanges" > length(wigtest) [1] 199482 > head(wigtest[[1]]) GRanges with 6 ranges and 1 elementMetadata col: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> [1] chr1 [20227, 20227] * | 1 [2] chr1 [20228, 20228] * | 1 [3] chr1 [20229, 20229] * | 1 [4] chr1 [20230, 20230] * | 1 [5] chr1 [20231, 20231] * | 1 [6] chr1 [20232, 20232] * | 1 --- seqlengths: chr1 NA However the R process is now consuming 9.8 gigs of memory. I am using a manually compiled version of R on RHEL6. Here is my sessionInfo(): > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.16.2 GenomicRanges_1.8.7 IRanges_1.14.4 [4] BiocGenerics_0.2.0 BiocInstaller_1.4.7 loaded via a namespace (and not attached): [1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1 [5] Rsamtools_1.8.5 stats4_2.15.0 tools_2.15.0 XML_3.9-4 [9] zlibbioc_1.2.0 I think that's all the information I have that would be useful to you. Let me know if I forgot anything. Thanks. -Ed ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] on behalf of Michael Lawrence [lawrence.michael@gene.com] Sent: Thursday, July 12, 2012 7:00 PM To: Ed Siefker Cc: bioconductor at r-project.org Subject: Re: [BioC] Wig files? Something is probably going wrong. Would you mind pointing me to this WIG file, if it's something public? Also, you'll find things to be a lot more efficient if you convert to BigWig with wigToBigWig() and then import that. Thanks, Michael On Thu, Jul 12, 2012 at 1:14 PM, Ed Siefker <ebs15242 at="" gmail.com=""> wrote: > What do I use to handle .wig files in a fast and convenient manner? > > I have a bunch of .wig files from the TCGA. All I want to know is > if a single position was sufficiently covered by the sequencer to > call a mutation. This is represented by a '1' at the appropriate > line in the file. Insufficient coverage is represented by a '0' > or the lack of any lines corresponding to those coordinates. > > 'rtracklayer' seems to be the package people use to handle > .wig files. So I started R, installed 'rtracklayer' and executed: > wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) > as indicated in the documentation. > > So far, it's been about 3 hours of maxed out CPU. What is > 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep > the thing in less than 1s. Is this expected behavior? Should > I be going about this in a different way? > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 936 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6