Question: Wig files?
0
gravatar for Ed Siefker
7.3 years ago by
Ed Siefker220
United States
Ed Siefker220 wrote:
What do I use to handle .wig files in a fast and convenient manner? I have a bunch of .wig files from the TCGA. All I want to know is if a single position was sufficiently covered by the sequencer to call a mutation. This is represented by a '1' at the appropriate line in the file. Insufficient coverage is represented by a '0' or the lack of any lines corresponding to those coordinates. 'rtracklayer' seems to be the package people use to handle .wig files. So I started R, installed 'rtracklayer' and executed: wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) as indicated in the documentation. So far, it's been about 3 hours of maxed out CPU. What is 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep the thing in less than 1s. Is this expected behavior? Should I be going about this in a different way?
coverage • 875 views
ADD COMMENTlink modified 7.3 years ago by Michael Lawrence11k • written 7.3 years ago by Ed Siefker220
Answer: Wig files?
0
gravatar for Michael Lawrence
7.3 years ago by
United States
Michael Lawrence11k wrote:
Something is probably going wrong. Would you mind pointing me to this WIG file, if it's something public? Also, you'll find things to be a lot more efficient if you convert to BigWig with wigToBigWig() and then import that. Thanks, Michael On Thu, Jul 12, 2012 at 1:14 PM, Ed Siefker <ebs15242@gmail.com> wrote: > What do I use to handle .wig files in a fast and convenient manner? > > I have a bunch of .wig files from the TCGA. All I want to know is > if a single position was sufficiently covered by the sequencer to > call a mutation. This is represented by a '1' at the appropriate > line in the file. Insufficient coverage is represented by a '0' > or the lack of any lines corresponding to those coordinates. > > 'rtracklayer' seems to be the package people use to handle > .wig files. So I started R, installed 'rtracklayer' and executed: > wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) > as indicated in the documentation. > > So far, it's been about 3 hours of maxed out CPU. What is > 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep > the thing in less than 1s. Is this expected behavior? Should > I be going about this in a different way? > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 7.3 years ago by Michael Lawrence11k
Gladly. The Wig is available at: https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anony mous/tumor/coad/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm edu_COAD.IlluminaGA_DNASeq.Level_2.1.1.0/TCGA-A6-3807.wig.bz2 My R session looks like this: >source("http://bioconductor.org/biocLite.R") >biocLite("rtracklayer") >library(rtracklayer) >wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) That ran for about 4 hours but apparently completed. > class(wigtest) [1] "SimpleGenomicRangesList" attr(,"package") [1] "GenomicRanges" > length(wigtest) [1] 199482 > head(wigtest[[1]]) GRanges with 6 ranges and 1 elementMetadata col: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> [1] chr1 [20227, 20227] * | 1 [2] chr1 [20228, 20228] * | 1 [3] chr1 [20229, 20229] * | 1 [4] chr1 [20230, 20230] * | 1 [5] chr1 [20231, 20231] * | 1 [6] chr1 [20232, 20232] * | 1 --- seqlengths: chr1 NA However the R process is now consuming 9.8 gigs of memory. I am using a manually compiled version of R on RHEL6. Here is my sessionInfo(): > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.16.2 GenomicRanges_1.8.7 IRanges_1.14.4 [4] BiocGenerics_0.2.0 BiocInstaller_1.4.7 loaded via a namespace (and not attached): [1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1 [5] Rsamtools_1.8.5 stats4_2.15.0 tools_2.15.0 XML_3.9-4 [9] zlibbioc_1.2.0 I think that's all the information I have that would be useful to you. Let me know if I forgot anything. Thanks. -Ed ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] on behalf of Michael Lawrence [lawrence.michael@gene.com] Sent: Thursday, July 12, 2012 7:00 PM To: Ed Siefker Cc: bioconductor at r-project.org Subject: Re: [BioC] Wig files? Something is probably going wrong. Would you mind pointing me to this WIG file, if it's something public? Also, you'll find things to be a lot more efficient if you convert to BigWig with wigToBigWig() and then import that. Thanks, Michael On Thu, Jul 12, 2012 at 1:14 PM, Ed Siefker <ebs15242 at="" gmail.com=""> wrote: > What do I use to handle .wig files in a fast and convenient manner? > > I have a bunch of .wig files from the TCGA. All I want to know is > if a single position was sufficiently covered by the sequencer to > call a mutation. This is represented by a '1' at the appropriate > line in the file. Insufficient coverage is represented by a '0' > or the lack of any lines corresponding to those coordinates. > > 'rtracklayer' seems to be the package people use to handle > .wig files. So I started R, installed 'rtracklayer' and executed: > wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) > as indicated in the documentation. > > So far, it's been about 3 hours of maxed out CPU. What is > 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep > the thing in less than 1s. Is this expected behavior? Should > I be going about this in a different way? > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 7.3 years ago by Siefker, Ed B.30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 161 users visited in the last hour