Wig files?

0

Entering edit mode

Ed Siefker ▴ 230

@ed-siefker-5136

Last seen 5 months ago

United States

What do I use to handle .wig files in a fast and convenient manner? I have a bunch of .wig files from the TCGA. All I want to know is if a single position was sufficiently covered by the sequencer to call a mutation. This is represented by a '1' at the appropriate line in the file. Insufficient coverage is represented by a '0' or the lack of any lines corresponding to those coordinates. 'rtracklayer' seems to be the package people use to handle .wig files. So I started R, installed 'rtracklayer' and executed: wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) as indicated in the documentation. So far, it's been about 3 hours of maxed out CPU. What is 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep the thing in less than 1s. Is this expected behavior? Should I be going about this in a different way?

Coverage Coverage • 1.7k views

ADD COMMENT • link updated 11.8 years ago by Michael Lawrence ★ 11k • written 11.8 years ago by Ed Siefker ▴ 230

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

Something is probably going wrong. Would you mind pointing me to this WIG file, if it's something public? Also, you'll find things to be a lot more efficient if you convert to BigWig with wigToBigWig() and then import that. Thanks, Michael On Thu, Jul 12, 2012 at 1:14 PM, Ed Siefker <ebs15242@gmail.com> wrote: > What do I use to handle .wig files in a fast and convenient manner? > > I have a bunch of .wig files from the TCGA. All I want to know is > if a single position was sufficiently covered by the sequencer to > call a mutation. This is represented by a '1' at the appropriate > line in the file. Insufficient coverage is represented by a '0' > or the lack of any lines corresponding to those coordinates. > > 'rtracklayer' seems to be the package people use to handle > .wig files. So I started R, installed 'rtracklayer' and executed: > wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) > as indicated in the documentation. > > So far, it's been about 3 hours of maxed out CPU. What is > 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep > the thing in less than 1s. Is this expected behavior? Should > I be going about this in a different way? > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 11.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Gladly. The Wig is available at: https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anony mous/tumor/coad/gsc/hgsc.bcm.edu/illuminaga_dnaseq/mutations/hgsc.bcm edu_COAD.IlluminaGA_DNASeq.Level_2.1.1.0/TCGA-A6-3807.wig.bz2 My R session looks like this: >source("http://bioconductor.org/biocLite.R") >biocLite("rtracklayer") >library(rtracklayer) >wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) That ran for about 4 hours but apparently completed. > class(wigtest) [1] "SimpleGenomicRangesList" attr(,"package") [1] "GenomicRanges" > length(wigtest) [1] 199482 > head(wigtest[[1]]) GRanges with 6 ranges and 1 elementMetadata col: seqnames ranges strand | score <rle> <iranges> <rle> | <numeric> [1] chr1 [20227, 20227] * | 1 [2] chr1 [20228, 20228] * | 1 [3] chr1 [20229, 20229] * | 1 [4] chr1 [20230, 20230] * | 1 [5] chr1 [20231, 20231] * | 1 [6] chr1 [20232, 20232] * | 1 --- seqlengths: chr1 NA However the R process is now consuming 9.8 gigs of memory. I am using a manually compiled version of R on RHEL6. Here is my sessionInfo(): > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.16.2 GenomicRanges_1.8.7 IRanges_1.14.4 [4] BiocGenerics_0.2.0 BiocInstaller_1.4.7 loaded via a namespace (and not attached): [1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1 [5] Rsamtools_1.8.5 stats4_2.15.0 tools_2.15.0 XML_3.9-4 [9] zlibbioc_1.2.0 I think that's all the information I have that would be useful to you. Let me know if I forgot anything. Thanks. -Ed ________________________________________ From: bioconductor-bounces@r-project.org [bioconductor- bounces@r-project.org] on behalf of Michael Lawrence [lawrence.michael@gene.com] Sent: Thursday, July 12, 2012 7:00 PM To: Ed Siefker Cc: bioconductor at r-project.org Subject: Re: [BioC] Wig files? Something is probably going wrong. Would you mind pointing me to this WIG file, if it's something public? Also, you'll find things to be a lot more efficient if you convert to BigWig with wigToBigWig() and then import that. Thanks, Michael On Thu, Jul 12, 2012 at 1:14 PM, Ed Siefker <ebs15242 at="" gmail.com=""> wrote: > What do I use to handle .wig files in a fast and convenient manner? > > I have a bunch of .wig files from the TCGA. All I want to know is > if a single position was sufficiently covered by the sequencer to > call a mutation. This is represented by a '1' at the appropriate > line in the file. Insufficient coverage is represented by a '0' > or the lack of any lines corresponding to those coordinates. > > 'rtracklayer' seems to be the package people use to handle > .wig files. So I started R, installed 'rtracklayer' and executed: > wigtest <- import("TCGA-A6-3807.wig", asRangedData = FALSE) > as indicated in the documentation. > > So far, it's been about 3 hours of maxed out CPU. What is > 'rtracklayer' doing? Sure, it's a 91mb .wig, but I can grep > the thing in less than 1s. Is this expected behavior? Should > I be going about this in a different way? > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 11.8 years ago Siefker, Ed B. ▴ 30

Login before adding your answer.