Question

import wig to coverage

2

Entering edit mode

liz.ingsimmons ▴ 140

@lizingsimmons-6935

Last seen 5.1 years ago

Germany

Hi,

I'm trying to import a wig.gz file and turn it into genomic coverage. I expected this to work but it didn't:

tmp <-import.wig("/GSM945575_WIGfs_Mnase_DYAD_artRM_sh73_bins50_ArtefactParam_-14_Scaled.wig.gz")
coverage(tmp)
## Error in (function (classes, fdef, mtable)  :
##  unable to find an inherited method for function ‘coverage’ for signature ‘"SimpleGenomicRangesList"’

Investigating this a bit further, I found that the class of 'tmp' is indeed "SimpleGenomicRangesList", and the class of each element of this list is 'UCSCData"

class(tmp)
## [1] "SimpleGenomicRangesList"
## attr(,"package")
## [1] "GenomicRanges"
class(tmp[[1]])
## [1] "UCSCData"
## attr(,"package")
## [1] "rtracklayer"

The documentation for import.wig says it should produce "a ‘GRanges’ (or ‘RangedData’ if ‘asRangedData’ is ‘TRUE’)", so I'm not sure why my result is a SimpleGenomicRangesList.

I'm not able to coerce the tmp object to a GRangesList with e.g. GRangesList, or the UCSCData objects with GRanges (or as(x, "GRanges")).

Can anyone suggest how I can convert this into a coverage object? The wig.gz file is available here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM945575

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] rtracklayer_1.26.2   GenomicRanges_1.18.4 GenomeInfoDb_1.2.4  
[4] IRanges_2.0.1        S4Vectors_0.4.0      BiocGenerics_0.12.1
[7] knitr_1.9           

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8             
 [4] BiocParallel_1.0.1      Biostrings_2.34.1       bitops_1.0-6           
 [7] brew_1.0-6              checkmate_1.5.1         codetools_0.2-10       
[10] DBI_0.3.1               digest_0.6.8            evaluate_0.5.5         
[13] fail_1.2                foreach_1.4.2           formatR_1.0            
[16] GenomicAlignments_1.2.1 iterators_1.0.7         markdown_0.7.4         
[19] RCurl_1.95-4.5          Rsamtools_1.18.2        RSQLite_1.0.0          
[22] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[25] XML_3.98-1.1            XVector_0.6.0           zlibbioc_1.12.0

rtracklayer granges import.wig • 3.2k views

ADD COMMENT • link updated 10.9 years ago by Hervé Pagès 16k • written 10.9 years ago by liz.ingsimmons ▴ 140

1

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 7 hours ago

Seattle, WA, United States

Note that the genomic ranges in a wig file are just a tiling of the genome so you should take the score into account (and use them as weights) when computing the coverage:

x <- import("GSM945575_WIGfs_Mnase_DYAD_artRM_sh73_bins50_ArtefactParam_-14_Scaled.wig.gz")
y <- unlist(x, use.names=FALSE)
coverage(y, weight="score")  # use "score" metadata columns as weights

Otherwise the unweighted coverage you get is equal to 1 all over the genome which I guess is not what you are after.

H.

ADD COMMENT • link 10.9 years ago Hervé Pagès 16k

score 4 · Accepted Answer · 2015-01-30

4

Entering edit mode

Julian Gehring ★ 1.3k

@julian-gehring-5818

Last seen 6.6 years ago

The object of class 'SimpleGenomicRangesList' or 'GenomicRangesList' you get back is in the end a list of individual GRanges, with one entry for each chromosome.

The reason why you get this for your wig file is the following: Each chromosome is included in the file as a 'track'. One can argue if this a valid or reasonable thing to do. Here a way to solve this:

library(rtracklayer)
x = import("GSM945575_WIGfs_Mnase_el146_merged_bins50_ArtefactParam_-14_Scaled.wig.gz")
## x is a GenomicRangesList
y = unlist(unname(x))
## collapse the list, which yields y
your_cov = coverage(y) ## works now

To elaborate a bit why importing this file gives you the 'GenomicRangesList': Different tracks separate different data sources, which means that the same genomic range is allowed to appear again in different tracks, e.g. with different scores. A way to avoid this is to separate the tracks, by returning them as entries of a list.

Knowing that the tracks are causing us some problems, we can alternatively ignore them in the input. This always gives us a 'GRanges' object, as expected:

x2 = import(file, trackLine = FALSE) ## returns a 'GRanges' object
your_cov = coverage(x2) ## works now

ADD COMMENT • link 10.9 years ago Julian Gehring ★ 1.3k

0

Entering edit mode

Thanks, this works!

I had tried unlist(x, use.names=FALSE) to no success - I would have expected this to work the same as unname, so I'm surprised by this solution.

ADD REPLY • link 10.9 years ago liz.ingsimmons ▴ 140

1

Entering edit mode

You don't need the unnaming at all for computing the coverage. It is simply for convenience to remove the names of tracks which are not informative in this case anyway.

y1 = unlist(unname(x))
y2 = unlist(x, use.names=FALSE)
identical(y1, y2) ## TRUE

are equivalent. What do you mean when you say that you have tried it without success?

ADD REPLY • link 10.9 years ago Julian Gehring ★ 1.3k

0

Entering edit mode

You're right, they do give the same result. Not sure where I got confused. Thanks!

ADD REPLY • link 10.9 years ago liz.ingsimmons ▴ 140