Question: import wig to coverage
2
4.8 years ago by
Germany
liz.ingsimmons140 wrote:

Hi,

I'm trying to import a wig.gz file and turn it into genomic coverage. I expected this to work but it didn't:

tmp <-import.wig("/GSM945575_WIGfs_Mnase_DYAD_artRM_sh73_bins50_ArtefactParam_-14_Scaled.wig.gz")
coverage(tmp)
## Error in (function (classes, fdef, mtable)  :
##  unable to find an inherited method for function ‘coverage’ for signature ‘"SimpleGenomicRangesList"’

Investigating this a bit further, I found that the class of 'tmp' is indeed "SimpleGenomicRangesList", and the class of each element of this list is 'UCSCData"

class(tmp)
## [1] "SimpleGenomicRangesList"
## attr(,"package")
## [1] "GenomicRanges"
class(tmp[[1]])
## [1] "UCSCData"
## attr(,"package")
## [1] "rtracklayer"

The documentation for import.wig says it should produce "a ‘GRanges’ (or ‘RangedData’ if ‘asRangedData’ is ‘TRUE’)", so I'm not sure why my result is a SimpleGenomicRangesList.

I'm not able to coerce the tmp object to a GRangesList with e.g. GRangesList, or the UCSCData objects with GRanges (or as(x, "GRanges")).

Can anyone suggest how I can convert this into a coverage object? The wig.gz file is available here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM945575

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] rtracklayer_1.26.2   GenomicRanges_1.18.4 GenomeInfoDb_1.2.4
[4] IRanges_2.0.1        S4Vectors_0.4.0      BiocGenerics_0.12.1
[7] knitr_1.9

loaded via a namespace (and not attached):
[1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8
[4] BiocParallel_1.0.1      Biostrings_2.34.1       bitops_1.0-6
[7] brew_1.0-6              checkmate_1.5.1         codetools_0.2-10
[10] DBI_0.3.1               digest_0.6.8            evaluate_0.5.5
[13] fail_1.2                foreach_1.4.2           formatR_1.0
[16] GenomicAlignments_1.2.1 iterators_1.0.7         markdown_0.7.4
[19] RCurl_1.95-4.5          Rsamtools_1.18.2        RSQLite_1.0.0
[22] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1
[25] XML_3.98-1.1            XVector_0.6.0           zlibbioc_1.12.0  
rtracklayer granges import.wig • 1.6k views
modified 4.8 years ago by Hervé Pagès ♦♦ 14k • written 4.8 years ago by liz.ingsimmons140
4
4.8 years ago by
Julian Gehring1.3k
Julian Gehring1.3k wrote:

The object of class  'SimpleGenomicRangesList' or 'GenomicRangesList' you get back is in the end a list of individual GRanges, with one entry for each chromosome.

The reason why you get this for your wig file is the following: Each chromosome is included in the file as a 'track'. One can argue if this a valid or reasonable thing to do. Here a way to solve this:

library(rtracklayer)
x = import("GSM945575_WIGfs_Mnase_el146_merged_bins50_ArtefactParam_-14_Scaled.wig.gz")
## x is a GenomicRangesList
y = unlist(unname(x))
## collapse the list, which yields y
your_cov = coverage(y) ## works now

To elaborate a bit why importing this file gives you the 'GenomicRangesList': Different tracks separate different data sources, which means that the same genomic range is allowed to appear again in different tracks, e.g. with different scores. A way to avoid this is to separate the tracks, by returning them as entries of a list.

Knowing that the tracks are causing us some problems, we can alternatively ignore them in the input. This always gives us a 'GRanges' object, as expected:

x2 = import(file, trackLine = FALSE) ## returns a 'GRanges' object
your_cov = coverage(x2) ## works now


Thanks, this works!

I had tried unlist(x, use.names=FALSE) to no success - I would have expected this to work the same as unname, so I'm surprised by this solution.

1

You don't need the unnaming at all for computing the coverage. It is simply for convenience to remove the names of tracks which are not informative in this case anyway.

y1 = unlist(unname(x))
y2 = unlist(x, use.names=FALSE)
identical(y1, y2) ## TRUE


are equivalent. What do you mean when you say that you have tried it without success?

You're right, they do give the same result. Not sure where I got confused. Thanks!

1
4.8 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Note that the genomic ranges in a wig file are just a tiling of the genome so you should take the score into account  (and use them as weights) when computing the coverage:

x <- import("GSM945575_WIGfs_Mnase_DYAD_artRM_sh73_bins50_ArtefactParam_-14_Scaled.wig.gz")
y <- unlist(x, use.names=FALSE)
coverage(y, weight="score")  # use "score" metadata columns as weights

Otherwise the unweighted coverage you get is equal to 1 all over the genome which I guess is not what you are after.

H.