import wig to coverage
2
2
Entering edit mode
@lizingsimmons-6935
Last seen 3.5 years ago
Germany

Hi,

I'm trying to import a wig.gz file and turn it into genomic coverage. I expected this to work but it didn't:

tmp <-import.wig("/GSM945575_WIGfs_Mnase_DYAD_artRM_sh73_bins50_ArtefactParam_-14_Scaled.wig.gz")
coverage(tmp)
## Error in (function (classes, fdef, mtable)  :
##  unable to find an inherited method for function ‘coverage’ for signature ‘"SimpleGenomicRangesList"’

Investigating this a bit further, I found that the class of 'tmp' is indeed "SimpleGenomicRangesList", and the class of each element of this list is 'UCSCData"

class(tmp)
## [1] "SimpleGenomicRangesList"
## attr(,"package")
## [1] "GenomicRanges"
class(tmp[[1]])
## [1] "UCSCData"
## attr(,"package")
## [1] "rtracklayer"

The documentation for import.wig says it should produce "a ‘GRanges’ (or ‘RangedData’ if ‘asRangedData’ is ‘TRUE’)", so I'm not sure why my result is a SimpleGenomicRangesList.

I'm not able to coerce the tmp object to a GRangesList with e.g. GRangesList, or the UCSCData objects with GRanges (or as(x, "GRanges")).

Can anyone suggest how I can convert this into a coverage object? The wig.gz file is available here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM945575

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] rtracklayer_1.26.2   GenomicRanges_1.18.4 GenomeInfoDb_1.2.4  
[4] IRanges_2.0.1        S4Vectors_0.4.0      BiocGenerics_0.12.1
[7] knitr_1.9           

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.5           BBmisc_1.8             
 [4] BiocParallel_1.0.1      Biostrings_2.34.1       bitops_1.0-6           
 [7] brew_1.0-6              checkmate_1.5.1         codetools_0.2-10       
[10] DBI_0.3.1               digest_0.6.8            evaluate_0.5.5         
[13] fail_1.2                foreach_1.4.2           formatR_1.0            
[16] GenomicAlignments_1.2.1 iterators_1.0.7         markdown_0.7.4         
[19] RCurl_1.95-4.5          Rsamtools_1.18.2        RSQLite_1.0.0          
[22] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1            
[25] XML_3.98-1.1            XVector_0.6.0           zlibbioc_1.12.0  
rtracklayer granges import.wig • 2.6k views
ADD COMMENT
4
Entering edit mode
Julian Gehring ★ 1.3k
@julian-gehring-5818
Last seen 5.0 years ago

The object of class  'SimpleGenomicRangesList' or 'GenomicRangesList' you get back is in the end a list of individual GRanges, with one entry for each chromosome.

The reason why you get this for your wig file is the following: Each chromosome is included in the file as a 'track'. One can argue if this a valid or reasonable thing to do. Here a way to solve this:

library(rtracklayer)
x = import("GSM945575_WIGfs_Mnase_el146_merged_bins50_ArtefactParam_-14_Scaled.wig.gz")
## x is a GenomicRangesList
y = unlist(unname(x))
## collapse the list, which yields y
your_cov = coverage(y) ## works now

To elaborate a bit why importing this file gives you the 'GenomicRangesList': Different tracks separate different data sources, which means that the same genomic range is allowed to appear again in different tracks, e.g. with different scores. A way to avoid this is to separate the tracks, by returning them as entries of a list.

Knowing that the tracks are causing us some problems, we can alternatively ignore them in the input. This always gives us a 'GRanges' object, as expected:

x2 = import(file, trackLine = FALSE) ## returns a 'GRanges' object
your_cov = coverage(x2) ## works now
ADD COMMENT
0
Entering edit mode

Thanks, this works!

I had tried unlist(x, use.names=FALSE) to no success - I would have expected this to work the same as unname, so I'm surprised by this solution.

ADD REPLY
1
Entering edit mode

You don't need the unnaming at all for computing the coverage. It is simply for convenience to remove the names of tracks which are not informative in this case anyway.

y1 = unlist(unname(x))
y2 = unlist(x, use.names=FALSE)
identical(y1, y2) ## TRUE

are equivalent. What do you mean when you say that you have tried it without success?

ADD REPLY
0
Entering edit mode

You're right, they do give the same result. Not sure where I got confused. Thanks!

ADD REPLY
1
Entering edit mode
@herve-pages-1542
Last seen 15 hours ago
Seattle, WA, United States

Note that the genomic ranges in a wig file are just a tiling of the genome so you should take the score into account  (and use them as weights) when computing the coverage:

x <- import("GSM945575_WIGfs_Mnase_DYAD_artRM_sh73_bins50_ArtefactParam_-14_Scaled.wig.gz")
y <- unlist(x, use.names=FALSE)
coverage(y, weight="score")  # use "score" metadata columns as weights

Otherwise the unweighted coverage you get is equal to 1 all over the genome which I guess is not what you are after.

H.

ADD COMMENT

Login before adding your answer.

Traffic: 1063 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6