Question: Is there an easy way to extend coverage to seqlengths and fill with zeroes?
0
gravatar for liz.ingsimmons
3.7 years ago by
Germany
liz.ingsimmons140 wrote:

Often I read in coverage from a file and the lengths of the coverage object are less than the seqlengths of the chromosomes. If I try to subset the coverage towards the end of the chromosome (say, with a GRanges), I get "subscript contains out-of-bounds ranges" errors.

Is there a function to extend the ends of the coverage with zeroes to the full length of the chromosome according to the seqlengths? This would be really useful. I think it's a fair assumption that the coverage should be zero if there was no coverage of that region in the file/object the coverage object was created from (this is after all the assumption used for the *start* of the chromosome). Padding the end of the chromosome with zeroes could be an option for the coverage function when seqlengths are known. Edit: this already exists in the 'width' argument to coverage, see Aaron's answer below!

I've written my own simple function to pad the ends of the coverage Rle for each chromosome, but I'm sure it can be improved.

pad_coverage <- function(cov){
  pad <- seqlengths(cov)[seqlevels(cov)] - lengths(cov)
  newcov <- mapply(cov, pad, FUN = function(s, p){
    Rle(values = c(runValue(s),0),
             lengths = c(runLength(s), p))
  })
  S4Vectors:::new_SimpleList_from_list("SimpleRleList", newcov)
  }
coverage seqlengths s4vectors • 666 views
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by liz.ingsimmons140
Answer: Is there an easy way to extend coverage to seqlengths and fill with zeroes?
1
gravatar for Aaron Lun
3.7 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Isn't this the default when you call coverage on a Granges object with the required sequence lengths in the seqinfo slot? For example, if I were to run:

a <- GRanges("chrA", IRanges(5, 10), seqinfo=Seqinfo("chrA", 20))
coverage(a)

... it would give me something like:

RleList of length 1
$chrA
integer-Rle of length 20 with 3 runs
  Lengths:  4  6 10
  Values :  0  1  0

So you can see that it goes to the full length of chromosome A. Examination of the help page indicates that the width argument (that determines the total length of the Rle for each chromosome), if NULL, is replaced with seqlengths(x).

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Aaron Lun25k

I had forgotten about the 'width' argument to coverage, thanks!

However, I still think it'd be useful to be able to extend / pad the coverage of an existing coverage object.

ADD REPLYlink written 3.7 years ago by liz.ingsimmons140

In the case where you have existing coverage, you can coerce it back to a GRanges and then call coverage again with the desired width argument (or set the seqlengths of the GRanges first):

gr <- as(cov, "GRanges")
new_cv <- coverage(gr, weight="score", width=your_seqinfo)

 

 

ADD REPLYlink written 3.7 years ago by Jeff Johnston90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 219 users visited in the last hour