Is there an easy way to extend coverage to seqlengths and fill with zeroes?
1
0
Entering edit mode
@lizingsimmons-6935
Last seen 3.4 years ago
Germany

Often I read in coverage from a file and the lengths of the coverage object are less than the seqlengths of the chromosomes. If I try to subset the coverage towards the end of the chromosome (say, with a GRanges), I get "subscript contains out-of-bounds ranges" errors.

Is there a function to extend the ends of the coverage with zeroes to the full length of the chromosome according to the seqlengths? This would be really useful. I think it's a fair assumption that the coverage should be zero if there was no coverage of that region in the file/object the coverage object was created from (this is after all the assumption used for the *start* of the chromosome). Padding the end of the chromosome with zeroes could be an option for the coverage function when seqlengths are known. Edit: this already exists in the 'width' argument to coverage, see Aaron's answer below!

I've written my own simple function to pad the ends of the coverage Rle for each chromosome, but I'm sure it can be improved.

pad_coverage <- function(cov){
  pad <- seqlengths(cov)[seqlevels(cov)] - lengths(cov)
  newcov <- mapply(cov, pad, FUN = function(s, p){
    Rle(values = c(runValue(s),0),
             lengths = c(runLength(s), p))
  })
  S4Vectors:::new_SimpleList_from_list("SimpleRleList", newcov)
  }
coverage s4vectors seqlengths • 1.5k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen just now
The city by the bay

Isn't this the default when you call coverage on a Granges object with the required sequence lengths in the seqinfo slot? For example, if I were to run:

a <- GRanges("chrA", IRanges(5, 10), seqinfo=Seqinfo("chrA", 20))
coverage(a)

... it would give me something like:

RleList of length 1
$chrA
integer-Rle of length 20 with 3 runs
  Lengths:  4  6 10
  Values :  0  1  0

So you can see that it goes to the full length of chromosome A. Examination of the help page indicates that the width argument (that determines the total length of the Rle for each chromosome), if NULL, is replaced with seqlengths(x).

ADD COMMENT
0
Entering edit mode

I had forgotten about the 'width' argument to coverage, thanks!

However, I still think it'd be useful to be able to extend / pad the coverage of an existing coverage object.

ADD REPLY
0
Entering edit mode

In the case where you have existing coverage, you can coerce it back to a GRanges and then call coverage again with the desired width argument (or set the seqlengths of the GRanges first):

gr <- as(cov, "GRanges")
new_cv <- coverage(gr, weight="score", width=your_seqinfo)

 

 

ADD REPLY

Login before adding your answer.

Traffic: 752 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6