Question: Is there an easy way to extend coverage to seqlengths and fill with zeroes?
0
3.7 years ago by
Germany
liz.ingsimmons140 wrote:

Often I read in coverage from a file and the lengths of the coverage object are less than the seqlengths of the chromosomes. If I try to subset the coverage towards the end of the chromosome (say, with a GRanges), I get "subscript contains out-of-bounds ranges" errors.

Is there a function to extend the ends of the coverage with zeroes to the full length of the chromosome according to the seqlengths? This would be really useful. I think it's a fair assumption that the coverage should be zero if there was no coverage of that region in the file/object the coverage object was created from (this is after all the assumption used for the *start* of the chromosome). Padding the end of the chromosome with zeroes could be an option for the coverage function when seqlengths are known. Edit: this already exists in the 'width' argument to coverage, see Aaron's answer below!

I've written my own simple function to pad the ends of the coverage Rle for each chromosome, but I'm sure it can be improved.

pad_coverage <- function(cov){
newcov <- mapply(cov, pad, FUN = function(s, p){
Rle(values = c(runValue(s),0),
lengths = c(runLength(s), p))
})
S4Vectors:::new_SimpleList_from_list("SimpleRleList", newcov)
}
coverage seqlengths s4vectors • 666 views
modified 3.7 years ago • written 3.7 years ago by liz.ingsimmons140
Answer: Is there an easy way to extend coverage to seqlengths and fill with zeroes?
1
3.7 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Isn't this the default when you call coverage on a Granges object with the required sequence lengths in the seqinfo slot? For example, if I were to run:

a <- GRanges("chrA", IRanges(5, 10), seqinfo=Seqinfo("chrA", 20))
coverage(a)

... it would give me something like:

RleList of length 1
\$chrA
integer-Rle of length 20 with 3 runs
Lengths:  4  6 10
Values :  0  1  0

So you can see that it goes to the full length of chromosome A. Examination of the help page indicates that the width argument (that determines the total length of the Rle for each chromosome), if NULL, is replaced with seqlengths(x).

However, I still think it'd be useful to be able to extend / pad the coverage of an existing coverage object.

In the case where you have existing coverage, you can coerce it back to a GRanges and then call coverage again with the desired width argument (or set the seqlengths of the GRanges first):

gr <- as(cov, "GRanges")
new_cv <- coverage(gr, weight="score", width=your_seqinfo)