1
1
Entering edit mode
@likeeatingpizza-10790
Last seen 5.3 years ago

This question comes from a quiz exercise of the Coursera Bioconductor course. Since over there the class has been totally abandoned from the start by the tutors/mentors (supposing there are any) and apparently Mr Kasper Hansen has other priorities than checking the discussion board and helping his students, I have no other option than to post here.

I have a GRanges of H3K27me3 histone modification for the chr22 of H1 cell line that looks like this

> H1.chr22

GRanges object with 4068 ranges and 6 metadata columns:
seqnames               ranges strand   |        name     score signalValue
<Rle>            <IRanges>  <Rle>   | <character> <numeric>   <numeric>
[1]    chr22 [16554493, 16554606]      *   | Rank_121761        23     2.36555
[2]    chr22 [16867445, 16867558]      *   |  Rank_98479        35     3.13762
[3]    chr22 [17270830, 17270985]      *   |  Rank_98480        35     3.13762
[4]    chr22 [17275375, 17275501]      *   |  Rank_78285        43     3.55442
[5]    chr22 [17325983, 17326150]      *   |  Rank_84671        39     3.22958

I have retrieved a vector of fold-change enrichment of ChIP signal for the same chr and cell line, that looks like this:

> fc_signal
GRanges object with 341236 ranges and 1 metadata column:
seqnames               ranges strand   |            score
<Rle>            <IRanges>  <Rle>   |        <numeric>
[1]    chr22 [16554493, 16554502]      *   | 3.24963998794556
[2]    chr22 [16554503, 16554530]      *   | 4.06205987930298
[3]    chr22 [16554531, 16554542]      *   | 3.24963998794556
[4]    chr22 [16554543, 16554544]      *   | 2.04167008399963
[5]    chr22 [16554545, 16554565]      *   |          1.53125

As you can see the fc_signal ranges are much shorter (in width) but altogether they cover the same intervals as H1.chr22. Indeed their difference is zero

> setdiff(fc_signal, H1.chr22)
GRanges object with 0 ranges and 0 metadata columns:

I have to compute the average score in fc.signal for each region in the H1.chr22 GRanges. It was suggest to compute this using “Views”.

In other words, I need to find a way to "group" o "map" the fc_signal short ranges on the bigger ones, and getting the corresponding average value of their score. I have tried to do both subsetbyOverlaps or intersect, but other than the fact that I just get back the same query GRanges, I also loose the metadata values, so no chance to average them. I also don't see how could I create a Views with these two GRanges. Should I map them on the chr22 sequence (DNAString)?

I hope the question is clear

granges metadata coursera • 1.1k views
2
Entering edit mode
@michael-lawrence-3846
Last seen 7 weeks ago
United States

The fc_signal object is a run-length encoding of a vector using a GRanges. You can convert that to an Rle and then form views on that to aggregate:

rle <- coverage(fc_signal, weight="score")
viewMeans(Views(rle, H1.chr22))

As an alternative to using Views, you could use Lists:

mean(rle[H1.chr22])

0
Entering edit mode

THANK YOU Sir, that's exactly what I wanted to do. Should have thought of converting into an Rle... too many new different data classes/container in Bioconductor, I still have to get comfortable using them properly.

0
Entering edit mode

To be precise, to construct a Views on a GRanges this needs to be coerced as RangesList otherwise it will throw an error. This is the code that works:

vi <- Views(rle$chr22, as(H1.chr22, "RangesList")$chr22)

0
Entering edit mode

Thanks for catching that mistake, yes, you need to convert to RangesList, but you can do that across the entire genome:

vi <- Views(rle, as(H1.chr22, "RangesList"))

That returns a ViewsList, and stuff like viewMeans() works directly on those.

0
Entering edit mode

Wasn't there talk at some point about modifying Views to support GRanges? That would allow convenient things like this:

mcols(H1.chr22)\$mean_score <- viewMeans(Views(rle, H1.chr22))

Currently this is a bit of a pain because coercing a GRanges into a RangesList modifies the order.

0
Entering edit mode

If order is an issue, the List approach in the answer is probably the way to go for now.

0
Entering edit mode

Absolutely. Note that you can already do this (i.e. specify views with a GRanges) on a BSgenome object. But we should also be able to do this on a RleList object. I'm raising the priority for this in my TODO list. Thanks for the reminder.

H.