Views with RleLists and IRangesLists
1
0
Entering edit mode
ccanchaya • 0
@ccanchaya-10889
Last seen 5.4 years ago

Hi,

I am trying to have the read coverage on transcripts of a non-model genome.  I created an GenomicAlignment object from the bam file and I have the annotations usin txdb object. I would like to use Views to extract the coverage over the regions I am interested.

reads<-readGAlignments("/Volumes/FT2/octopus/analyses/clean/17/17.order.RG.bam")
cvg <- coverage(reads)
txdb <- loadDb("17.sqlite", packageName=NA)
cvg_cds <- cdsBy(txdb, by="tx", use.names=FALSE)
cds_views<-Views(unlist(cvg),unlist(ranges(cvg_cds)))
slice(cds_views,lower=3)

However it takes ages to process this.  I think that unlisting the GenomicRangesList (cvg_cds) and unlisting Rlelist (cvg) would be the problem but it was the only way to work with the data.

Would there be a faster way to extract the read coverage in a genome over a list of ranges?

 

Best regards,

 

Carlos

 

 

 

 

 

 

 

 

 

Views genomicalignments IRangesList coverage RleList • 895 views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 4 hours ago
Seattle, WA, United States

Hi Carlos,

IMPORTANT: When you unlist cvg, you end up with a single Rle object representing the coverage along a single virtual sequence made of the concatenation of all the chromosomes in the genome. So the coordinate system you need to use to specify ranges along that single virtual sequence is not the same as before the unlisting. Before the unlisting, it was a local one i.e. 1 coordinate system per-chromosome. After the unlisting, it becomes a global one i.e. 1 single coordinate system along 1 single virtual sequence.

That means you can't just unlist the ranges in cvg_cds and use them as-is to define views on unlist(cvg). That gives you incorrect view coordinates. Instead you need to do something like:

cds_views <- Views(unlist(cvg), absoluteRanges(unlist(cvg_cds)))

See ?absoluteRanges for the details.

Now back to "how to extract the read coverage in a genome over a list of ranges"

Using views for this is not as convenient as it could be at the moment but we will try to improve the situation in the near future by implementing containers for genomic views (more details on the bioc-devel mailing list here https://stat.ethz.ch/pipermail/bioc-devel/2016-June/009433.html if you are curious about this).

Anyway, right now this can be done by just subsetting your RleList object (cvg in your case) with the GRanges object containing your ranges of interest:

cds <- unlist(cvg_cds, use.names=FALSE)

Note that before we can subset cvg with cds we need to drop the sequence levels (i.e. chromosomes) from the latter that are not represented in the former:

seqlevels(cds, force=TRUE) <- names(cvg)

Then we can subset:

cvg[cds]  # RleList object with 1 list element per range in cds

Last but not least: You might want to check coverageByTranscript() in GenomicFeatures. It might actually do exactly what you want.

Cheers,

H.

ADD COMMENT

Login before adding your answer.

Traffic: 461 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6