Using recount data to calculate Intron retention
@alt_spliced
Last seen 5 months ago
Germany/Mainz/IMB

I want to calculate something like an intron retention ratio per sample using the data provided in recount: getting the junction quantifications for a certain region and dividing this value by the coverage (non-split reads or total reads) of that region. I was thinking of getting the junction counts with snapcount (I have done that for other projects before and it works very well). However, I do not know how to get the coverage: I read that recount2 has base-pair information but recount3 does not, and it does not seem possible to simply query a region to get its coverage across full projects, as in snapcount.

Perhaps I should iteratively download every project within the large collections that I want to query (TCGA/GTEx), use recount::read_counts() on each one, subset to the regions that I am interested in, and finally combine the projects. I would then merge my region counts with the junction counts and calculate the intron ratios at the sample level.

I wonder if there is an easy way to get the coverage and junctions to perform this calculation.

Also, if I use recount2 for the counts, can I use "tcgav2" and "gtexv2" for snapcount? I think that the sample ids would not be compatible but I am not sure.

snapcount recount3 recount • 238 views
Last seen 18 days ago
United States

Hi Mariela,

Thank you for using recount to access the recount2 project data, and for your interest in recount3 data (available through the recount3 R/Bioconductor package). Both recount2 and recount3 provide expression quantified at the base-pair level as bigWig files. This coverage can then be converted to read counts as described in the F1000Research _recountWorkflow_ paper. We are working on finalizing the recount3 paper and it'll be available soon as a pre-print. In the meantime, this is the recount3 project documentation website http://rna.recount.bio/.

Overall, recount and recount3 allow access to study-level data, while snapcount allows access to the same data via queries. With either recount (for recount2 data) or recount3 (for recount3 data), you can download RangedSummarizedExperiment objects with the exon-exon junction counts for a given study. If needed, you can use tools like megadepth (also an R/Bioconductor package) to quantify expression over a set of genomic regions using the bigWig files in recount2/3 (sum of base-pair coverage for each region).
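
Once the junction counts and the summed base-pair coverage for a region are both in hand, the per-sample ratio described in the question could be sketched roughly as follows. This is only an illustration in plain Python, not a function from recount, recount3, snapcount or megadepth: the function names, the sample IDs and the numbers are all made up, and dividing the summed coverage by the read length is an assumed approximation for turning coverage into a read count. Samples are matched by ID, so any sample missing from either input is dropped.

```python
from math import nan


def coverage_to_reads(coverage_sum, read_length):
    """Approximate a read count from summed base-pair coverage.

    Assumption: each read contributes about `read_length` bases of coverage.
    """
    return coverage_sum / read_length


def junction_to_coverage_ratio(junction_counts, coverage_sums, read_lengths):
    """Junction count divided by the approximate read count, per sample.

    Only samples present in both `junction_counts` and `coverage_sums`
    are kept; a region with zero coverage yields NaN.
    """
    ratios = {}
    for sample in junction_counts.keys() & coverage_sums.keys():
        reads = coverage_to_reads(coverage_sums[sample], read_lengths[sample])
        ratios[sample] = junction_counts[sample] / reads if reads > 0 else nan
    return ratios


# Hypothetical inputs, for illustration only.
junctions = {"sampleA": 30, "sampleB": 12, "sampleC": 5}
coverage = {"sampleA": 6000.0, "sampleB": 0.0, "sampleD": 100.0}
read_len = {"sampleA": 100, "sampleB": 75, "sampleD": 100}

ratios = junction_to_coverage_ratio(junctions, coverage, read_len)
```

Matching on sample ID makes any mismatch between the two data sources visible as dropped samples, which is worth checking before combining junction counts and coverage obtained from different releases.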

Unless I'm missing a function from snapcount, I don't think that we have an easy function for computing intron retention. I'll let the snapcount maintainers respond to your last question.

Best, Leo