Different read support for splice junctions using snapcount and recount3
@mariokeller1988-13652

Hey everyone,

I want to analyze the inclusion levels of a specific cassette exon in TCGA-BRCA Primary Tumor samples. This involves three splice junctions (inclusion junctions 1 and 2, IJ1 and IJ2, and the skipping junction, SJ), and I want their read support (read counts) in each sample. For this task I used two different packages:

1. The snapcount package, which belongs to the recount2 project
2. The recount3 package, which belongs to the recount3 project

With both packages, I retrieved the RangedSummarizedExperiment object containing the junctions for the TCGA-BRCA cohort, subset it to the Primary Tumor samples (columns) and to the three splice junctions (rows), and then extracted the counts.

When comparing the counts of the three junctions for the same sample between snapcount and recount3, I observed that the IJ1 and IJ2 counts agree. However, the SJ counts differ in every sample, and interestingly the discrepancy is not consistent across samples: sometimes the recount3 count is about 8 times the snapcount count, and sometimes only about 2 times.

When I compute PSI values (inclusion levels of the cassette exon), these naturally differ dramatically as well and are far from perfectly correlated (R = 0.83). My overall aim is to use the PSI values for survival analyses (KM plots and Cox regression). The KM plots also differed dramatically, because some patients switched groups (from low to high or from high to low).
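To make the PSI comparison concrete, here is a minimal Python sketch (plain Python only to keep it dependency-free; it does not use the snapcount or recount3 APIs). It assumes one common junction-based definition, PSI = mean(IJ1, IJ2) / (mean(IJ1, IJ2) + SJ); the post does not say which formula was used, and length-normalized variants exist. The counts are hand-copied from the five example samples listed further down, and the high/low split at the median is an assumed stand-in for however the KM groups were actually defined.

```python
from math import sqrt
from statistics import median

# Junction counts (IJ1, IJ2, SJ) hand-copied from the five example samples in this post.
COUNTS = {
    "snapcount": {"sample1": (11, 57, 5), "sample2": (15, 42, 40), "sample3": (9, 32, 7),
                  "sample4": (6, 40, 10), "sample5": (11, 50, 9)},
    "recount3": {"sample1": (12, 60, 39), "sample2": (16, 44, 99), "sample3": (8, 33, 32),
                 "sample4": (6, 38, 40), "sample5": (12, 49, 29)},
}


def psi(ij1: int, ij2: int, sj: int) -> float:
    """Assumed junction-based PSI: mean inclusion reads / (mean inclusion reads + skipping reads)."""
    inclusion = (ij1 + ij2) / 2
    total = inclusion + sj
    return inclusion / total if total > 0 else float("nan")


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_split(values):
    """Assumed grouping rule: 'high' if PSI is above the median, otherwise 'low'."""
    cut = median(values.values())
    return {s: ("high" if v > cut else "low") for s, v in values.items()}


samples = sorted(COUNTS["snapcount"])
psi_by_source = {src: {s: psi(*COUNTS[src][s]) for s in samples} for src in COUNTS}

r = pearson([psi_by_source["snapcount"][s] for s in samples],
            [psi_by_source["recount3"][s] for s in samples])
groups = {src: median_split(psi_by_source[src]) for src in COUNTS}
# Samples whose high/low label differs between the two data sources.
switched = [s for s in samples if groups["snapcount"][s] != groups["recount3"][s]]

for s in samples:
    print(s, round(psi_by_source["snapcount"][s], 3), round(psi_by_source["recount3"][s], 3))
print("Pearson r:", round(r, 3), "| samples switching group:", switched)
```

For these five samples, recount3 gives a lower PSI than snapcount in every case, yet none of them changes median group; running the same comparison over all Primary Tumor samples would list exactly the patients who switch KM groups. With only five points, the correlation printed here is not comparable to the cohort-wide R = 0.83.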

My questions are:

1) Did anyone else encounter such differences in junction quantification?

2) Do the workflows / pipelines really differ that much? If I remember correctly, recount3 used STAR for the alignments, whereas snapcount / the recount2 project used a different aligner.

3) Which data should I trust more and is there a way to verify/justify my decision?

Thanks in advance, and have a nice weekend.

Mario

Here are the junction counts for the first five samples:

$sample1
    snapcount recount3
IJ1        11       12
IJ2        57       60
SJ          5       39

$sample2
    snapcount recount3
IJ1        15       16
IJ2        42       44
SJ         40       99

$sample3
    snapcount recount3
IJ1         9        8
IJ2        32       33
SJ          7       32

$sample4
    snapcount recount3
IJ1         6        6
IJ2        40       38
SJ         10       40

$sample5
    snapcount recount3
IJ1        11       12
IJ2        50       49
SJ          9       29
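A quick way to quantify the discrepancy is to divide each recount3 count by the matching snapcount count. The sketch below does that in plain Python for the five samples above; the nested-dictionary layout is a hypothetical hand transcription of the printed counts, not an object returned by either package.

```python
# (snapcount, recount3) counts per junction, hand-copied from the five samples above.
COUNTS = {
    "sample1": {"IJ1": (11, 12), "IJ2": (57, 60), "SJ": (5, 39)},
    "sample2": {"IJ1": (15, 16), "IJ2": (42, 44), "SJ": (40, 99)},
    "sample3": {"IJ1": (9, 8), "IJ2": (32, 33), "SJ": (7, 32)},
    "sample4": {"IJ1": (6, 6), "IJ2": (40, 38), "SJ": (10, 40)},
    "sample5": {"IJ1": (11, 12), "IJ2": (50, 49), "SJ": (9, 29)},
}


def fold_changes(counts):
    """Return recount3 / snapcount ratios per sample and junction
    (None where snapcount reports zero reads, to avoid dividing by zero)."""
    return {
        sample: {j: (r3 / snap if snap else None) for j, (snap, r3) in junctions.items()}
        for sample, junctions in counts.items()
    }


ratios = fold_changes(COUNTS)
for sample, by_junction in ratios.items():
    print(sample, {j: (round(v, 2) if v is not None else None) for j, v in by_junction.items()})
```

For these samples, the IJ1 and IJ2 ratios stay between roughly 0.89 and 1.09, whereas the SJ ratio ranges from about 2.5 (sample2) to 7.8 (sample1), which matches the sample-dependent inflation of the skipping junction described above.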


Hi,

The aligners used for recount2 and recount3 are different, as you noted in your question number 2. This can lead to large differences like the one you observed for one of the junctions.

If you have external information that leads you to trust one aligner over the other, you can use that information. Alternatively, you can run the analysis with the data from one version and then repeat it with the data from the other as a sensitivity analysis. The supplementary material of the recount3 paper includes some sections comparing it with recount2, although it doesn't cover specific situations like the one you presented.

Best, Leo