I want to analyze the inclusion level of a specific cassette exon in TCGA-BRCA Primary Tumor samples. The event is defined by three splice junctions (inclusion junction 1, inclusion junction 2, and the skipping junction), and I need the read support (counts) for each junction in each sample. For this task I used two different packages:
- The snapcount package that belongs to the recount2 project
- The recount3 package that belongs to the recount3 project
With both packages I retrieved the RangedSummarizedExperiment object containing the junctions for the TCGA-BRCA cohort, subset it to the Primary Tumor samples (columns) and to the 3 splice junctions (rows), and then extracted the counts.
When comparing the counts of the 3 junctions for the same sample between snapcount and recount3, I observed that the counts for inclusion junctions 1 and 2 agree between the two packages. However, the counts for the skipping junction differ in all samples. Interestingly, the size of the difference is not consistent across samples: sometimes the recount3 counts are 8 times higher and sometimes only 2 times higher.
If I compute PSI values (inclusion levels of the cassette exon), these of course also differ dramatically between the two sources and are not perfectly correlated (R = 0.83). My overall aim is to use the PSI values for survival analyses (KM plots and Cox regression). When I checked the KM plots, they differed dramatically because some patients switched groups (low to high or high to low).
My questions are:
1) Did anyone else encounter such differences in junction quantification?
2) Do the workflows / pipelines differ that much? If I remember correctly, STAR was used for the alignments in recount3, while a different aligner was used for snapcount / the recount2 project.
3) Which data should I trust more, and is there a way to verify/justify my decision?
Thanks in advance and have a nice weekend.
Here is an example of the first five samples with the corresponding junction counts (IJ1/IJ2 = inclusion junctions 1 and 2, SJ = skipping junction).
$sample1
    snapcount recount3
IJ1        11       12
IJ2        57       60
SJ          5       39

$sample2
    snapcount recount3
IJ1        15       16
IJ2        42       44
SJ         40       99

$sample3
    snapcount recount3
IJ1         9        8
IJ2        32       33
SJ          7       32

$sample4
    snapcount recount3
IJ1         6        6
IJ2        40       38
SJ         10       40

$sample5
    snapcount recount3
IJ1        11       12
IJ2        50       49
SJ          9       29
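
To make the discrepancy concrete, here is a minimal Python sketch that recomputes PSI and the recount3/snapcount ratio of the skipping junction from the five samples above. It assumes one common PSI definition, PSI = mean(IJ1, IJ2) / (mean(IJ1, IJ2) + SJ), which may not be exactly the formula used for the full analysis. The names `COUNTS`, `psi`, and `compare` are illustrative only and do not come from snapcount or recount3.

```python
# Junction counts copied from the five example samples above:
# sample -> junction -> (snapcount count, recount3 count)
COUNTS = {
    "sample1": {"IJ1": (11, 12), "IJ2": (57, 60), "SJ": (5, 39)},
    "sample2": {"IJ1": (15, 16), "IJ2": (42, 44), "SJ": (40, 99)},
    "sample3": {"IJ1": (9, 8), "IJ2": (32, 33), "SJ": (7, 32)},
    "sample4": {"IJ1": (6, 6), "IJ2": (40, 38), "SJ": (10, 40)},
    "sample5": {"IJ1": (11, 12), "IJ2": (50, 49), "SJ": (9, 29)},
}


def psi(ij1, ij2, sj):
    """Assumed PSI definition: mean inclusion support / (mean inclusion + skipping)."""
    inclusion = (ij1 + ij2) / 2
    total = inclusion + sj
    return inclusion / total if total > 0 else float("nan")


def compare(counts):
    """Return sample -> (PSI from snapcount, PSI from recount3, SJ ratio recount3/snapcount)."""
    rows = {}
    for sample, j in counts.items():
        psi_snap = psi(j["IJ1"][0], j["IJ2"][0], j["SJ"][0])
        psi_rc3 = psi(j["IJ1"][1], j["IJ2"][1], j["SJ"][1])
        sj_snap, sj_rc3 = j["SJ"]
        # Guard against samples with no skipping reads in snapcount.
        ratio = sj_rc3 / sj_snap if sj_snap > 0 else float("nan")
        rows[sample] = (psi_snap, psi_rc3, ratio)
    return rows


RESULTS = compare(COUNTS)
for sample, (p_snap, p_rc3, ratio) in RESULTS.items():
    print(f"{sample}: PSI snapcount={p_snap:.2f}, PSI recount3={p_rc3:.2f}, SJ ratio={ratio:.1f}x")
```

Under this assumed definition, the skipping-junction ratio in these five samples ranges from roughly 2.5x (sample2) to 7.8x (sample1), so the shift in PSI is sample-specific rather than a constant scaling.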