Question

[maser] Why does maser coverage filtering ignore skipping counts of an alt. splicing analysis?

0

Entering edit mode

Anke • 0

@01d145f0

Last seen 13 months ago

Germany

Hi everyone,

I have a question regarding the coverage filter that can be applied to rMATS results (=alternative splicing analysis) using the function filterByCoverage() of the Bioconductor package maser. filterByCoverage() can be used to remove low coverage events. Per default, it removes splicing events with an average coverage of <= 5 reads. The average is calculated over all samples in the comparison. Before filtering, rMATS results are loaded into R using the function maser(). Two example skipping events (SE) as output by rMATS are as follows:

ID  GeneID  geneSymbol  chr strand  exonStart_0base exonEnd upstreamES  upstreamEE  downstreamES    downstreamEE    ID  IJC_SAMPLE_1    SJC_SAMPLE_1    IJC_SAMPLE_2    SJC_SAMPLE_2    IncFormLen  SkipFormLen PValue  FDR IncLevel1   IncLevel2   IncLevelDifference 
8025    "ENSG00000134759.14"    "ELP2"  chr18   +   36139418    36139613    36138269    36138426    36141136    36141201    8025    4,9,15  0,0,0   6,5,6   4,4,1   310 115 8.14875944499e-10   6.8150582549049824e-06  1.0,1.0,1.0 0.358,0.317,0.69    0.545
23616   "ENSG00000267750.6" "RUNDC3A-AS1"   chr17   -   44304980    44305043    44302516    44304207    44307291    44307500    23616   0,0,0   13,6,2  3,3,4   8,2,6   178 115 1.19718082014e-06   0.00020620498447021453  0.0,0.0,0.0 0.195,0.492,0.301   -0.329

Each group in the comparison has three replicates. The numbers of reads supporting inclusion (per sample) are given in columns IJC_SAMPLE_1 and IJC_SAMPLE_2 for groups 1 and 2, respectively. The numbers of reads supporting skipping are given in columns SJC_SAMPLE_1 and SJC_SAMPLE_2. While rMATS provides inclusion and skipping read counts per sample per event, maser() only reads the inclusion counts (columns IJC_SAMPLE_1 and IJC_SAMPLE_2) and ignores the skipping counts. As a consequence of not reading in the skipping counts, they are not used when events are filtered by coverage.

> library(maser)

# load rMATS results found in directory rmats_results_dir
> rmats <- maser(rmats_result_dir, cond_labels = c("group1","group2"), ftype = "JCEC")

# list all SE events (shows both events listed above)
> summary(rmats,type="SE")
     ID             GeneID  geneSymbol       PValue          FDR
1  8025 ENSG00000134759.14        ELP2 8.148759e-10 6.815058e-06
2 23616  ENSG00000267750.6 RUNDC3A-AS1 1.197181e-06 2.062050e-04
  IncLevelDifference PSI_1             PSI_2   Chr Strand       exon_target
1              0.545 1,1,1  0.358,0.317,0.69 chr18      + 36139419-36139613
2             -0.329 0,0,0 0.195,0.492,0.301 chr17      - 44304981-44305043
      exon_upstream   exon_downstream
1 36138270-36138426 36141137-36141201
2 44302517-44304207 44307292-44307500

# show counts available for SE events in rmats
# (lists only inclusion counts)
> slot(rmats, "SE_counts")
      group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
8025         4        9       15        6        5        6
23616        0        0        0        3        3        4

# filter events, remove events with average coverage <=5
> rmats_filt <- filterByCoverage(rmats,avg_reads=5)

# list all SE events remaining after filtering for coverage
> summary(rmats_filt,type="SE")
    ID             GeneID geneSymbol       PValue          FDR
1 8025 ENSG00000134759.14       ELP2 8.148759e-10 6.815058e-06
  IncLevelDifference PSI_1            PSI_2   Chr Strand       exon_target
1              0.545 1,1,1 0.358,0.317,0.69 chr18      + 36139419-36139613
      exon_upstream   exon_downstream
1 36138270-36138426 36141137-36141201

When applying the maser coverage filter, the first event above is kept, while the second one is removed. Had we instead filtered purely based on skipping coverage, the first events would have been removed, while the second one would have been kept.

So my question is: What is the reason to ignore the reads supporting the skipping of an exon when filtering events for a minimal average coverage? What makes reads supporting inclusion more important than reads supporting skipping? Using only inclusion coverage to filter low coverage events seems to bias the results.

I'm looking forward to hearing your thoughts. Thanks a lot and best,

Anke.

> sessionInfo( )
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] maser_1.20.0         GenomicRanges_1.54.1 GenomeInfoDb_1.38.2 
[4] IRanges_2.36.0       S4Vectors_0.40.2     BiocGenerics_0.48.1 
[7] ggplot2_3.4.4       

loaded via a namespace (and not attached):
  [1] DBI_1.1.3                   bitops_1.0-7               
  [3] deldir_2.0-2                gridExtra_2.3              
  [5] biomaRt_2.58.0              rlang_1.1.2                
  [7] magrittr_2.0.3              biovizBase_1.50.0          
  [9] matrixStats_1.2.0           compiler_4.3.2             
 [11] RSQLite_2.3.4               GenomicFeatures_1.54.1     
 [13] png_0.1-8                   vctrs_0.6.5                
 [15] ProtGenerics_1.34.0         stringr_1.5.1              
 [17] pkgconfig_2.0.3             crayon_1.5.2               
 [19] fastmap_1.1.1               backports_1.4.1            
 [21] dbplyr_2.4.0                XVector_0.42.0             
 [23] utf8_1.2.4                  Rsamtools_2.18.0           
 [25] rmarkdown_2.25              bit_4.0.5                  
 [27] xfun_0.41                   zlibbioc_1.48.0            
 [29] cachem_1.0.8                progress_1.2.3             
 [31] blob_1.2.4                  DelayedArray_0.28.0        
 [33] BiocParallel_1.36.0         jpeg_0.1-10                
 [35] parallel_4.3.2              prettyunits_1.2.0          
 [37] cluster_2.1.6               VariantAnnotation_1.48.1   
 [39] R6_2.5.1                    stringi_1.8.3              
 [41] RColorBrewer_1.1-3          rtracklayer_1.62.0         
 [43] rpart_4.1.23                Gviz_1.46.1                
 [45] Rcpp_1.0.11                 SummarizedExperiment_1.32.0
 [47] knitr_1.45                  base64enc_0.1-3            
 [49] Matrix_1.6-4                nnet_7.3-19                
 [51] tidyselect_1.2.0            dichromat_2.0-0.1          
 [53] rstudioapi_0.15.0           abind_1.4-5                
 [55] yaml_2.3.8                  codetools_0.2-19           
 [57] curl_5.2.0                  lattice_0.22-5             
 [59] tibble_3.2.1                Biobase_2.62.0             
 [61] withr_2.5.2                 KEGGREST_1.42.0            
 [63] evaluate_0.23               foreign_0.8-86             
 [65] BiocFileCache_2.10.1        xml2_1.3.6                 
 [67] Biostrings_2.70.1           pillar_1.9.0               
 [69] filelock_1.0.3              MatrixGenerics_1.14.0      
 [71] DT_0.31                     checkmate_2.3.1            
 [73] generics_0.1.3              RCurl_1.98-1.13            
 [75] ensembldb_2.26.0            hms_1.1.3                  
 [77] munsell_0.5.0               scales_1.3.0               
 [79] glue_1.6.2                  lazyeval_0.2.2             
 [81] Hmisc_5.1-1                 tools_4.3.2                
 [83] interp_1.1-5                BiocIO_1.12.0              
 [85] data.table_1.14.10          BSgenome_1.70.1            
 [87] GenomicAlignments_1.38.0    XML_3.99-0.16              
 [89] grid_4.3.2                  latticeExtra_0.6-30        
 [91] AnnotationDbi_1.64.1        colorspace_2.1-0           
 [93] GenomeInfoDbData_1.2.11     htmlTable_2.4.2            
 [95] restfulr_0.0.15             Formula_1.2-5              
 [97] cli_3.6.2                   rappdirs_0.3.3             
 [99] fansi_1.0.6                 S4Arrays_1.2.0             
[101] dplyr_1.1.4                 AnnotationFilter_1.26.0    
[103] gtable_0.3.4                digest_0.6.33              
[105] SparseArray_1.2.2           rjson_0.2.21               
[107] htmlwidgets_1.6.4           memoise_2.0.1              
[109] htmltools_0.5.7             lifecycle_1.0.4            
[111] httr_1.4.7                  bit64_4.0.5

rMATS alternativesplicing maser R • 470 views

ADD COMMENT • link 14 months ago Anke • 0