Alternative splicing analysis in edgeR: duplicate exons
1
0
Entering edit mode
jbono ▴ 10
@jbono-7682
Last seen 3 months ago
United States

Hi,

I am attempting to examine alternative splicing with edgeR. I have moved through the analysis without obvious problems. However, when I investigate individual genes that show up as alternatively spliced (e.g. plotSpliceDGE) I notice that often what appears to be the same exon with the same start coordinate is repeated multiple times (see attached image for exampleenter image description here). It doesn't happen for every gene, but spot checking suggests that it happens to a lot of them. I have also noticed that sometimes none of the exons are coded in red, even thought the gene supposedly has alternatively spliced exons. Any insight on why these issues occur would be greatly appreciated, thank you! I've included code below (including what I did in Rsubread because I'm not sure exactly where the problem is originating.

subjunc(index="mel_index",readfile1=reads1_2, readfile2=reads2_2, nthreads=8)

bamlist_splicing=list.files(pattern=".subjunc.sorted.BAM$")
counts_splicing <- featureCounts(bamlist_splicing, annot.ext="dmel-all-r6.38.gtf",
isGTFAnnotationFile=TRUE, GTF.featureType="exon", 
useMetaFeatures=FALSE, allowMultiOverlap=TRUE, isPairedEnd=TRUE)

Counts_splicing_DGE <- DGEList(counts_splicing$counts, genes=counts_splicing$annotation, group=groups)
filter_keep_splicing <- filterByExpr(Counts_splicing_DGE)
Counts_splicing_DGE_filtered <- Counts_splicing_DGE[filter_keep_splicing,]
Counts_splicing_DGE_filtered_norm_factors <- calcNormFactors(Counts_splicing_DGE_filtered)

design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
Counts_splicing_DGE_filtered_norm_factors <- estimateDisp(Counts_splicing_DGE_filtered_norm_factors, design)
fit_splicing <- glmQLFit(Counts_splicing_DGE_filtered_norm_factors, design)

ControlvsMutant.F.1.splicing <- diffSpliceDGE(fit_splicing, geneid="GeneID", exonid="Start", contrast=contrasts[,"Y.F.1vsC.F.1"])
ControlvsMutant.F.1.splicing.results=topSpliceDGE(ControlvsMutant.F.1.splicing, test="Simes", n=Inf, FDR=1)

plotSpliceDGE(ControlvsMutant.F.1.splicing, geneid="FBgn0033749")

# include your problematic code here with any corresponding output 

# please also include the results of running the following in an R session 

sessionInfo( )
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Rsubread_2.4.3         GenomicFeatures_1.42.2 GenomicRanges_1.42.0   GenomeInfoDb_1.26.4    ASpli_2.0.0           
 [6] AnnotationDbi_1.52.0   IRanges_2.24.1         S4Vectors_0.28.1       Biobase_2.50.0         BiocGenerics_0.36.0   
[11] edgeR_3.32.1           limma_3.46.0          

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0            ellipsis_0.3.1              biovizBase_1.38.0           htmlTable_2.1.0            
  [5] XVector_0.30.0              base64enc_0.1-3             dichromat_2.0-0             rstudioapi_0.13            
  [9] DT_0.17                     bit64_4.0.5                 fansi_0.4.2                 xml2_1.3.2                 
 [13] splines_4.0.4               cachem_1.0.4                knitr_1.31                  Formula_1.2-4              
 [17] Rsamtools_2.6.0             cluster_2.1.1               dbplyr_2.1.0                png_0.1-7                  
 [21] BiocManager_1.30.10         compiler_4.0.4              httr_1.4.2                  backports_1.2.1            
 [25] lazyeval_0.2.2              assertthat_0.2.1            Matrix_1.3-2                fastmap_1.1.0              
 [29] htmltools_0.5.1.1           prettyunits_1.1.1           tools_4.0.4                 igraph_1.2.6               
 [33] gtable_0.3.0                glue_1.4.2                  GenomeInfoDbData_1.2.4      dplyr_1.0.5                
 [37] rappdirs_0.3.3              tinytex_0.30                Rcpp_1.0.6                  vctrs_0.3.6                
 [41] Biostrings_2.58.0           rtracklayer_1.50.0          xfun_0.22                   stringr_1.4.0              
 [45] lifecycle_1.0.0             ensembldb_2.14.0            statmod_1.4.35              XML_3.99-0.6               
 [49] zlibbioc_1.36.0             MASS_7.3-53.1               scales_1.1.1                BiocStyle_2.18.1           
 [53] BSgenome_1.58.0             VariantAnnotation_1.36.0    ProtGenerics_1.22.0         hms_1.0.0                  
 [57] MatrixGenerics_1.2.1        SummarizedExperiment_1.20.0 AnnotationFilter_1.14.0     RColorBrewer_1.1-2         
 [61] yaml_2.2.1                  curl_4.3                    memoise_2.0.0               gridExtra_2.3              
 [65] ggplot2_3.3.3               UpSetR_1.4.0                biomaRt_2.46.3              rpart_4.1-15               
 [69] latticeExtra_0.6-29         stringi_1.5.3               RSQLite_2.2.4               checkmate_2.0.0            
 [73] BiocParallel_1.24.1         rlang_0.4.10                pkgconfig_2.0.3             matrixStats_0.58.0         
 [77] bitops_1.0-6                evaluate_0.14               lattice_0.20-41             purrr_0.3.4                
 [81] htmlwidgets_1.5.3           GenomicAlignments_1.26.0    bit_4.0.4                   tidyselect_1.1.0           
 [85] plyr_1.8.6                  magrittr_2.0.1              R6_2.5.0                    generics_0.1.0             
 [89] Hmisc_4.5-0                 DelayedArray_0.16.2         DBI_1.1.1                   pillar_1.5.1               
 [93] foreign_0.8-81              survival_3.2-10             RCurl_1.98-1.3              nnet_7.3-15                
 [97] tibble_3.1.0                crayon_1.4.1                utf8_1.2.1                  BiocFileCache_1.14.0       
[101] rmarkdown_2.7               jpeg_0.1-8.1                progress_1.2.2              locfit_1.5-9.4             
[105] grid_4.0.4                  data.table_1.14.0           blob_1.2.1                  digest_0.6.27              
[109] tidyr_1.1.3                 openssl_1.4.3               munsell_0.5.0               Gviz_1.34.1                
[113] askpass_1.1
edgeR Rsubread • 1.2k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

edgeR isn't designed to deal with lots overlapping but different exons (or even worse the same exon with different names) of which the fly annotation contains many. So you need to flatten the GTF file with Rsubread::flattenGTF (with default settings) before running featureCounts. That will ensure non-overlapping exons and will greatly simplify the analysis.

You might also find

counts_splicing_DGE <- featureCounts2DGEList(fc)

useful where fc is the object from featureCounts. It assembles the featureCounts output into a DGEList.

ADD COMMENT
0
Entering edit mode

Hi Gordon,

Thanks so much!

ADD REPLY

Login before adding your answer.

Traffic: 829 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6