Hello everyone, I am trying to use ASpli for alternative splicing analysis. I have mouse data. I downloaded genome fasta and gtf file from Ensembl (GRCh39). I used hisat2 for alignment via the code:
hisat2 -p 12 -x GRCm39_ht_indexed -1 SRR6376403_1.fastq -2 SRR6376403_2.fastq -S SRR6376403.sam
and I sorted and indexed the SAM via samtools and merged the BAM files of each condition to a single BAM file.
Code should be placed in three backticks as shown below
library(ASpli)
library(GenomicFeatures)
genomeTxDb <- makeTxDbFromGFF("GRCm39.105.gtf")
features <- binGenome(genomeTxDb)
load("GRCm38_gtf_data.RData")
BAMFiles <- list.files(path="/home/huk/Documents/R/tjob_mm/BAM_Files" , pattern = ".bam" , full.names = TRUE)
targets <- data.frame(row.names = paste0('Sample',c(4:10)),
bam = BAMFiles[c(3,5,7,9,11,13,15)],
f1 = c('CBG','CBG','CBG','CBG','WBG','WBG','WBG'),
stringsAsFactors = FALSE)
mBAMs <- data.frame(bam = BAMFiles[c(1,27)],
condition = c("CBG","WBG"))
gbcounts <- gbCounts(features=features, targets=targets,
minReadLength = 30, maxISize = 50000 , libType = "PE")
asd <- jCounts(counts=gbcounts, features=features, minReadLength=30 , libType = "PE")
gb <- gbDUreport(gbcounts, contrast = c(-1,1))
jdur <- jDUreport(asd, contrast=c(-1,1))
sr <- splicingReport(gb, jdur, counts=gbcounts)
is <- integrateSignals(sr,asd)
exportIntegratedSignals(is,sr=sr,
output.dir = "mm_adipose",
counts=gbcounts,features=features,asd=asd,
mergedBams = mBAMs)
gbCounts function gives these warning but not an error:
Warning messages:
1: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
11329 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
2: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
3: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
14593 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
4: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
5: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
13303 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
6: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
7: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
18998 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
8: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
9: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
5345 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
10: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
11: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
2538 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
12: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
13: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, ... :
13423 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
14: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
The error message I had in jCounts step:
Junctions PJU completed
Error in .local(x, use.names, use.mcols, ...) :
For some pairs in 'x', the 2 alignments are not on the same chromosome. Cannot associate a
unique genomic range to such pairs. Please call granges() with
'on.discordant.seqnames="drop"' to drop these pairs, or with 'on.discordant.seqnames="split"'
to represent each of them with 2 genomic ranges in the returned GRanges object. Note that in
both cases the returned object won't be parallel to 'x'. Alternatively, please consider using
grglist() instead of granges() to turn 'x' into a GRangesList object instead of a GRanges
object. See ?GAlignmentPairs for more information.
In addition: Warning messages:
1: In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode, :
11329 alignments with ambiguous pairing were dumped.
Use 'getDumpedAlignments()' to retrieve them from the dump environment.
2: In FUN(X[[i]], ...) :
Some seqnames had a '.' present in their names. ASpli had to normalize them using '_'.
When I change the parameter, libType to 'SE' it works but I am not sure it is the right thing to do. If someone gives me a hint, I would be so grateful. As I can see error suggests using 'on.discordant.seqnames="drop"' on granges() function but I am not also sure which step includes this parameter.
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=tr_TR.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=tr_TR.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=tr_TR.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=tr_TR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.38.2 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1 ASpli_2.5.0
[5] AnnotationDbi_1.48.0 IRanges_2.20.2 S4Vectors_0.24.4 Biobase_2.46.0
[9] BiocGenerics_0.32.0 edgeR_3.28.1 limma_3.42.2
loaded via a namespace (and not attached):
[1] colorspace_2.0-2 ellipsis_0.3.2 biovizBase_1.34.1
[4] htmlTable_2.4.0 XVector_0.26.0 base64enc_0.1-3
[7] dichromat_2.0-0 rstudioapi_0.13 DT_0.20
[10] bit64_4.0.5 fansi_1.0.2 splines_3.6.3
[13] cachem_1.0.6 knitr_1.37 Formula_1.2-4
[16] Rsamtools_2.2.3 cluster_2.1.2 dbplyr_2.1.1
[19] png_0.1-7 BiocManager_1.30.16 compiler_3.6.3
[22] httr_1.4.2 backports_1.4.1 assertthat_0.2.1
[25] Matrix_1.4-0 fastmap_1.1.0 lazyeval_0.2.2
[28] cli_3.1.1 htmltools_0.5.2 prettyunits_1.1.1
[31] tools_3.6.3 igraph_1.2.11 gtable_0.3.0
[34] glue_1.6.1 GenomeInfoDbData_1.2.2 dplyr_1.0.8
[37] rappdirs_0.3.3 Rcpp_1.0.8 vctrs_0.3.8
[40] Biostrings_2.54.0 rtracklayer_1.46.0 xfun_0.29
[43] stringr_1.4.0 lifecycle_1.0.1 ensembldb_2.10.2
[46] XML_3.99-0.3 zlibbioc_1.32.0 MASS_7.3-55
[49] scales_1.1.1 BSgenome_1.54.0 BiocStyle_2.14.4
[52] VariantAnnotation_1.32.0 ProtGenerics_1.18.0 hms_1.1.1
[55] SummarizedExperiment_1.16.1 AnnotationFilter_1.10.0 RColorBrewer_1.1-2
[58] yaml_2.2.2 curl_4.3.2 memoise_2.0.1
[61] gridExtra_2.3 ggplot2_3.3.5 UpSetR_1.4.0
[64] biomaRt_2.42.1 rpart_4.1.16 latticeExtra_0.6-29
[67] stringi_1.7.6 RSQLite_2.2.9 checkmate_2.0.0
[70] BiocParallel_1.20.1 rlang_1.0.1 pkgconfig_2.0.3
[73] matrixStats_0.61.0 bitops_1.0-7 evaluate_0.14
[76] lattice_0.20-45 purrr_0.3.4 GenomicAlignments_1.22.1
[79] htmlwidgets_1.5.4 bit_4.0.4 tidyselect_1.1.1
[82] plyr_1.8.6 magrittr_2.0.2 R6_2.5.1
[85] generics_0.1.2 Hmisc_4.6-0 DelayedArray_0.12.3
[88] DBI_1.1.2 pillar_1.7.0 foreign_0.8-76
[91] survival_3.2-13 RCurl_1.98-1.5 nnet_7.3-17
[94] tibble_3.1.6 crayon_1.4.2 utf8_1.2.2
[97] BiocFileCache_1.10.2 rmarkdown_2.11 jpeg_0.1-9
[100] progress_1.2.2 locfit_1.5-9.4 grid_3.6.3
[103] data.table_1.14.2 blob_1.2.2 digest_0.6.29
[106] pbmcapply_1.5.0 tidyr_1.2.0 openssl_1.4.6
[109] munsell_0.5.0 Gviz_1.30.3 askpass_1.1
Thank you in advance