I am trying to run gbCounts in Aspli. It appears to work through a few samples, but then I get an error message that I do not know how to address. I appreciate any insight. Here is the code I used:
> Dmel.6.38.TxDb=makeTxDbFromGFF(
+ file="dmel-all-r6.38.gtf",
+ format="gtf")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type stop_codon. This information was
ignored.
2: In makeTxDbFromGRanges(gr, metadata = metadata) :
The following transcripts were dropped because their exon ranks could not be inferred (either because the
exons are not on the same chromosome/strand or because they are not separated by introns): FBtr0084079,
FBtr0084080, FBtr0084081, FBtr0084082, FBtr0084083, FBtr0084084, FBtr0084085, FBtr0307759, FBtr0307760
3: In .reject_transcripts(bad_tx, because) :
The following transcripts were rejected because they have stop codons that cannot be mapped to an exon:
FBtr0100857, FBtr0100863, FBtr0433500, FBtr0433501
> saveDb(Dmel.6.38.TxDb,file="Dmel.6.38.TxDb.sqlite")
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: dmel-all-r6.38.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 35367
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-03-18 12:37:55 -0600 (Thu, 18 Mar 2021)
# GenomicFeatures version at creation time: 1.42.2
# RSQLite version at creation time: 2.2.4
# DBSCHEMAVERSION: 1.2
> features <- binGenome( Dmel.6.38.TxDb )
* Number of extracted Genes = 17869
* Number of extracted Exon Bins = 80637
* Number of extracted intron bins = 72288
* Number of extracted trascripts = 35367
* Number of extracted junctions = 60431
* Number of AS bins (not include external) = 9547
* Number of AS bins (include external) = 9557
* Classified as:
ES bins = 2427 (25%)
IR bins = 1257 (13%)
Alt5'ss bins = 1497 (16%)
Alt3'ss bins = 1622 (17%)
Multiple AS bins = 2744 (29%)
classified as:
ES bins = 531 (19%)
IR bins = 492 (18%)
Alt5'ss bins = 885 (32%)
Alt3'ss bins = 725 (26%)
> targets=read.csv("Targets.csv")
> getConditions(targets)
[1] "Mutant_F_1D" "Mutant_M_1D" "Mutant_F_28D" "Mutant_M_28D" "Control_F_1D" "Control_M_1D" "Control_F_28D"
[8] "Control_M_28D"
> gbcounts <- gbCounts( features = features,
+ targets = targets,
+ minReadLength = 100, maxISize = 50000,
+ strandMode=0)
Summarizing Mutant_F_1D_1
ETA: 53 min
Summarizing Mutant_F_1D_2
ETA: 49 min
Summarizing Mutant_F_1D_3
ETA: 46 min
Summarizing Mutant_M_1D_1
ETA: 43 min
Summarizing Mutant_M_1D_2
ETA: 41 min
Summarizing Mutant_M_1D_3
ETA: 38 min
Summarizing Mutant_F_28D_1
[1] 7
Error in .subset(x, j) : only 0's may be mixed with negative subscripts
In addition: Warning message:
In colnames(counts@junction.counts)[9:ncol(counts@junction.counts)] <- rownames(targets) :
number of items to replace is not a multiple of replacement length**
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.42.2 GenomicRanges_1.42.0 GenomeInfoDb_1.26.4 ASpli_2.0.0 AnnotationDbi_1.52.0
[6] IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0 BiocGenerics_0.36.0 edgeR_3.32.1
[11] limma_3.46.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-0 ellipsis_0.3.1 biovizBase_1.38.0 htmlTable_2.1.0
[5] XVector_0.30.0 base64enc_0.1-3 dichromat_2.0-0 rstudioapi_0.13
[9] DT_0.17 bit64_4.0.5 fansi_0.4.2 xml2_1.3.2
[13] splines_4.0.4 cachem_1.0.4 knitr_1.31 Formula_1.2-4
[17] Rsamtools_2.6.0 cluster_2.1.1 dbplyr_2.1.0 png_0.1-7
[21] BiocManager_1.30.10 compiler_4.0.4 httr_1.4.2 backports_1.2.1
[25] lazyeval_0.2.2 assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0
[29] htmltools_0.5.1.1 prettyunits_1.1.1 tools_4.0.4 igraph_1.2.6
[33] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.4 dplyr_1.0.5
[37] rappdirs_0.3.3 tinytex_0.30 Rcpp_1.0.6 vctrs_0.3.6
[41] Biostrings_2.58.0 rtracklayer_1.50.0 xfun_0.22 stringr_1.4.0
[45] lifecycle_1.0.0 ensembldb_2.14.0 XML_3.99-0.6 zlibbioc_1.36.0
[49] MASS_7.3-53.1 scales_1.1.1 BiocStyle_2.18.1 BSgenome_1.58.0
[53] VariantAnnotation_1.36.0 ProtGenerics_1.22.0 hms_1.0.0 MatrixGenerics_1.2.1
[57] SummarizedExperiment_1.20.0 AnnotationFilter_1.14.0 RColorBrewer_1.1-2 yaml_2.2.1
[61] curl_4.3 memoise_2.0.0 gridExtra_2.3 ggplot2_3.3.3
[65] UpSetR_1.4.0 biomaRt_2.46.3 rpart_4.1-15 latticeExtra_0.6-29
[69] stringi_1.5.3 RSQLite_2.2.4 checkmate_2.0.0 BiocParallel_1.24.1
[73] rlang_0.4.10 pkgconfig_2.0.3 matrixStats_0.58.0 bitops_1.0-6
[77] evaluate_0.14 lattice_0.20-41 purrr_0.3.4 htmlwidgets_1.5.3
[81] GenomicAlignments_1.26.0 bit_4.0.4 tidyselect_1.1.0 plyr_1.8.6
[85] magrittr_2.0.1 R6_2.5.0 generics_0.1.0 Hmisc_4.5-0
[89] DelayedArray_0.16.2 DBI_1.1.1 pillar_1.5.1 foreign_0.8-81
[93] survival_3.2-10 RCurl_1.98-1.3 nnet_7.3-15 tibble_3.1.0
[97] crayon_1.4.1 utf8_1.2.1 BiocFileCache_1.14.0 rmarkdown_2.7
[101] jpeg_0.1-8.1 progress_1.2.2 locfit_1.5-9.4 grid_4.0.4
[105] data.table_1.14.0 blob_1.2.1 digest_0.6.27 tidyr_1.1.3
[109] openssl_1.4.3 munsell_0.5.0 Gviz_1.34.1 askpass_1.1
Hi Ariel,
Thanks for the advice! I took out those genes as suggested. Some of the warning messages from 'makeTxDbFromGFF' went away, but I still got the same error message when running gbCounts: Error in .subset(x, j) : only 0's may be mixed with negative subscripts In addition: Warning message: In colnames(counts@junction.counts)[9:ncol(counts@junction.counts)] <- rownames(targets) : number of items to replace is not a multiple of replacement length.
I've listed the code below:
Hi Jeremy, It seems that there could be some problems with junction information in Mutant_F_28D_1 bam file. Is this file different from the rest from a QC point of view? Could you eventually share it with me in order to see what´s going on specifically with that file? Ariel
Hi Ariel,
I also remade the bam file just in case but still got the error. I don't think there is a big difference from a QC point of view and the file has worked for a few other workflows. I can send you the file over email unless there is a way to do it here. Thanks for your help!
Jeremy
Hi Jeremy,
I could not reproduce the error you reported with the bam you sent me....could you run the code I used and tell me how it goes?
Hi Ariel,
That also worked fine for me (see below). Just to be sure I also reran the original analysis and got the same error. I also tried the test below with the next bam file in the sequence just in case that was actually causing the problem, but that worked fine as well.
Could you send me your Targets.csv file?
Sorry, I forgot to respond here that I sent the file via email. Please let me know if you didn't receive it, thanks!
Hi Jeremy. We got it and we are working on the issue. Thanks for your patience. A