I am trying to perform differential exon usage analysis using DEXSeq. I cannot create a DEXSeqDataSet object from the count files generated by dexseq_count.py and the GFF file generated by dexseq_prepare_annotation.py. I found other questions where the suggestion was to check for a discrepancy between the number of lines in the count files and the number of lines corresponding to 'exonic_part' in the GFF file. I've checked this and found that all count files match each other and correspond with the GFF file. Can anyone offer a suggestion to fix this?
My code returns: Error in FUN(X[[i]], ...) : subscript out of bounds
#Output of list.files() in working directory
"3-31-22input_juncs_DI_pA_all_exons.filtered.gff" "KO_1_8.counts.txt" "KO_3_1.counts.txt" "KO_3_2.counts.txt" "KO_3_3.counts.txt" "WT_3_6.counts.txt" "WT_3_7.counts.txt" "WT_3_8.counts.txt"
countFiles <- list.files(pattern = '.counts', full.names=TRUE)
flattenedFile <- list.files(pattern = '.gff', full.names=TRUE)
sampleTable <- data.frame(row.names=c('KO_1_8','KO_3_1','KO_3_2','KO_3_3','WT_1_7','WT_3_6','WT_3_7','WT_3_8'), condition = c('KO','KO','KO','KO','WT','WT','WT','WT'),libType = c('paired-end','paired-end','paired-end','paired-end','paired-end','paired-end','paired-end','paired-end'))
dxd <- DEXSeqDataSetFromHTSeq(countFiles,
sampleData=sampleTable, design= ~sample+exon+condition:exon, flattenedfile=flattenedFile) #problematic line
#output of sessionInfo( ):
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] DEXSeq_1.40.0 RColorBrewer_1.1-3
[3] AnnotationDbi_1.56.2 DESeq2_1.34.0
[5] SummarizedExperiment_1.24.0 GenomicRanges_1.46.1
[7] GenomeInfoDb_1.30.1 IRanges_2.28.0
[9] S4Vectors_0.32.4 MatrixGenerics_1.6.0
[11] matrixStats_0.61.0 Biobase_2.54.0
[13] BiocGenerics_0.40.0 BiocParallel_1.28.3

loaded via a namespace (and not attached):
[1] httr_1.4.2 bit64_4.0.5 splines_4.1.3
[4] assertthat_0.2.1 statmod_1.4.36 BiocFileCache_2.2.1
[7] blob_1.2.2 Rsamtools_2.10.0 GenomeInfoDbData_1.2.7
[10] progress_1.2.2 pillar_1.7.0 RSQLite_2.2.12
[13] lattice_0.20-45 glue_1.6.2 digest_0.6.29
[16] XVector_0.34.0 colorspace_2.0-3 Matrix_1.4-1
[19] XML_3.99-0.9 pkgconfig_2.0.3 biomaRt_2.50.3
[22] genefilter_1.76.0 zlibbioc_1.40.0 purrr_0.3.4
[25] xtable_1.8-4 scales_1.1.1 tibble_3.1.6
[28] annotate_1.72.0 KEGGREST_1.34.0 generics_0.1.2
[31] ggplot2_3.3.5 ellipsis_0.3.2 cachem_1.0.6
[34] cli_3.2.0 survival_3.3-1 magrittr_2.0.3
[37] crayon_1.5.1 memoise_2.0.1 fansi_1.0.3
[40] xml2_1.3.3 hwriter_1.3.2 tools_4.1.3
[43] prettyunits_1.1.1 hms_1.1.1 lifecycle_1.0.1
[46] stringr_1.4.0 munsell_0.5.0 locfit_1.5-9.5
[49] DelayedArray_0.20.0 Biostrings_2.62.0 compiler_4.1.3
[52] rlang_1.0.2 grid_4.1.3 RCurl_1.98-1.6
[55] rappdirs_0.3.3 bitops_1.0-7 gtable_0.3.0
[58] curl_4.3.2 DBI_1.1.2 R6_2.5.1
[61] dplyr_1.0.8 fastmap_1.1.0 bit_4.0.4
[64] utf8_1.2.2 filelock_1.0.2 stringi_1.7.6
[67] parallel_4.1.3 Rcpp_1.0.8.3 vctrs_0.4.0
[70] geneplotter_1.72.0 png_0.1-7 dbplyr_2.1.1
[73] tidyselect_1.1.2
Hello! I had the same issue, and it has already been answered. The cause is the quotes that newer versions of HTSeq write into the count files. See Arthur's answer here: DEXSeq errors "Error in scan( line ... did not have 3 elements" and "Error in FUN(X[[i]], ...) : subscript out of bounds" (DEXSeqDataSetFromHTSeq) Good luck!!
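In case it helps, here is a minimal sketch of one way to work around the quotes: write quote-free copies of the count files and point DEXSeq at those. It assumes the only problem is literal double-quote characters in the dexseq_count.py output. strip_quotes() and the ".counts.clean.txt" naming are made up for illustration; they are not part of DEXSeq or HTSeq, and this may not be exactly what the linked answer does.

```r
# Hypothetical helper (not a DEXSeq/HTSeq function): copy a count file,
# removing every literal double-quote character and keeping the rest as-is.
strip_quotes <- function(infile, outfile) {
  lines <- readLines(infile)
  writeLines(gsub('"', '', lines, fixed = TRUE), outfile)
  invisible(outfile)
}

# Match only the original count files, e.g. ./KO_1_8.counts.txt
countFiles <- list.files(pattern = "\\.counts\\.txt$", full.names = TRUE)

# Assumed output naming: KO_1_8.counts.txt -> KO_1_8.counts.clean.txt
cleanFiles <- sub("\\.counts\\.txt$", ".counts.clean.txt", countFiles)

# Write one cleaned copy per input file; the originals are left untouched.
invisible(mapply(strip_quotes, countFiles, cleanFiles))
```

You would then pass cleanFiles instead of countFiles to DEXSeqDataSetFromHTSeq. Note that a loose pattern such as '.counts' would pick up both the original and the cleaned files, so it is safer to reuse the cleanFiles vector directly.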
Thank you so much! Sorry for the late response, was pulled away by some other work.