DEXSeq: Error in reading counts from htseq "Error in FUN(X[[i]], ...) : subscript out of bounds"
Delta._.43 ▴ 20

Reading in HTSeq count files with DEXSeqDataSetFromHTSeq() throws the following error:

Error in FUN(X[[i]], ...) : subscript out of bounds
 4. lapply(X = X, FUN = FUN, ...)
 3. sapply(splitted, "[[", 2)
 2. sapply(splitted, "[[", 2)
 1. DEXSeqDataSetFromHTSeq(countFiles, sampleData = sampleTable, 
    design = ~sample + exon + condition:exon, flattenedfile = flattenedFile)

The traceback leads back to the line inside the function where the count files are read:

lf <- lapply(countfiles, function(x) read.table(x, header = FALSE, 
        stringsAsFactors = FALSE))

For some reason, read.table() with its default separator splits each line of the file into 3 columns rather than the 2 originally intended. Specifying the delimiter explicitly solves the issue:

lf <- lapply(countfiles, function(x) read.table(x, header = FALSE, 
        stringsAsFactors = FALSE, sep = '\t'))
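To make the difference easier to check, here is a minimal sketch that reads the same file both ways and compares the column counts. The two-line temporary file is made up for illustration (it mimics the quoted ID format of my count files and assumes a tab between the ID and the count). The last step, splitting the IDs on ":" and taking the second piece, is my assumption about what happens downstream, based only on the sapply(splitted, "[[", 2) frame in the traceback.

```r
# Hypothetical two-line count file in the quoted ID format,
# assuming a tab between the exon ID and the count.
tmp <- tempfile(fileext = ".txt")
writeLines(c('"ENSG00000000003":"001"\t0',
             '"ENSG00000000003":"002"\t1032'), tmp)

# Default sep = "" splits on any whitespace.
by_whitespace <- read.table(tmp, header = FALSE, stringsAsFactors = FALSE)

# Explicit tab separator, as in the fix above.
by_tab <- read.table(tmp, header = FALSE, stringsAsFactors = FALSE,
                     sep = "\t")

# Compare how many columns each call produced.
print(ncol(by_whitespace))
print(ncol(by_tab))
print(by_tab$V1)

# Assumed downstream step: split the IDs on ":" and take the second piece.
# If an ID contains no ":", there is no second piece to extract.
splitted <- strsplit(by_whitespace$V1, ":")
res <- try(sapply(splitted, "[[", 2), silent = TRUE)
print(inherits(res, "try-error"))
```

If the column counts differ for your own files as well, the separator handling is the likely culprit rather than the flattened annotation file.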

I'm not sure where to post this fix or if this is the best way to solve this issue. Maybe someone with more experience could provide a better solution.

dxd = DEXSeqDataSetFromHTSeq(
   countFiles,
   sampleData=sampleTable,
   design= ~ sample + exon + condition:exon,
   flattenedfile=flattenedFile )

$ Error in FUN(X[[i]], ...) : subscript out of bounds


$ 4: lapply(X = X, FUN = FUN, ...)
$ 3: sapply(splitted, "[[", 2)
$ 2: sapply(splitted, "[[", 2)
$ 1: DEXSeqDataSetFromHTSeq(countFiles, sampleData = sampleTable, 
       design = ~sample + exon + condition:exon, flattenedfile = flattenedFile)


R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8    LC_MONETARY=English_India.utf8 LC_NUMERIC=C                   LC_TIME=English_India.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.42.0               RColorBrewer_1.1-3          AnnotationDbi_1.58.0        DESeq2_1.36.0               SummarizedExperiment_1.26.1
 [6] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2         IRanges_2.30.0              S4Vectors_0.34.0            MatrixGenerics_1.8.1       
[11] matrixStats_0.62.0          Biobase_2.56.0              BiocGenerics_0.42.0         BiocParallel_1.30.3         reshape2_1.4.4             
[16] kableExtra_1.3.4            forcats_0.5.1               stringr_1.4.0               dplyr_1.0.9                 purrr_0.3.4                
[21] readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.8                ggplot2_3.3.6               tidyverse_1.3.2            
[26] data.table_1.14.2          

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0      colorspace_2.0-3       hwriter_1.3.2.1        ellipsis_0.3.2         XVector_0.36.0         fs_1.5.2              
 [7] rstudioapi_0.13        bit64_4.0.5            fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3             codetools_0.2-18      
[13] splines_4.2.1          cachem_1.0.6           geneplotter_1.74.0     knitr_1.39             jsonlite_1.8.0         Rsamtools_2.12.0      
[19] broom_1.0.0            annotate_1.74.0        dbplyr_2.2.1           png_0.1-7              compiler_4.2.1         httr_1.4.3            
[25] backports_1.4.1        assertthat_0.2.1       Matrix_1.4-1           fastmap_1.1.0          gargle_1.2.0           cli_3.3.0             
[31] prettyunits_1.1.1      htmltools_0.5.3        tools_4.2.1            gtable_0.3.0           glue_1.6.2             GenomeInfoDbData_1.2.8
[37] rappdirs_0.3.3         Rcpp_1.0.9             cellranger_1.1.0       vctrs_0.4.1            Biostrings_2.64.0      svglite_2.1.0         
[43] xfun_0.32              rvest_1.0.2            lifecycle_1.0.1        pacman_0.5.1           statmod_1.4.36         XML_3.99-0.10         
[49] googlesheets4_1.0.0    zlibbioc_1.42.0        scales_1.2.0           hms_1.1.1              parallel_4.2.1         curl_4.3.2            
[55] memoise_2.0.1          biomaRt_2.52.0         stringi_1.7.8          RSQLite_2.2.15         genefilter_1.78.0      filelock_1.0.2        
[61] rlang_1.0.4            pkgconfig_2.0.3        systemfonts_1.0.4      bitops_1.0-7           evaluate_0.16          lattice_0.20-45       
[67] bit_4.0.4              tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
[73] DelayedArray_0.22.0    DBI_1.1.3              pillar_1.8.0           haven_2.5.0            withr_2.5.0            survival_3.4-0        
[79] KEGGREST_1.36.3        RCurl_1.98-1.8         modelr_0.1.8           crayon_1.5.1           utf8_1.2.2             BiocFileCache_2.4.0   
[85] tzdb_0.3.0             rmarkdown_2.14         progress_1.2.2         locfit_1.5-9.6         grid_4.2.1             readxl_1.4.0          
[91] blob_1.2.3             reprex_2.0.1           digest_0.6.29          webshot_0.5.3          xtable_1.8-4           munsell_0.5.0         
[97] viridisLite_0.4.0

Thanks for your detailed post. I think this might be related to this issue: DEXSeq errors "Error in scan( line ... did not have 3 elements" and "Error in FUN(X[[i]], ...) : subscript out of bounds" (DEXSeqDataSetFromHTSeq)

Just to verify, could you please post the first lines of your count files?


Yes, indeed it is the same error. Here are the first few lines of my count file.

"ENSG00000000003":"001" 0
"ENSG00000000003":"002" 1032
"ENSG00000000003":"003" 412
"ENSG00000000003":"004" 0
"ENSG00000000003":"005" 280
"ENSG00000000003":"006" 263
"ENSG00000000003":"007" 272
"ENSG00000000003":"008" 339
"ENSG00000000003":"009" 295

As already mentioned by Arthur in that thread, I also thought of modifying the count files to remove the quotes around the colon, and that also solves the issue. I tried -

sed -i 's/":"/:/1' countfile
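For anyone without sed (I am on Windows), roughly the same cleanup can be sketched in R. Note that this is not identical to the sed command: it removes every double quote from each line rather than only the ones around the colon. strip_quotes() is a helper name I made up, and the temporary files below are stand-ins for a real count file.

```r
# Hypothetical helper: copy a count file, dropping all double quotes.
strip_quotes <- function(infile, outfile) {
  lines <- readLines(infile)
  writeLines(gsub('"', "", lines, fixed = TRUE), outfile)
  invisible(outfile)
}

# Made-up example input in the quoted format shown above,
# assuming a tab between the exon ID and the count.
raw <- tempfile(fileext = ".txt")
clean <- tempfile(fileext = ".txt")
writeLines(c('"ENSG00000000003":"001"\t0',
             '"ENSG00000000003":"002"\t1032'), raw)

strip_quotes(raw, clean)

# With the quotes gone, the default whitespace separator is enough.
counts <- read.table(clean, header = FALSE, stringsAsFactors = FALSE)
print(counts)
```

Writing to a separate output file keeps the original counts untouched, unlike sed -i, which edits in place.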
