DEXSeq: Error in reading counts from htseq "Error in FUN(X[[i]], ...) : subscript out of bounds"
Delta._.43 ▴ 20

Reading in data generated with dexseq_count.py using DEXSeqDataSetFromHTSeq() throws the following error:

Error in FUN(X[[i]], ...) : subscript out of bounds
 4. lapply(X = X, FUN = FUN, ...)
 3. sapply(splitted, "[[", 2)
 2. sapply(splitted, "[[", 2)
 1. DEXSeqDataSetFromHTSeq(countFiles, sampleData = sampleTable, 
    design = ~sample + exon + condition:exon, flattenedfile = flattenedFile)

The traceback leads back to the line inside the function where the count files are read:

lf <- lapply(countfiles, function(x) read.table(x, header = FALSE, 
        stringsAsFactors = FALSE))

For some reason, the default behaviour of read.table() splits each line of the file into 3 columns rather than the 2 the function expects. Specifying the delimiter explicitly solves the issue, i.e.

lf <- lapply(countfiles, function(x) read.table(x, header = FALSE, 
        stringsAsFactors = FALSE, sep = '\t'))

I'm not sure where to post this fix or if this is the best way to solve this issue. Maybe someone with more experience could provide a better solution.
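To check a count file independently of R, a small Python sketch like the one below can tally how many tab-separated fields each line has and how many feature IDs contain quote characters. The function name `summarize_count_lines` and the example lines are made up for illustration. The sketch assumes the file is tab-delimited, which is what the `sep = '\t'` fix implies.

```python
from collections import Counter


def summarize_count_lines(lines):
    """Tally tab-separated field counts and quoted feature IDs.

    Hypothetical helper for illustration; assumes a tab-delimited count file.
    Returns (field_counts, quoted_ids), where field_counts maps
    "number of fields" -> "number of lines with that many fields".
    """
    field_counts = Counter()
    quoted_ids = 0
    for line in lines:
        line = line.rstrip("\r\n")  # handle both Unix and Windows line endings
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        field_counts[len(fields)] += 1
        if '"' in fields[0]:
            quoted_ids += 1  # quote characters inside the feature ID
    return dict(field_counts), quoted_ids


# Illustrative lines only, not taken from a real count file.
example = [
    '"ENSG00000000003":"001"\t0\n',
    '"ENSG00000000003":"002"\t1032\n',
    'ENSG00000000003:003\t412\n',
]
print(summarize_count_lines(example))
```

To run it on a real file, pass an open file handle, e.g. `with open(path) as fh: summarize_count_lines(fh)`, where `path` is a placeholder for one of your count files. If every line has exactly 2 tab-separated fields but R still reports 3 columns, quote characters in the IDs are a plausible thing to look at next.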


dxd = DEXSeqDataSetFromHTSeq(
   countFiles,
   sampleData=sampleTable,
   design= ~ sample + exon + condition:exon,
   flattenedfile=flattenedFile )

$ Error in FUN(X[[i]], ...) : subscript out of bounds

traceback()

$ 4: lapply(X = X, FUN = FUN, ...)
$ 3: sapply(splitted, "[[", 2)
$ 2: sapply(splitted, "[[", 2)
$ 1: DEXSeqDataSetFromHTSeq(countFiles, sampleData = sampleTable, 
       design = ~sample + exon + condition:exon, flattenedfile = flattenedFile)


sessionInfo()

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8    LC_MONETARY=English_India.utf8 LC_NUMERIC=C                   LC_TIME=English_India.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.42.0               RColorBrewer_1.1-3          AnnotationDbi_1.58.0        DESeq2_1.36.0               SummarizedExperiment_1.26.1
 [6] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2         IRanges_2.30.0              S4Vectors_0.34.0            MatrixGenerics_1.8.1       
[11] matrixStats_0.62.0          Biobase_2.56.0              BiocGenerics_0.42.0         BiocParallel_1.30.3         reshape2_1.4.4             
[16] kableExtra_1.3.4            forcats_0.5.1               stringr_1.4.0               dplyr_1.0.9                 purrr_0.3.4                
[21] readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.8                ggplot2_3.3.6               tidyverse_1.3.2            
[26] data.table_1.14.2          

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0      colorspace_2.0-3       hwriter_1.3.2.1        ellipsis_0.3.2         XVector_0.36.0         fs_1.5.2              
 [7] rstudioapi_0.13        bit64_4.0.5            fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3             codetools_0.2-18      
[13] splines_4.2.1          cachem_1.0.6           geneplotter_1.74.0     knitr_1.39             jsonlite_1.8.0         Rsamtools_2.12.0      
[19] broom_1.0.0            annotate_1.74.0        dbplyr_2.2.1           png_0.1-7              compiler_4.2.1         httr_1.4.3            
[25] backports_1.4.1        assertthat_0.2.1       Matrix_1.4-1           fastmap_1.1.0          gargle_1.2.0           cli_3.3.0             
[31] prettyunits_1.1.1      htmltools_0.5.3        tools_4.2.1            gtable_0.3.0           glue_1.6.2             GenomeInfoDbData_1.2.8
[37] rappdirs_0.3.3         Rcpp_1.0.9             cellranger_1.1.0       vctrs_0.4.1            Biostrings_2.64.0      svglite_2.1.0         
[43] xfun_0.32              rvest_1.0.2            lifecycle_1.0.1        pacman_0.5.1           statmod_1.4.36         XML_3.99-0.10         
[49] googlesheets4_1.0.0    zlibbioc_1.42.0        scales_1.2.0           hms_1.1.1              parallel_4.2.1         curl_4.3.2            
[55] memoise_2.0.1          biomaRt_2.52.0         stringi_1.7.8          RSQLite_2.2.15         genefilter_1.78.0      filelock_1.0.2        
[61] rlang_1.0.4            pkgconfig_2.0.3        systemfonts_1.0.4      bitops_1.0-7           evaluate_0.16          lattice_0.20-45       
[67] bit_4.0.4              tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
[73] DelayedArray_0.22.0    DBI_1.1.3              pillar_1.8.0           haven_2.5.0            withr_2.5.0            survival_3.4-0        
[79] KEGGREST_1.36.3        RCurl_1.98-1.8         modelr_0.1.8           crayon_1.5.1           utf8_1.2.2             BiocFileCache_2.4.0   
[85] tzdb_0.3.0             rmarkdown_2.14         progress_1.2.2         locfit_1.5-9.6         grid_4.2.1             readxl_1.4.0          
[91] blob_1.2.3             reprex_2.0.1           digest_0.6.29          webshot_0.5.3          xtable_1.8-4           munsell_0.5.0         
[97] viridisLite_0.4.0

Thanks for your detailed post. I think this might be related to the issue described here: DEXSeq errors "Error in scan( line ... did not have 3 elements" and "Error in FUN(X[[i]], ...) : subscript out of bounds" (DEXSeqDataSetFromHTSeq)

Just to verify, could you please post the first lines of your count files?


Yes, indeed it is the same error. Here are the first few lines of my count file.

"ENSG00000000003":"001" 0
"ENSG00000000003":"002" 1032
"ENSG00000000003":"003" 412
"ENSG00000000003":"004" 0
"ENSG00000000003":"005" 280
"ENSG00000000003":"006" 263
"ENSG00000000003":"007" 272
"ENSG00000000003":"008" 339
"ENSG00000000003":"009" 295

As Arthur already mentioned in that thread, I also thought of modifying the count files to remove the quotes around the colon, which solves the issue as well. I used:

sed -i 's/":"/:/1' countfile
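For anyone without sed (e.g. on Windows), the same substitution can be sketched in Python. The function names `unquote_colon` and `clean_count_file` and the path arguments are hypothetical. Unlike `sed -i`, this sketch writes a cleaned copy rather than editing the file in place.

```python
def unquote_colon(line):
    """Replace the first '":"' in a line with ':' (the same substitution as the sed command)."""
    return line.replace('":"', ':', 1)


def clean_count_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, leaving the original file untouched.

    Hypothetical helper; newline="" keeps the original line endings as they are.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(unquote_colon(line))


# Illustrative line in the same shape as the count file excerpt above.
print(unquote_colon('"ENSG00000000003":"001"\t0'))
```

As with the sed command, the outer quotes around the whole ID are kept and only the quotes on either side of the colon are removed.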