Error when reading data with DropletUtils::read10xCounts
fabrost • 0

I am trying to read some data with DropletUtils::read10xCounts, but I get an error:

```{r}
library(DropletUtils)
sce <- DropletUtils::read10xCounts("/scratch/GRCz10.e87/")
```

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 13 did not have 2 elements

The folder "/scratch/rulands/zebrafish_brain_christian_lange/bfx908.full_data/filtered_gene_bc_matrices/GRCz10.e87/" contains the "matrix.mtx", "genes.tsv" and "barcodes.tsv" files. I did not create those files myself, so I cannot rule out that they are corrupted. I cannot upload the complete data, and I do not see how to create a minimal dataset that reproduces the error. I can read "matrix.mtx" on its own using read10xMatrix. Does anyone know how I can read the full data?

```{r}
traceback()
```
3: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
2: read.table(gene.loc, header = FALSE, colClasses = "character",
       stringsAsFactors = FALSE)
1: DropletUtils::read10xCounts("/scratch/GRCz10.e87/")
```{r}
BiocInstaller::biocValid()
```
[1] TRUE
```{r}
sessionInfo()
```
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE Leap 42.3

Matrix products: default
BLAS: /usr/local/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /usr/local/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C              LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8    LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
 [1] grid      splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DropletUtils_1.0.1                      pheatmap_1.0.10                        
 [3] slingshot_0.99.6                        princurve_1.1-12                       
 [5] M3Drop_1.6.0                            numDeriv_2016.8-1                      
 [7] org.Dr.eg.db_3.6.0                      biomaRt_2.36.1                         
 [9] Rgraphviz_2.24.0                        topGO_2.32.0                           
[11] SparseM_1.77                            GO.db_3.6.0                            
[13] graph_1.58.0                            TSCAN_1.18.0                           
[15] TxDb.Drerio.UCSC.danRer10.refGene_3.4.3 GenomicFeatures_1.32.0                 
[17] AnnotationDbi_1.42.1                    stringr_1.3.1                          
[19] scater_1.8.0                            SingleCellExperiment_1.2.0             
[21] SummarizedExperiment_1.10.1             DelayedArray_0.6.0                     
[23] BiocParallel_1.14.1                     matrixStats_0.53.1                     
[25] GenomicRanges_1.32.3                    GenomeInfoDb_1.16.0                    
[27] IRanges_2.14.10                         S4Vectors_0.18.2                       
[29] SC3_1.8.0                               readxl_1.1.0                           
[31] monocle_2.8.0                           DDRTree_0.1.5                          
[33] irlba_2.3.2                             VGAM_1.0-5                             
[35] Biobase_2.40.0                          BiocGenerics_0.26.0                    
[37] Matrix_1.2-14                           magrittr_1.5                           
[39] Hmisc_4.1-1                             ggplot2_2.2.1                          
[41] Formula_1.2-3                           survival_2.42-3                        
[43] lattice_0.20-35                         ggsci_2.9                              
[45] cluster_2.0.7-1                         data.table_1.11.4                      

loaded via a namespace (and not attached):
  [1] rtracklayer_1.40.2       prabclus_2.2-6           pkgmaker_0.27            tidyr_0.8.1             
  [5] acepack_1.4.1            bit64_0.9-7              knitr_1.20               rpart_4.1-13            
  [9] RCurl_1.95-4.10          doParallel_1.0.11        RSQLite_2.1.1            RANN_2.5.1              
 [13] combinat_0.0-8           bit_1.1-13               phylobase_0.8.4          xml2_1.2.0              
 [17] httpuv_1.4.3             assertthat_0.2.0         viridis_0.5.1            tximport_1.8.0          
 [21] evaluate_0.10.1          promises_1.0.1           BiocInstaller_1.30.0     DEoptimR_1.0-8          
 [25] progress_1.1.2           caTools_1.17.1           dendextend_1.8.0         igraph_1.2.1            
 [29] DBI_1.0.0                htmlwidgets_1.2          sparsesvd_0.1-4          purrr_0.2.4             
 [33] RSpectra_0.13-1          crosstalk_1.0.0          dplyr_0.7.5              backports_1.1.2         
 [37] trimcluster_0.1-2        gridBase_0.4-7           locfdr_1.1-8             ROCR_1.0-7              
 [41] withr_2.1.2              robustbase_0.93-0        checkmate_1.8.5          GenomicAlignments_1.16.0
 [45] prettyunits_1.0.2        mclust_5.4               ape_5.1                  lazyeval_0.2.1          
 [49] edgeR_3.22.2             pkgconfig_2.0.1          slam_0.1-43              nlme_3.1-137            
 [53] vipor_0.4.5              nnet_7.3-12              bindr_0.1.1              rlang_0.2.0             
 [57] diptest_0.75-7           miniUI_0.1.1.1           registry_0.5             cellranger_1.1.0        
 [61] rprojroot_1.3-2          rngtools_1.3.1           Rhdf5lib_1.2.1           base64enc_0.1-3         
 [65] beeswarm_0.2.3           whisker_0.3-2            viridisLite_0.3.0        rjson_0.2.19            
 [69] bitops_1.0-6             shinydashboard_0.7.0     rncl_0.8.2               KernSmooth_2.23-15      
 [73] Biostrings_2.48.0        blob_1.1.1               DelayedMatrixStats_1.2.0 rgl_0.99.16             
 [77] doRNG_1.6.6              manipulateWidget_0.9.0   scales_0.5.0             memoise_1.1.0           
 [81] plyr_1.8.4               howmany_0.3-1            gplots_3.0.1             bibtex_0.4.2            
 [85] gdata_2.18.0             zlibbioc_1.26.0          compiler_3.5.0           HSMMSingleCell_0.114.0  
 [89] bbmle_1.0.20             RColorBrewer_1.1-2       rrcov_1.4-4              Rsamtools_1.32.0        
 [93] ade4_1.7-11              XVector_0.20.0           htmlTable_1.12           MASS_7.3-50             
 [97] mgcv_1.8-23              tidyselect_0.2.4         stringi_1.2.2            densityClust_0.3        
[101] yaml_2.1.19              locfit_1.5-9.1           latticeExtra_0.6-28      ggrepel_0.8.0           
[105] tools_3.5.0              rstudioapi_0.7           uuid_0.1-2               foreach_1.4.4           
[109] foreign_0.8-70           RNeXML_2.1.1             gridExtra_2.3            Rtsne_0.13              
[113] digest_0.6.15            FNN_1.1                  shiny_1.1.0              qlcMatrix_0.9.7         
[117] fpc_2.1-11               bindrcpp_0.2.2           Rcpp_0.12.17             later_0.7.2             
[121] WriteXLS_4.0.0           httr_1.3.1               kernlab_0.9-26           colorspace_1.3-2        
[125] XML_3.98-1.11            clusterExperiment_2.0.2  statmod_1.4.30           flexmix_2.3-14          
[129] xtable_1.8-2             jsonlite_1.5             modeltools_0.2-21        R6_2.2.2                
[133] pillar_1.2.3             htmltools_0.3.6          mime_0.5                 NMF_0.21.0              
[137] glue_1.2.0               class_7.3-14             codetools_0.2-15         pcaPP_1.9-73            
[141] mvtnorm_1.0-7            tibble_1.4.2             ggbeeswarm_0.6.0         gtools_3.5.0            
[145] limma_3.36.1             rmarkdown_1.9            docopt_0.4.5             fastICA_1.2-1           
[149] munsell_0.4.3            e1071_1.6-8              rhdf5_2.24.0             GenomeInfoDbData_1.1.0  
[153] iterators_1.0.9          HDF5Array_1.8.0          reshape2_1.4.3           gtable_0.2.0
Aaron Lun ★ 28k

I daresay that this is due to some unusual symbol on line 13 of genes.tsv; probably a gene name with a quote in it, if I had to guess. Could you confirm this is the case, by just doing something like head -20 genes.tsv and seeing what happens around line 13?
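
The same check can also be done from within R. The following is a minimal sketch, assuming a two-column, tab-separated genes.tsv; the temporary file and its three lines are a toy stand-in, not the poster's actual data. It uses count.fields from the utils package to count whitespace-separated fields per line, which mirrors how read.table splits lines by default.

```r
# Toy stand-in for genes.tsv (hypothetical file; the real one is not available).
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "ENSDARG00000104373\tsi:zfos-932h1.2",
  "ENSDARG00000098311\tsi: zfos-932h1.3",
  "ENSDARG00000102121\tprr5b"
), genes_file)

# Count fields per line, splitting on any whitespace (sep = ""),
# which is the default behaviour of read.table.
n_fields <- count.fields(genes_file, sep = "", quote = "\"'", comment.char = "#")

# Any line that does not have exactly two fields is suspicious.
bad_lines <- which(n_fields != 2)
print(bad_lines)
print(readLines(genes_file)[bad_lines])
```

On this toy input only the second line is flagged, because the space inside the gene name produces a third field. Note that count.fields skips blank lines by default, so the reported indices can be shifted if the file contains empty lines.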


Very good hint, thanks for your help! The gene name on line 13 contains a space. Since read.table splits on any whitespace by default, that line is parsed as three fields instead of two. Changing read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE) to read.table(gene.loc, header = FALSE, colClasses = "character", stringsAsFactors = FALSE, sep = "\t") might solve this. In the meantime, how should I work around the issue: should I modify the data or read it in a different way?

First 20 lines of genes.tsv:

ENSDARG00000104632    rerg
ENSDARG00000100660    si:ch73-252i11.1
ENSDARG00000098417    syn3
ENSDARG00000100422    ptpro
ENSDARG00000102128    eps8
ENSDARG00000103095    tbk1
ENSDARG00000102226    gpr19
ENSDARG00000104049    crebl2
ENSDARG00000102474    dusp16
ENSDARG00000100143    lrp6
ENSDARG00000104839    mansc1
ENSDARG00000104373    si:zfos-932h1.2
ENSDARG00000098311    si: zfos-932h1.3
ENSDARG00000102121    prr5b
ENSDARG00000102123    phtf2
ENSDARG00000102141    CABZ01102632.1
ENSDARG00000105725    si:cabz01088622.2
ENSDARG00000099787    echdc3
ENSDARG00000070546    msgn1
ENSDARG00000045914    si:ch211-51e12.7
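
The effect of the separator can be reproduced on a small file. This is only a sketch under stated assumptions: the temporary file is a toy stand-in built from a few of the lines above, with the columns assumed to be tab-separated, and the read.table calls copy the arguments shown in the traceback rather than the package's actual source.

```r
# Toy stand-in for genes.tsv: six clean lines, then the one with a space.
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "ENSDARG00000102226\tgpr19",
  "ENSDARG00000104049\tcrebl2",
  "ENSDARG00000102474\tdusp16",
  "ENSDARG00000100143\tlrp6",
  "ENSDARG00000104839\tmansc1",
  "ENSDARG00000104373\tsi:zfos-932h1.2",
  "ENSDARG00000098311\tsi: zfos-932h1.3"
), genes_file)

# Default sep = "" splits on any whitespace, so the last line has three
# fields and the read fails; the error message is captured instead of thrown.
default_read <- tryCatch(
  read.table(genes_file, header = FALSE, colClasses = "character",
             stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(default_read)

# With an explicit tab separator, the space stays inside the gene name.
tab_read <- read.table(genes_file, header = FALSE, colClasses = "character",
                       stringsAsFactors = FALSE, sep = "\t")
print(tab_read$V2[7])
```

The first call ends in an error of the same kind as the one in the question, while the second returns a two-column data frame with the gene name intact.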

After replacing every space in genes.tsv with an underscore, I can read the data just fine.
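
For reference, a minimal sketch of that workaround in R. The file names are hypothetical temporary files and the input is a toy stand-in; with real data it would be safer to keep a backup of the original genes.tsv before overwriting anything.

```r
# Toy stand-in for genes.tsv (hypothetical file, tab-separated columns).
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "ENSDARG00000104373\tsi:zfos-932h1.2",
  "ENSDARG00000098311\tsi: zfos-932h1.3",
  "ENSDARG00000102121\tprr5b"
), genes_file)

# Replace literal spaces only; the tabs separating the columns are untouched.
fixed <- gsub(" ", "_", readLines(genes_file), fixed = TRUE)

# Write the result to a separate file rather than overwriting the original.
fixed_file <- tempfile(fileext = ".tsv")
writeLines(fixed, fixed_file)

# The whitespace-splitting read now sees exactly two fields per line.
genes <- read.table(fixed_file, header = FALSE, colClasses = "character",
                    stringsAsFactors = FALSE)
print(genes$V2)
```

The trade-off is that the gene symbol itself changes (here to "si:_zfos-932h1.3"), which matters if the symbols are later matched against an annotation database.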
 


Yes, that's right; setting sep = "\t" or switching to read.delim would do it. I have made this fix and pushed it to the GitHub repository; you can either install the new version from there or wait for it to show up on the BioC build machines in 1-2 days. Alternatively, you can just edit genes.tsv to get rid of the space, which probably shouldn't be there in the first place.
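
As an illustration of the read.delim route, here is a sketch on a toy file. It is not necessarily the exact change made in the package, and the second gene entry is made up for the example. read.delim defaults to sep = "\t" and treats only the double quote as a quoting character, so both spaces and apostrophes in gene names pass through; header = FALSE has to be set explicitly because read.delim otherwise takes the first line as column names.

```r
# Toy stand-in for genes.tsv; the second entry is a made-up gene name
# containing both a space and an apostrophe.
genes_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "ENSDARG00000098311\tsi: zfos-932h1.3",
  "HYPOTHETICAL0001\tabc 5'end",
  "ENSDARG00000102121\tprr5b"
), genes_file)

# read.delim splits on tabs only and does not treat ' as a quote character.
genes <- read.delim(genes_file, header = FALSE, colClasses = "character",
                    stringsAsFactors = FALSE)
print(genes)
```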
