Question

Error with DESeq2: DESeqDataSet(se, design = design, ignoreRank) : counts matrix should be numeric, currently it has mode: character

molly.fraser • 0

Hi all,

I am an undergraduate researcher using DESeq2 for the first time. I am running into an issue with DESeqDataSetFromMatrix, where R reports that my count data needs to be numeric.

When I check the class of my countData, it is a matrix. When I try to coerce the matrix with as.numeric, I get NAs. I've attached my code and the error message below. Does anyone have any ideas on how to troubleshoot this?

Error in DESeqDataSet(se, design = design, ignoreRank) : counts matrix should be numeric, currently it has mode: character

Thanks, Molly

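library(plyr)    # provides ldply
library(DESeq2)  # provides DESeqDataSetFromMatrix

Samples <- c("MM-0017-RNA-T-07", "MM-0623-RNA-T-01", "MM-0039-RNA-T-06")
# One CIRI output path per sample, named by sample ID
inputs <- as.list(paste0("/data1/users/molly/", "ciri_out/", Samples, ".ciri.output"))
names(inputs) <- Samples


# Read the first five columns of each file and stack them row-wise;
# ldply adds an .id column holding the list names (the sample IDs)
combined.df <- ldply(inputs, function(x) {
  read.table(file = x, sep = "\t", header = TRUE, comment.char = "",
             stringsAsFactors = FALSE)[, 1:5]
})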
colnames(combined.df)[1] <- "sampleID"


counts <- as.matrix(combined.df)
coldata<- data.frame(sample_id=Samples, condition=c("B","B","B","M","M","M"))
coldata$sample_id <- paste0(coldata$sample_id,"-",coldata$condition) #MR
rownames(coldata) <- coldata$sample_id
coldata$condition <- factor(coldata$condition)  


dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)

# Error in DESeqDataSet(se, design = design, ignoreRank) : 
  #counts matrix should be numeric, currently it has mode: character

sessionInfo( )
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.26.0               SummarizedExperiment_1.16.1 DelayedArray_0.12.3         BiocParallel_1.20.1        
 [5] matrixStats_0.62.0          Biobase_2.46.0              GenomicRanges_1.38.0        GenomeInfoDb_1.22.1        
 [9] IRanges_2.20.2              S4Vectors_0.24.4            BiocGenerics_0.32.0         data.table_1.14.4          
[13] forcats_0.5.2               stringr_1.4.1               purrr_0.3.5                 readr_2.1.3                
[17] tidyr_1.2.1                 tibble_3.1.8                ggplot2_3.4.0               tidyverse_1.3.2            
[21] dplyr_1.0.10                plyr_1.8.8                  BiocManager_1.30.19        

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0      colorspace_2.0-3       deldir_1.0-6           ellipsis_0.3.2         htmlTable_2.4.1       
 [6] XVector_0.26.0         base64enc_0.1-3        fs_1.5.2               rstudioapi_0.14        bit64_4.0.5           
[11] AnnotationDbi_1.48.0   fansi_1.0.3            lubridate_1.9.0        xml2_1.3.3             splines_3.6.3         
[16] cachem_1.0.6           geneplotter_1.64.0     knitr_1.40             Formula_1.2-4          jsonlite_1.8.3        
[21] broom_1.0.1            annotate_1.64.0        cluster_2.1.4          dbplyr_2.2.1           png_0.1-7             
[26] compiler_3.6.3         httr_1.4.4             backports_1.4.1        assertthat_0.2.1       Matrix_1.5-3          
[31] fastmap_1.1.0          gargle_1.2.1           cli_3.4.1              htmltools_0.5.3        tools_3.6.3           
[36] gtable_0.3.1           glue_1.6.2             GenomeInfoDbData_1.2.2 Rcpp_1.0.9             cellranger_1.1.0      
[41] vctrs_0.5.0            xfun_0.34              rvest_1.0.3            timechange_0.1.1       lifecycle_1.0.3       
[46] XML_3.99-0.3           googlesheets4_1.0.1    zlibbioc_1.32.0        scales_1.2.1           hms_1.1.2             
[51] RColorBrewer_1.1-3     yaml_2.3.6             memoise_2.0.1          gridExtra_2.3          rpart_4.1.19          
[56] latticeExtra_0.6-30    stringi_1.7.8          RSQLite_2.2.18         genefilter_1.68.0      checkmate_2.1.0       
[61] rlang_1.0.6            pkgconfig_2.0.3        bitops_1.0-7           lattice_0.20-45        htmlwidgets_1.5.4     
[66] bit_4.0.4              tidyselect_1.2.0       magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
[71] Hmisc_4.7-1            DBI_1.1.3              pillar_1.8.1           haven_2.5.1            foreign_0.8-71        
[76] withr_2.5.0            survival_3.4-0         RCurl_1.98-1.9         nnet_7.3-12            modelr_0.1.10         
[81] crayon_1.5.2           interp_1.1-3           utf8_1.2.2             tzdb_0.3.0             jpeg_0.1-9            
[86] locfit_1.5-9.4         grid_3.6.3             readxl_1.4.1           blob_1.2.3             reprex_2.0.2          
[91] digest_0.6.30          xtable_1.8-4           munsell_0.5.0

DESeq2 • 5.0k views

ADD COMMENT • link updated 3.2 years ago by ATpoint ★ 5.0k • written 3.2 years ago by molly.fraser • 0

score 0 · Answer 1 · 2022-11-16

ATpoint ★ 5.0k

The counts must be a numeric matrix containing only integer counts. The line colnames(combined.df)[1] <- "sampleID" suggests that the first column is an ID column, so move it to the rownames and then remove that column before calling as.matrix.
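As a rough illustration of that suggestion, here is a minimal base-R sketch. The data frame df and its columns gene_id, s1 and s2 are made up for the example and are not taken from the original post. It shows that as.matrix() on a data frame with a character column gives a character matrix, and that moving the ID column to the rownames first gives a numeric one.

```r
# Toy count table with an ID column (hypothetical names and values)
df <- data.frame(gene_id = c("g1", "g2", "g3"),
                 s1 = c(10L, 0L, 5L),
                 s2 = c(3L, 7L, 1L),
                 stringsAsFactors = FALSE)

# With a character column present, as.matrix() coerces every cell to character
bad <- as.matrix(df)
mode(bad)     # "character"

# Move the ID column to the row names, then drop it
rownames(df) <- df$gene_id
df$gene_id <- NULL

# Only numeric columns remain, so the matrix is numeric
counts <- as.matrix(df)
mode(counts)  # "numeric"
```

A matrix built this way keeps the IDs as row names, so they are not lost when the column is removed.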

ADD COMMENT • link 3.2 years ago ATpoint ★ 5.0k

Thanks for your reply! Just to clarify - which column are you suggesting I remove?

ADD REPLY • link 3.2 years ago molly.fraser • 0

The non-numeric one, which seems to be the first. What is the output of combined.df[1:3,1:3]?
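One way to check which columns are non-numeric, sketched in base R on a made-up data frame (the names df, id, chr and reads are assumptions for illustration only). It also shows why calling as.numeric on text labels produces NAs.

```r
# Toy data frame with two character columns and one integer column (hypothetical)
df <- data.frame(id = c("a", "b"),
                 chr = c("chr1", "chr2"),
                 reads = c(4L, 9L),
                 stringsAsFactors = FALSE)

# Flag each column as numeric or not, then list the offenders
is_num <- vapply(df, is.numeric, logical(1))
non_numeric <- names(df)[!is_num]
non_numeric

# Text that is not a number cannot be converted, so as.numeric() returns NA
# (the coercion warning is suppressed here to keep the output clean)
coerced <- suppressWarnings(as.numeric(df$chr))
coerced
```

str(df) gives a similar per-column overview in a single call.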

ADD REPLY • link 3.2 years ago ATpoint ★ 5.0k

The output is:

               .id           circRNA_ID  chr
2 MM-0017-RNA-T-07 chr1:1668327|1739704 chr1
4 MM-0017-RNA-T-07 chr1:1804419|1817875 chr1
5 MM-0017-RNA-T-07 chr1:1815756|1825499 chr1

ADD REPLY • link 3.2 years ago molly.fraser • 0

Well, as said above, the count data must be a numeric matrix, which means only integer counts are allowed. Remove all non-numeric columns. All of the columns shown here are non-numeric.
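Judging from the output shown, the combined table appears to be in long format, with one row per circRNA per sample, so dropping columns alone may not be enough to get the features-by-samples layout that DESeq2 expects. Below is a hedged base-R sketch of one possible reshaping. The toy data frame long and the column name junction_reads are assumptions made for the example, so check them against the real column names before adapting it.

```r
# Toy long-format table: one row per circRNA per sample (hypothetical values)
long <- data.frame(
  sampleID = c("S1", "S1", "S2", "S2", "S3"),
  circRNA_ID = c("chr1:100|200", "chr1:300|400", "chr1:100|200",
                 "chr1:500|600", "chr1:300|400"),
  chr = "chr1",
  junction_reads = c(12L, 3L, 7L, 5L, 9L),  # assumed name of the count column
  stringsAsFactors = FALSE
)

# Pivot to circRNAs (rows) x samples (columns), summing reads per cell
wide <- with(long, tapply(junction_reads, list(circRNA_ID, sampleID), sum))

# circRNAs not detected in a sample come out as NA; treat them as zero counts
wide[is.na(wide)] <- 0

# Store as integers, since the counts are whole numbers
storage.mode(wide) <- "integer"
wide
```

The column names of a matrix like this would need to match the row names of coldata, in the same order, before passing it to DESeqDataSetFromMatrix.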

ADD REPLY • link 3.2 years ago ATpoint ★ 5.0k