Hello community,
I am having trouble constructing a DESeqDataSet object from a matrix for my C. elegans dataset. I get the following error and cannot proceed with my analysis: Error in seq_len(length(idx) - 1) : argument must be coercible to non-negative integer. The details of my script are below.
Any suggestions on how to resolve this problem would be greatly appreciated. I have used a similar analysis pipeline for mouse and Drosophila datasets without any issues. DESeq2 warns that 2765 duplicate rownames were renamed, and anyDuplicated() on the row names returns 418, which is the index of the first duplicated name rather than a count. Is this a formatting issue, and how do I resolve it? I have tried reformatting the GTF file by removing hyphens from gene names, but this does not help.
> library(DESeq2)
> library(GenomicFeatures)
> library(GenomicAlignments)
> load("221112_Counts_Statistics.rda")
> load("221112_CElegansGenesForCounting.rda")
> ls()
[1] "counts.g" "read.stats" "WBGene"
> read.stats
                   CE_D8_B1_1_JW CE_D8_B1_2_JW CE_D8_B2_1_JW CE_D8_B3_1_JW
Total                   11290997       4276908       9247162       7796030
MappingGenes            11251447       4244323       9204484       7754390
MappingWithinGenes      10902502       4101953       8959358       7529636
Non-overlapping         10715356       4022032       8808297       7397325
                   CE_D8_B4_1_JW CE_D8_B4_2_JW CE_D24_B2_1_JW CE_D24_B3_1_JW
Total                    4077812       4279296        3529684        2205921
MappingGenes             4052732       4262418        3401422        2136509
MappingWithinGenes       3973850       4126993        3292149        2073138
Non-overlapping          3896790       4059245        3212992        2022372
                   CE_D24_B3_2_JW CE_D24_B3_3_JW CE_D24_B4_1_JW CE_D24_B4_2_JW
Total                      629047        6409842        6300939        1054582
MappingGenes               604816        6321809        5472654        1002492
MappingWithinGenes         590021        6131986        5161268         971470
Non-overlapping            574902        6019360        5033469         951305
> head(counts.g)
     CE_D8_B1_1_JW CE_D8_B1_2_JW CE_D8_B2_1_JW CE_D8_B3_1_JW CE_D8_B4_1_JW
aat2            45            61            90            39            17
aat6            34            30            60            27            38
abf3             0             0             0             0             0
abf4             1             3             5             1             0
abt4           147            39            63            75            61
abu1             0             0             0             0             0
     CE_D8_B4_2_JW CE_D24_B2_1_JW CE_D24_B3_1_JW CE_D24_B3_2_JW CE_D24_B3_3_JW
aat2            37            251             79             17            270
aat6            24             18              1              1              7
abf3             0              0              0              0              0
abf4             0              0              0              0              0
abt4            25             26              0              2             19
abu1             0              0              0              0              0
     CE_D24_B4_1_JW CE_D24_B4_2_JW
aat2            560             72
aat6              2              1
abf3             41              0
abf4             18              0
abt4             82              6
abu1              8              1
> lm.sTable <- read.table("LMSeq_CE_JW_vtest_sampleTable.txt",header=TRUE,sep="\t")
> lm.sTable
       SampleName Sample Condtn albut
1   CE_D8_B1_1_JW   D8-1  Early LMSeq
2   CE_D8_B1_2_JW   D8-2  Early LMSeq
3   CE_D8_B2_1_JW   D8-3  Early LMSeq
4   CE_D8_B3_1_JW   D8-4  Early LMSeq
5   CE_D8_B4_1_JW   D8-5  Early LMSeq
6   CE_D8_B4_2_JW   D8-6  Early LMSeq
7  CE_D24_B2_1_JW  D24-1   Late LMSeq
8  CE_D24_B3_1_JW  D24-2   Late LMSeq
9  CE_D24_B3_2_JW  D24-3   Late LMSeq
10 CE_D24_B3_3_JW  D24-4   Late LMSeq
11 CE_D24_B4_1_JW  D24-5   Late LMSeq
12 CE_D24_B4_2_JW  D24-6   Late LMSeq
> lm.sTable$Condition <- relevel(lm.sTable$Condtn,"Early")
> rowdata <- genes
> rowdata <- WBGene
#problematic code:
> lm.dds <- DESeqDataSetFromMatrix(countData=counts.g, colData=lm.sTable, design=~Condtn)
Error in seq_len(length(idx) - 1) : argument must be coercible to non-negative integer
In addition: Warning message: In DESeqDataSet(se, design = design, ignoreRank) :
2765 duplicate rownames were renamed by adding numbers
> anyDuplicated(rownames(counts.g))
[1] 418
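For reference, anyDuplicated() reports the position of the first repeated name, so the 418 above is an index and not the number of duplicates. Below is a minimal sketch of how the duplicated row names could be counted and then either collapsed or renamed before the matrix is passed on. It uses a small made-up matrix (toy, with sample names S1 and S2, all hypothetical) rather than the real counts.g, and it only uses base R. Whether the duplicated names are what actually triggers the seq_len error is not established here; this only shows one way to remove them as a possible cause.

```r
# Toy count matrix with one repeated gene name (hypothetical data, not counts.g)
toy <- matrix(c(5L, 3L, 0L, 2L,
                1L, 4L, 7L, 6L),
              nrow = 4,
              dimnames = list(c("aat2", "abf3", "aat2", "abt4"),
                              c("S1", "S2")))

# Position of the first repeated row name (an index, not a count)
first.dup <- anyDuplicated(rownames(toy))

# Number of rows whose name has already appeared further up
n.dup <- sum(duplicated(rownames(toy)))

# The gene names that occur more than once
dup.names <- unique(rownames(toy)[duplicated(rownames(toy))])

# Option A: sum the counts of all rows that share a name
collapsed <- rowsum(toy, group = rownames(toy))

# Option B: keep every row and make the names unique instead
renamed <- toy
rownames(renamed) <- make.unique(rownames(toy))
```

Option A treats rows with the same name as the same gene, which is only appropriate if that is really the case in the annotation; Option B keeps them separate. With either result, anyDuplicated() on the new row names returns 0.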
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS
Matrix products: default
BLAS/LAPACK: /home/escifo/anaconda3/envs/lmseq/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicAlignments_1.18.1 Rsamtools_1.34.1
[3] Biostrings_2.50.2 XVector_0.22.0
[5] GenomicFeatures_1.34.8 AnnotationDbi_1.44.0
[7] DESeq2_1.22.2 SummarizedExperiment_1.12.0
[9] DelayedArray_0.6.6 BiocParallel_1.16.6
[11] matrixStats_0.56.0 Biobase_2.42.0
[13] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[15] IRanges_2.16.0 S4Vectors_0.20.1
[17] BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] httr_1.4.1 bit64_0.9-7 splines_3.5.1
[4] Formula_1.2-3 assertthat_0.2.1 latticeExtra_0.6-28
[7] blob_1.2.1 GenomeInfoDbData_1.2.0 progress_1.2.2
[10] pillar_1.4.6 RSQLite_2.2.0 backports_1.1.6
[13] lattice_0.20-41 glue_1.4.0 digest_0.6.25
[16] RColorBrewer_1.1-2 checkmate_2.0.0 colorspace_1.4-1
[19] htmltools_0.4.0 Matrix_1.2-18 XML_3.99-0.3
[22] pkgconfig_2.0.3 biomaRt_2.38.0 genefilter_1.64.0
[25] zlibbioc_1.28.0 purrr_0.3.4 xtable_1.8-4
[28] scales_1.1.0 htmlTable_1.13.3 tibble_2.1.3
[31] annotate_1.60.1 ggplot2_3.3.0 ellipsis_0.3.0
[34] nnet_7.3-14 survival_3.1-12 magrittr_1.5
[37] crayon_1.3.4 memoise_1.1.0 foreign_0.8-76
[40] prettyunits_1.1.1 tools_3.5.1 data.table_1.12.8
[43] hms_0.5.3 lifecycle_0.2.0 stringr_1.4.0
[46] munsell_0.5.0 locfit_1.5-9.4 cluster_2.1.0
[49] compiler_3.5.1 rlang_0.4.5 grid_3.5.1
[52] RCurl_1.95-4.12 rstudioapi_0.11 htmlwidgets_1.5.1
[55] bitops_1.0-6 base64enc_0.1-3 gtable_0.3.0
[58] DBI_1.1.0 R6_2.4.1 gridExtra_2.3
[61] rtracklayer_1.42.2 knitr_1.28 dplyr_0.8.5
[64] bit_1.1-15.2 Hmisc_4.4-0 stringi_1.4.3
[67] Rcpp_1.0.4.6 vctrs_0.2.4 geneplotter_1.60.0
[70] rpart_4.1-15 acepack_1.4.1 tidyselect_1.0.0
[73] xfun_0.13