I have been working with DEXSeq. My data was initially aligned using RefSeq annotation, and because DEXSeq uses Ensembl, I switched to an Ensembl GTF/GFF for the model organism I am working on.
The following code runs without errors:
```r
library(GenomicFeatures)

# Download the Ensembl GRCz11 (release 110) GTF and reuse the same path below
gtfFile <- "/path/to/Downloads/Danio_rerio.GRCz11.110.chr.gtf.gz"
download.file(
  "https://ftp.ensembl.org/pub/release-110/gtf/danio_rerio/Danio_rerio.GRCz11.110.chr.gtf.gz",
  destfile = gtfFile)

# Must use a GTF for the following:
txdb <- makeTxDbFromGFF(gtfFile)

## It seems the DEXSeq object won't be created with RefSeq, so I am trying a
## different GFF (Danio_rerio.GRCz11.110.chr.gff3) from Ensembl
inDir <- "path/to/Downloads/"
flattenedFile <- list.files(inDir, pattern = "\\.gff3$", full.names = TRUE)

# Provide the path to the directory containing the counts files
countsDir <- "path/to/Counts_Using_Ensembl/Folder_with_counts"

# List all files in the directory ending with ".txt"
countsFiles <- list.files(countsDir, pattern = "\\.txt$", full.names = TRUE)

sampleTable <- data.frame(
  row.names = c("1_Experiment.txt", "2_Experiment.txt", "3_Experiment.txt",
                "1_Control.txt", "2_Control.txt", "3_Control.txt"),
  condition = c("knockdown", "knockdown", "knockdown",
                "control", "control", "control"),
  libType = rep("paired-end", 6)
)

# list.files() returns files in alphabetical order (1_Control, 1_Experiment, ...),
# so reorder countsFiles to follow the rows of sampleTable
countsFiles <- countsFiles[match(rownames(sampleTable), basename(countsFiles))]
```
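A quick pre-flight check can rule out input problems before calling `DEXSeqDataSetFromHTSeq()`. The helper below, `checkDexseqInputs()`, is a hypothetical name (not a DEXSeq function). It assumes that DEXSeq pairs count files with `sampleData` rows by position, so it checks that every counts file exists, that the file base names appear in the same order as `rownames(sampleTable)`, and that `flattenedFile` resolves to exactly one existing file (`list.files()` can return zero or several matches).

```r
# Hypothetical pre-flight check (not part of DEXSeq). Returns a character
# vector of problems; an empty vector means nothing obvious was found.
# Assumes count files and sampleData rows are paired by position.
checkDexseqInputs <- function(countsFiles, sampleTable, flattenedFile) {
  problems <- character(0)

  # Every counts file must exist (NA paths, e.g. from a failed match(), count as missing)
  missing <- countsFiles[is.na(countsFiles) | !file.exists(countsFiles)]
  if (length(missing) > 0) {
    problems <- c(problems,
                  paste("missing counts file(s):", paste(missing, collapse = ", ")))
  }

  # One sampleTable row per counts file, in the same order
  if (length(countsFiles) != nrow(sampleTable)) {
    problems <- c(problems, sprintf("%d counts files but %d sampleTable rows",
                                    length(countsFiles), nrow(sampleTable)))
  } else if (!identical(basename(countsFiles), rownames(sampleTable))) {
    problems <- c(problems,
                  "counts files are not in the same order as rownames(sampleTable)")
  }

  # flattenedfile must be a single existing path
  if (length(flattenedFile) != 1 || !file.exists(flattenedFile)) {
    problems <- c(problems,
                  sprintf("expected exactly one existing flattened file, got %d path(s)",
                          length(flattenedFile)))
  }
  problems
}

# Usage with the objects defined above:
# checkDexseqInputs(countsFiles, sampleTable, flattenedFile)
```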
The script runs fine until it reaches this block in JupyterLab:
```r
library("DEXSeq")

# Create the DEXSeqDataSet object
dxd <- DEXSeqDataSetFromHTSeq(
  countsFiles,
  sampleData = sampleTable,
  design = ~ sample + exon + condition:exon,
  flattenedfile = flattenedFile
)
```
```
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 326107 did not have 3 elements
Traceback:

1. DEXSeqDataSetFromHTSeq(countsFiles_unquote, sampleData, design = ~sample + 
 .     exon + condition:exon, flattenedfile = flattenedFile)
2. lapply(countfiles, function(x) read.table(x, header = FALSE, 
 .     stringsAsFactors = FALSE))
3. lapply(countfiles, function(x) read.table(x, header = FALSE, 
 .     stringsAsFactors = FALSE))
4. FUN(X[[i]], ...)
5. read.table(x, header = FALSE, stringsAsFactors = FALSE)
6. scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
 .     nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE, 
 .     fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip, 
 .     multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes, 
 .     flush = flush, encoding = encoding, skipNul = skipNul)
```
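The traceback does not say which counts file failed or what is on line 326107, so a small diagnostic may help narrow it down. Both helpers below are hypothetical (not DEXSeq functions). `findUnreadableCounts()` repeats the `read.table(x, header = FALSE, stringsAsFactors = FALSE)` call shown in the traceback on each file separately and collects the error messages by file. `findMalformedLines()` uses base R's `count.fields()`, with the same `sep`/`quote`/`comment.char` defaults as `read.table()`, to list lines whose field count differs from the first line's (or from an `expected` value you pass).

```r
# Hypothetical diagnostics (not part of DEXSeq).

# 1) Run the same read.table() call as in the traceback, one file at a time,
#    and return the error message for each file that fails (named by path).
findUnreadableCounts <- function(files) {
  errs <- vapply(files, function(x) {
    tryCatch({
      read.table(x, header = FALSE, stringsAsFactors = FALSE)
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
  errs[!is.na(errs)]
}

# 2) List lines whose number of fields differs from `expected`
#    (default: the field count of line 1). Quotes or '#' inside IDs are
#    counted the way read.table() would treat them; NA marks an open quote.
findMalformedLines <- function(path, expected = NULL) {
  n <- count.fields(path, sep = "", quote = "\"'", comment.char = "#",
                    blank.lines.skip = FALSE)
  if (is.null(expected)) expected <- n[1]
  bad <- which(is.na(n) | n != expected)
  data.frame(line = bad, fields = n[bad], text = readLines(path)[bad],
             stringsAsFactors = FALSE)
}

# Usage with the objects from the question:
# findUnreadableCounts(countsFiles)
# findMalformedLines(countsFiles[1])
```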
I have tried removing the last few lines of each counts file, which contain summary counts, using the code recommended in the DEXSeq documentation, and I have also tried running it with the RefSeq GTF/GFF, but the same error persists. I also tried removing the quotes, using a modified version of a fix another user suggested for a similar error, and pointing DEXSeq at a separate "unquote" directory containing the unquoted files. Any guidance is appreciated.
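The clean-up described above (dropping the summary lines and removing quotes) can be sketched as one per-file function. `cleanCountsFile()` is a hypothetical helper, not the DEXSeq-recommended code. It assumes the summary rows are those whose feature ID starts with an underscore (HTSeq-style counters such as `_ambiguous` or `__no_feature`), quoted or not, and it writes cleaned copies to a separate directory rather than overwriting the originals.

```r
# Hypothetical helper: copy a counts file to `outfile`, dropping summary
# rows (IDs starting with "_", with or without a leading double quote)
# and removing all double quotes from the remaining lines.
cleanCountsFile <- function(infile, outfile) {
  lines <- readLines(infile)
  lines <- lines[!grepl('^"?_', lines)]        # drop summary counters
  lines <- gsub('"', "", lines, fixed = TRUE)  # strip double quotes
  writeLines(lines, outfile)
  invisible(outfile)
}

# Usage with the objects from the question (writes to a new directory):
# outDir <- file.path(countsDir, "cleaned")
# dir.create(outDir, showWarnings = FALSE)
# cleanedFiles <- vapply(countsFiles, function(f)
#   cleanCountsFile(f, file.path(outDir, basename(f))), character(1),
#   USE.NAMES = FALSE)
```

Writing to a new directory keeps the raw counts untouched, so the original files can still be compared line by line against the cleaned ones.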
My session information is below:

```
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /path/to/Resources/lib/libRblas.0.dylib
LAPACK: /path/to/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
C/UTF-8/C/C/C/C

time zone: America/New_York
tzcode source: internal

attached base packages:
stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
digest_0.6.33   IRdisplay_1.1   utf8_1.2.3      base64enc_0.1-3
fastmap_1.1.1   glue_1.6.2      htmltools_0.5.5 repr_1.1.6
lifecycle_1.0.3 cli_3.6.1       fansi_1.0.4     vctrs_0.6.3
pbdZMQ_0.3-9    compiler_4.3.1  tools_4.3.1     evaluate_0.21
pillar_1.9.0    crayon_1.5.2    rlang_1.1.1     jsonlite_1.8.7
IRkernel_1.3.2  uuid_1.1-0
```