Question

DEXSeqDataSet object DEXSeqDataSetFromHTSeq Error: line number did not have 3 elements

yr542 • 0

@yr542-17045

Last seen 7 months ago

United States

I have been working with DEXSeq. My data was initially aligned using RefSeq, but since DEXSeq uses Ensembl, I switched to a GTF/GFF from Ensembl for the model organism I am working on.

The code blocks below ran without error:

library(GenomicFeatures)
download.file(
    "https://ftp.ensembl.org/pub/release-110/gtf/danio_rerio/Danio_rerio.GRCz11.110.chr.gtf.gz",
    destfile="/path/to/Downloads/Danio_rerio.GRCz11.110.chr.gtf.gz")

# Must use a **GTF** for the following:
txdb = makeTxDbFromGFF("path/to/Danio_rerio.GRCz11.110.chr.gtf.gz")

## It seems the DEXSeq object won't be created with RefSeq, so trying a different GFF (Danio_rerio.GRCz11.110.chr.gff3) from Ensembl
inDir="path/to/Downloads/"
flattenedFile = list.files(inDir, pattern="\\.gff3$", full.names=TRUE)

# Provide the path to the directory containing counts files
countsDir <- "path/to/Counts_Using_Ensembl/Folder_with_counts"

# List all files in the directory ending with ".txt"
countsFiles <- list.files(countsDir, pattern = "\\.txt$", full.names = TRUE)


sampleTable = data.frame(
   row.names = c("1_Experiment.txt", "2_Experiment.txt", "3_Experiment.txt",
                 "1_Control.txt", "2_Control.txt", "3_Control.txt" ),
   condition = c("knockdown", "knockdown", "knockdown",
                 "control", "control", "control" ),
   libType = c("paired-end", "paired-end", "paired-end", 
               "paired-end", "paired-end", "paired-end" ) 
)
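
A side note on the sample table above: `list.files()` returns paths sorted alphabetically, so the order of `countsFiles` (1_Control, 1_Experiment, 2_Control, ...) will not match the row order of `sampleTable` (all Experiment rows first). The sketch below shows one way to line the two up. The `counts_dir` path and the rebuilt objects are hypothetical stand-ins, and it assumes the file base names are exactly the row names of the sample table.

```r
# Hypothetical stand-ins for the objects defined in the post
sampleTable <- data.frame(
  row.names = c("1_Experiment.txt", "2_Experiment.txt", "3_Experiment.txt",
                "1_Control.txt", "2_Control.txt", "3_Control.txt"),
  condition = rep(c("knockdown", "control"), each = 3)
)
# Mimic the alphabetical order that list.files() would return
countsFiles <- file.path("counts_dir", sort(rownames(sampleTable)))

# Reorder the paths so that file i corresponds to row i of sampleTable
idx <- match(rownames(sampleTable), basename(countsFiles))
stopifnot(!anyNA(idx))  # every sample needs a matching counts file
countsFiles <- countsFiles[idx]

# Sanity check: file order now equals sample table row order
stopifnot(identical(basename(countsFiles), rownames(sampleTable)))
```

Running a check like the final `stopifnot()` before building the data set is a cheap way to catch mislabeled samples.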

My script runs quite well until I hit this code block in Jupyter Lab:

library("DEXSeq")
# Create the DEXSeqDataSet object
dxd <- DEXSeqDataSetFromHTSeq(
  countsFiles,
  sampleTable,
  design= ~ sample + exon + condition:exon,
  flattenedfile=flattenedFile )

Error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 326107 did not have 3 elements
Traceback:

1. DEXSeqDataSetFromHTSeq(countsFiles_unquote, sampleData, design = ~sample + 
 .     exon + condition:exon, flattenedfile = flattenedFile)
2. lapply(countfiles, function(x) read.table(x, header = FALSE, 
 .     stringsAsFactors = FALSE))
3. lapply(countfiles, function(x) read.table(x, header = FALSE, 
 .     stringsAsFactors = FALSE))
4. FUN(X[[i]], ...)
5. read.table(x, header = FALSE, stringsAsFactors = FALSE)
6. scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
 .     nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE, 
 .     fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip, 
 .     multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes, 
 .     flush = flush, encoding = encoding, skipNul = skipNul)

I have tried removing the last couple of lines of each counts file, which contain summary data, using the code recommended by DEXSeq (you can see it here), and I also tried running it with a RefSeq GTF/GFF, but the same error persists. I have also tried removing the quotes and pointing to a directory of unquoted files, using a modified version of what this person recommended when he came across a similar error here. Any guidance is appreciated.

My session information is below:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /path/to/Resources/lib/libRblas.0.dylib 
LAPACK: /path/to/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] C/UTF-8/C/C/C/C

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.33   IRdisplay_1.1   utf8_1.2.3      base64enc_0.1-3
 [5] fastmap_1.1.1   glue_1.6.2      htmltools_0.5.5 repr_1.1.6     
 [9] lifecycle_1.0.3 cli_3.6.1       fansi_1.0.4     vctrs_0.6.3    
[13] pbdZMQ_0.3-9    compiler_4.3.1  tools_4.3.1     evaluate_0.21  
[17] pillar_1.9.0    crayon_1.5.2    rlang_1.1.1     jsonlite_1.8.7 
[21] IRkernel_1.3.2  uuid_1.1-0

DEXSeq HTSeq R • 545 views

score 0 · Answer 1 · 2023-08-11

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 12 hours ago

United States

The error just says that some rows in your count files have fewer than 3 fields, and it might not be only the bottom few rows. If the error persists, that's because you still have such rows. You could inspect the rows by doing something like

for f in *.txt; do awk 'NF != 3' "$f" > "${f%.txt}_check.txt"; done

Or just blast through them all

for f in *.txt; do awk 'NF == 3' "$f" > "${f%.txt}_fixed.txt"; done

And then use the fixed countFiles.
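
If you would rather stay inside R, roughly the same check can be sketched as follows. This is only an illustration: the temporary file and its three toy lines are made up for the demo, and splitting on whitespace is assumed to mirror what `awk` (and `read.table` with its default `sep`) does.

```r
# Build a small throwaway counts file for the demo (made-up content)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id1 a 5", "id2 b 7", "_ambiguous 12"), tmp)

# Count whitespace-separated fields on each line, like awk's NF
lines <- readLines(tmp)
nf <- lengths(strsplit(trimws(lines), "[[:space:]]+"))

# Line numbers that do not have exactly 3 fields
bad <- which(nf != 3)
print(bad)

# Keep only the 3-field lines and write them to a *_fixed.txt file
fixed <- sub("\\.txt$", "_fixed.txt", tmp)
writeLines(lines[nf == 3], fixed)
```

To apply this to real data you would loop over the count files with `lapply()`. Note that if the fixed files are written into the same directory, a later `list.files(..., pattern = "\\.txt$")` call will pick up both the original and the fixed files.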

It seems that the bottom lines contained summary statistics - when those are removed it worked.

ADD REPLY • link 7 months ago yr542 • 0