DEXSeqDataSet object DEXSeqDataSetFromHTSeq Error: line number did not have 3 elements
yr542 • 0
@yr542-17045

I have been working with DEXSeq. My data was initially aligned using RefSeq, but because DEXSeq uses Ensembl, I used a GTF/GFF from Ensembl for the model organism I am working on.

The code blocks that ran without error are below:

library(GenomicFeatures)
download.file(
    "https://ftp.ensembl.org/pub/release-110/gtf/danio_rerio/Danio_rerio.GRCz11.110.chr.gtf.gz",
    destfile="/path/to/Downloads/Danio_rerio.GRCz11.110.chr.gtf.gz")

# Must use a **GTF** for the following:
txdb = makeTxDbFromGFF("path/to/Danio_rerio.GRCz11.110.chr.gtf.gz")

## It seems the DEXSeq object won't be created with RefSeq, so trying a different GFF (Danio_rerio.GRCz11.110.chr.gff3) from Ensembl
inDir="path/to/Downloads/"
flattenedFile = list.files(inDir, pattern="\\.gff3$", full.names=TRUE)

# Provide the path to the directory containing counts files
countsDir <- "path/to/Counts_Using_Ensembl/Folder_with_counts"

# List all files in the directory ending with ".txt"
countsFiles <- list.files(countsDir, pattern = "\\.txt$", full.names = TRUE)


sampleTable = data.frame(
   row.names = c("1_Experiment.txt", "2_Experiment.txt", "3_Experiment.txt",
                 "1_Control.txt", "2_Control.txt", "3_Control.txt" ),
   condition = c("knockdown", "knockdown", "knockdown",
                 "control", "control", "control" ),
   libType = c("paired-end", "paired-end", "paired-end", 
               "paired-end", "paired-end", "paired-end" ) 
)

My script runs fine until I hit this code block in Jupyter Lab:

library("DEXSeq")
# Create the DEXSeqDataSet object
dxd <- DEXSeqDataSetFromHTSeq(
  countsFiles,
  sampleData=sampleTable,
  design= ~ sample + exon + condition:exon,
  flattenedfile=flattenedFile )

Error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 326107 did not have 3 elements
Traceback:

1. DEXSeqDataSetFromHTSeq(countsFiles_unquote, sampleData, design = ~sample + 
 .     exon + condition:exon, flattenedfile = flattenedFile)
2. lapply(countfiles, function(x) read.table(x, header = FALSE, 
 .     stringsAsFactors = FALSE))
3. lapply(countfiles, function(x) read.table(x, header = FALSE, 
 .     stringsAsFactors = FALSE))
4. FUN(X[[i]], ...)
5. read.table(x, header = FALSE, stringsAsFactors = FALSE)
6. scan(file = file, what = what, sep = sep, quote = quote, dec = dec, 
 .     nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE, 
 .     fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip, 
 .     multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes, 
 .     flush = flush, encoding = encoding, skipNul = skipNul)

I have tried removing the last few lines, which contain summary data, from each counts file using the code recommended by DEXSeq. I also tried running it with a RefSeq GTF/GFF, but the same error persists. Finally, I tried removing the quotes and pointing to an "Unquote" directory containing the files with quotes removed, using a modified version of what another user recommended after hitting a similar error. Any guidance is appreciated.

My session information is below:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /path/to/Resources/lib/libRblas.0.dylib 
LAPACK: /path/to/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] C/UTF-8/C/C/C/C

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.33   IRdisplay_1.1   utf8_1.2.3      base64enc_0.1-3
 [5] fastmap_1.1.1   glue_1.6.2      htmltools_0.5.5 repr_1.1.6     
 [9] lifecycle_1.0.3 cli_3.6.1       fansi_1.0.4     vctrs_0.6.3    
[13] pbdZMQ_0.3-9    compiler_4.3.1  tools_4.3.1     evaluate_0.21  
[17] pillar_1.9.0    crayon_1.5.2    rlang_1.1.1     jsonlite_1.8.7 
[21] IRkernel_1.3.2  uuid_1.1-0
DEXSeq HTSeq R
@james-w-macdonald-5106

The error just says you have rows with fewer than 3 fields, and they might not be limited to the bottom few rows. If the error persists, that's because you still have such rows. You could inspect the rows, doing something like

for f in *.txt; do awk 'NF != 3' "$f" > "${f/.txt/_check.txt}"; done

Or just blast through them all, keeping only the rows with exactly 3 fields

for f in *.txt; do awk 'NF == 3' "$f" > "${f/.txt/_fixed.txt}"; done

And then use the fixed count files.
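
Before writing any new files, it can help to see exactly where the odd rows are. The following is only a sketch: `report_bad_lines` is a hypothetical helper name (not part of DEXSeq or HTSeq), and it assumes the count files are whitespace-delimited, which is how awk splits fields by default.

```shell
# Hypothetical helper: print file name, line number and field count for
# every line whose number of whitespace-separated fields differs from
# the expected value given as the first argument.
report_bad_lines() {
  expected="$1"
  shift
  for f in "$@"; do
    awk -v expected="$expected" \
      'NF != expected { printf "%s:%d: %d fields\n", FILENAME, FNR, NF }' "$f"
  done
}
```

Calling `report_bad_lines 3 *.txt` in the counts directory would list each offending line, so you can tell whether the problem is confined to the end of the files or scattered throughout.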


It seems that the bottom lines contained summary statistics; once those were removed, it worked.
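
For anyone hitting the same thing, here is a minimal sketch of dropping such rows. It assumes the summary rows are the ones whose first field starts with an underscore (for example `__no_feature`), which may not hold for every counting tool or version; `strip_summary_rows` is a hypothetical helper name, and the output is written to a new file so the original counts stay untouched.

```shell
# Hypothetical helper: copy a counts file ($1) to a new file ($2),
# dropping every row whose first field begins with "_".
# Assumption: only summary rows start with an underscore.
strip_summary_rows() {
  awk '$1 !~ /^_/' "$1" > "$2"
}
```

For example, `strip_summary_rows 1_Control.txt 1_Control_clean.txt` would write the filtered copy, which can then be passed to `DEXSeqDataSetFromHTSeq` in place of the original.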
