Hi,
I received the following error when running dexseq_count.py. I flattened Mus_musculus.GRCm38.83.gtf downloaded from Ensembl (I get the same error with the flattened Mus_musculus.GRCm38.75.gtf and Mus_musculus.NCBIM37.66.gtf). Should I report this to HTSeq instead? Thank you!
$ python dexseq_count.py -p no -s no accepted_hits.sam flattened.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 208, in __iter__
    ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 164, in parse_GFF_attribute_string
    raise ValueError, "Failure parsing GFF attribute line"
ValueError: Failure parsing GFF attribute line
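To see which attribute string triggers this, one option is to run a rough check over the ninth column yourself. The sketch below is a simplified stand-in, NOT HTSeq's actual parse_GFF_attribute_string: it assumes the column is split on ';' and that each non-empty piece must look like "key value" or "key=value". The function name parse_attributes is hypothetical. It avoids f-strings, so it should run under the Python 2.7 shown above as well as Python 3.

```python
import re

# Simplified approximation (not HTSeq's source): each ';'-separated piece of a
# GTF attribute column must be a key, then whitespace or '=', then a value.
_PIECE_RE = re.compile(r'\s*([^\s=]+)[\s=]+(.*)')


def parse_attributes(attr_str):
    """Return a dict of attributes, or raise ValueError on a malformed piece."""
    attrs = {}
    # Naive split: a ';' inside a quoted value is not handled here.
    for piece in attr_str.rstrip("\n").split(";"):
        if not piece.strip():
            continue  # tolerate a trailing ';' and empty pieces
        match = _PIECE_RE.match(piece)
        if match is None:
            raise ValueError("Failure parsing attribute piece: %r" % piece)
        key, value = match.groups()
        attrs[key] = value.strip().strip('"')  # drop surrounding quotes
    return attrs
```

An exonic_part attribute string like the ones in the head output parses into transcripts, exonic_part_number and gene_id, while a piece with no separator at all (for example a bare "IIII") raises ValueError.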
$ pip show HTSeq
---
Name: HTSeq
Version: 0.6.1p1
Location: /usr/local/lib/python2.7/dist-packages
Requires:
$ cat /path/to/DEXSeq/DESCRIPTION
Package: DEXSeq
Version: 1.16.7
----------------------------
EDIT:
$ head dexseq.gtf
1 dexseq_prepare_annotation.py aggregate_gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693"
1 dexseq_prepare_annotation.py exonic_part 3073253 3074322 . + . transcripts "ENSMUST00000193812"; exonic_part_number "001"; gene_id "ENSMUSG00000102693"
1 dexseq_prepare_annotation.py aggregate_gene 3102016 3102125 . + . gene_id "ENSMUSG00000064842"
1 dexseq_prepare_annotation.py exonic_part 3102016 3102125 . + . transcripts "ENSMUST00000082908"; exonic_part_number "001"; gene_id "ENSMUSG00000064842"
1 dexseq_prepare_annotation.py aggregate_gene 3205901 3671498 . - . gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3205901 3206522 . - . transcripts "ENSMUST00000162897"; exonic_part_number "001"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3206523 3207317 . - . transcripts "ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "002"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3213439 3213608 . - . transcripts "ENSMUST00000159265"; exonic_part_number "003"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3213609 3214481 . - . transcripts "ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "004"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3214482 3215632 . - . transcripts "ENSMUST00000070533+ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "005"; gene_id "ENSMUSG00000051951"
I get a different error when using -f bam:
$ python dexseq_count.py -p no -s no -f bam accepted_hits.bam dexseq.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 207, in __iter__
    strand, frame, attributeStr ) = line.split( "\t", 8 )
ValueError: need more than 2 values to unpack
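This message comes from unpacking line.split( "\t", 8 ) into nine names, so some line of the file being read as GFF has only two tab-separated fields. A minimal sketch for locating that line is below; the function name describe_gtf_problem and the 1000-line limit are hypothetical. It also checks the first two bytes for the gzip magic number, since BAM files are BGZF-compressed and therefore start like gzip; compressed bytes rarely contain tabs, which would produce exactly this kind of short split if such a file were handed to the GFF reader.

```python
BGZF_MAGIC = b"\x1f\x8b"  # BAM files are BGZF-compressed, which starts like gzip


def describe_gtf_problem(path, max_lines=1000):
    """Return a message about the first structural problem found, or None.

    Checks (a) whether the file starts with gzip/BGZF magic bytes, as a BAM
    file does, and (b) whether each of the first max_lines non-comment lines
    has the 9 tab-separated columns a GTF line needs.
    """
    with open(path, "rb") as handle:
        if handle.read(2) == BGZF_MAGIC:
            return "starts with gzip/BGZF magic bytes (BAM or .gz?), not plain-text GTF"
        handle.seek(0)
        for number, raw in enumerate(handle, start=1):
            if number > max_lines:
                break
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if not line or line.startswith("#"):
                continue  # skip blank and comment lines
            n_fields = len(line.split("\t", 8))
            if n_fields < 9:
                return "line %d has %d tab-separated field(s), expected 9" % (number, n_fields)
    return None
```

Running it on each file passed to dexseq_count.py, e.g. print(describe_gtf_problem("dexseq.gtf")), prints None for a well-formed flattened annotation and a short description otherwise.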
----------------------------

Thanks a lot for your detailed report! Could you include the first lines of your flattened gtf file?
Alejandro
Interestingly, I received a different error with the flattened Homo_sapiens.GRCh37.70.gtf:
$ python dexseq_count.py -p no accepted_hits.sam dexseq.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 210, in __iter__
    iv = GenomicInterval( seqname, int(start)-1, int(end), strand )
  File "_HTSeq.pyx", line 62, in HTSeq._HTSeq.GenomicInterval.__init__ (src/_HTSeq.c:2789)
  File "_HTSeq.pyx", line 71, in HTSeq._HTSeq.GenomicInterval.strand.__set__ (src/_HTSeq.c:2910)
ValueError: Strand must be'+', '-', or '.'.
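Here the attribute column parsed, but column 7 (strand) held something other than '+', '-' or '.'. A quick way to see which values are there is to tally them; the sketch below does that for any iterable of lines. The name strand_report is hypothetical, and it simply skips comment lines and lines with fewer than nine tab-separated fields rather than reporting them.

```python
from collections import Counter

VALID_STRANDS = {"+", "-", "."}


def strand_report(lines):
    """Count column-7 values that are not '+', '-' or '.' in GTF-style lines.

    lines is any iterable of text lines (e.g. an open file). Comment lines and
    lines with fewer than 9 tab-separated fields are skipped.
    """
    bad = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t", 8)
        if len(fields) < 9:
            continue
        strand = fields[6]  # column 7, 0-based index 6
        if strand not in VALID_STRANDS:
            bad[strand] += 1
    return bad
```

For example, with open("dexseq.gtf") as handle: print(strand_report(handle).most_common(5)). In a SAM file, column 7 is RNEXT, which is usually "=" or "*", so a tally dominated by those values would suggest the file being read as annotation is actually an alignment file. The DEXSeq vignette calls dexseq_count.py with the flattened annotation first, then the alignment file, then the output file, so the argument order in the commands above may be worth double-checking.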