Hi,
I received the following errors when running dexseq_count.py. I've flattened Mus_musculus.GRCm38.83.gtf downloaded from Ensembl (same errors on flattened Mus_musculus.GRCm38.75.gtf, Mus_musculus.NCBIM37.66.gtf). Should I report these errors to HTSeq instead? Thank you!
$ python dexseq_count.py -p no -s no accepted_hits.sam flattened.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 208, in __iter__
    ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 164, in parse_GFF_attribute_string
    raise ValueError, "Failure parsing GFF attribute line"
ValueError: Failure parsing GFF attribute line

$ pip show HTSeq
---
Name: HTSeq
Version: 0.6.1p1
Location: /usr/local/lib/python2.7/dist-packages
Requires:

$ cat /path/to/DEXSeq/DESCRIPTION
Package: DEXSeq
Version: 1.16.7
----------------------------
EDIT:
$ head dexseq.gtf
1 dexseq_prepare_annotation.py aggregate_gene 3073253 3074322 . + . gene_id "ENSMUSG00000102693"
1 dexseq_prepare_annotation.py exonic_part 3073253 3074322 . + . transcripts "ENSMUST00000193812"; exonic_part_number "001"; gene_id "ENSMUSG00000102693"
1 dexseq_prepare_annotation.py aggregate_gene 3102016 3102125 . + . gene_id "ENSMUSG00000064842"
1 dexseq_prepare_annotation.py exonic_part 3102016 3102125 . + . transcripts "ENSMUST00000082908"; exonic_part_number "001"; gene_id "ENSMUSG00000064842"
1 dexseq_prepare_annotation.py aggregate_gene 3205901 3671498 . - . gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3205901 3206522 . - . transcripts "ENSMUST00000162897"; exonic_part_number "001"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3206523 3207317 . - . transcripts "ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "002"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3213439 3213608 . - . transcripts "ENSMUST00000159265"; exonic_part_number "003"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3213609 3214481 . - . transcripts "ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "004"; gene_id "ENSMUSG00000051951"
1 dexseq_prepare_annotation.py exonic_part 3214482 3215632 . - . transcripts "ENSMUST00000070533+ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "005"; gene_id "ENSMUSG00000051951"
I get a different error when using -f bam with the BAM file:
$ python dexseq_count.py -p no -s no -f bam accepted_hits.bam dexseq.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 207, in __iter__
    strand, frame, attributeStr ) = line.split( "\t", 8 )
ValueError: need more than 2 values to unpack
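The second traceback fails at line.split( "\t", 8 ), which means at least one line of the annotation file did not split into nine tab-separated columns. Below is a minimal sketch for locating such lines before running dexseq_count.py. The function name find_malformed_gff_lines is hypothetical and not part of HTSeq or DEXSeq, and skipping blank lines and lines starting with "#" is an assumption about what should be ignored.

```python
import io


def find_malformed_gff_lines(path, expected_fields=9):
    """Return (line_number, field_count, text) for every line that does not
    split into `expected_fields` tab-separated columns.

    Hypothetical helper, not part of HTSeq or DEXSeq. It uses the same
    split("\t", 8) call that appears in the traceback.
    """
    problems = []
    with io.open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            # Assumption: blank lines and comment lines are not records.
            if not text or text.startswith("#"):
                continue
            fields = text.split("\t", expected_fields - 1)
            if len(fields) != expected_fields:
                problems.append((number, len(fields), text))
    return problems
```

Calling find_malformed_gff_lines("dexseq.gtf") returns an empty list when every record has nine columns. Otherwise each entry gives the line number, the number of columns found, and the raw text, which should help narrow down whether the file contains space-separated fields, truncated lines, or a stray header.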
----------------------------
Thanks a lot for your detailed report! Could you include the first lines of your flattened gtf file?
Alejandro
Interestingly, I received different errors for flattened Homo_sapiens.GRCh37.70.gtf.