DEXSeq: dexseq_count.py - Failure parsing GFF attribute line
lcscs12345 • 0
@lcscs12345-9530

Hi,

I get the following error when running dexseq_count.py on a flattened Mus_musculus.GRCm38.83.gtf downloaded from Ensembl (the same error occurs with flattened Mus_musculus.GRCm38.75.gtf and Mus_musculus.NCBIM37.66.gtf). Should I report this to HTSeq instead? Thank you!

$ python dexseq_count.py -p no -s no accepted_hits.sam flattened.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in  HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 208, in __iter__
    ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 164, in parse_GFF_attribute_string
    raise ValueError, "Failure parsing GFF attribute line"
ValueError: Failure parsing GFF attribute line

$ pip show HTSeq
---
Name: HTSeq
Version: 0.6.1p1
Location: /usr/local/lib/python2.7/dist-packages
Requires:

$ cat /path/to/DEXSeq/DESCRIPTION
Package: DEXSeq
Version: 1.16.7

----------------------------

EDIT:

$ head dexseq.gtf
1       dexseq_prepare_annotation.py    aggregate_gene  3073253 3074322 .       +       .       gene_id "ENSMUSG00000102693"
1       dexseq_prepare_annotation.py    exonic_part     3073253 3074322 .       +       .       transcripts "ENSMUST00000193812"; exonic_part_number "001"; gene_id "ENSMUSG00000102693"
1       dexseq_prepare_annotation.py    aggregate_gene  3102016 3102125 .       +       .       gene_id "ENSMUSG00000064842"
1       dexseq_prepare_annotation.py    exonic_part     3102016 3102125 .       +       .       transcripts "ENSMUST00000082908"; exonic_part_number "001"; gene_id "ENSMUSG00000064842"
1       dexseq_prepare_annotation.py    aggregate_gene  3205901 3671498 .       -       .       gene_id "ENSMUSG00000051951"
1       dexseq_prepare_annotation.py    exonic_part     3205901 3206522 .       -       .       transcripts "ENSMUST00000162897"; exonic_part_number "001"; gene_id "ENSMUSG00000051951"
1       dexseq_prepare_annotation.py    exonic_part     3206523 3207317 .       -       .       transcripts "ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "002"; gene_id "ENSMUSG00000051951"
1       dexseq_prepare_annotation.py    exonic_part     3213439 3213608 .       -       .       transcripts "ENSMUST00000159265"; exonic_part_number "003"; gene_id "ENSMUSG00000051951"
1       dexseq_prepare_annotation.py    exonic_part     3213609 3214481 .       -       .       transcripts "ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "004"; gene_id "ENSMUSG00000051951"
1       dexseq_prepare_annotation.py    exonic_part     3214482 3215632 .       -       .       transcripts "ENSMUST00000070533+ENSMUST00000162897+ENSMUST00000159265"; exonic_part_number "005"; gene_id "ENSMUSG00000051951"

I get a different error when using -f bam:

$ python dexseq_count.py -p no -s no -f bam accepted_hits.bam dexseq.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in  HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 207, in __iter__
    strand, frame, attributeStr ) = line.split( "\t", 8 )
ValueError: need more than 2 values to unpack

----------------------------
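The "need more than 2 values to unpack" message comes from a line that was split on tabs into fewer than the nine fields a GFF record has. A minimal sketch for locating such lines in whatever file is being read as the annotation is shown below. The function name `find_short_lines` and the two-field example line are hypothetical, and skipping blank and `#` lines is an assumption of this sketch rather than a statement about HTSeq's reader.

```python
def find_short_lines(lines, expected=9, limit=5):
    """Return (line_number, field_count) for lines with too few tab-separated fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Assumption of this sketch: blank lines and comment lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t", expected - 1)
        if len(fields) < expected:
            problems.append((number, len(fields)))
            if len(problems) >= limit:
                break  # a handful of examples is enough to see the pattern
    return problems


# A record copied from the flattened annotation above (tabs written explicitly).
good = ('1\tdexseq_prepare_annotation.py\taggregate_gene\t3073253\t3074322'
        '\t.\t+\t.\tgene_id "ENSMUSG00000102693"\n')
# Hypothetical line with only two tab-separated fields.
bad = "chunk\tof something else\n"

print(find_short_lines([good, bad]))  # [(2, 2)]
```

For a real file, the same function can be given an open file handle, for example `find_short_lines(open("dexseq.gtf"))`. If it reports problems on the very first lines, it is worth checking that the path really points at the flattened annotation.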


Thanks a lot for your detailed report! Could you include the first lines of your flattened gtf file?

Alejandro


Interestingly, I get a different error for the flattened Homo_sapiens.GRCh37.70.gtf.

$ python dexseq_count.py -p no accepted_hits.sam dexseq.gtf dexseq.out
Traceback (most recent call last):
  File "dexseq_count.py", line 94, in <module>
    for f in  HTSeq.GFF_Reader( gff_file ):
  File "/usr/local/lib/python2.7/dist-packages/HTSeq/__init__.py", line 210, in __iter__
    iv = GenomicInterval( seqname, int(start)-1, int(end), strand )
  File "_HTSeq.pyx", line 62, in HTSeq._HTSeq.GenomicInterval.__init__ (src/_HTSeq.c:2789)
  File "_HTSeq.pyx", line 71, in HTSeq._HTSeq.GenomicInterval.strand.__set__ (src/_HTSeq.c:2910)
ValueError: Strand must be'+', '-', or '.'.

$ head dexseq.gtf
1       dexseq_prepare_annotation.py    aggregate_gene  11869   14412   .       +       .       gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     11869   11871   .       +       .       transcripts "ENST00000456328"; exonic_part_number "001"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     11872   11873   .       +       .       transcripts "ENST00000456328+ENST00000515242"; exonic_part_number "002"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     11874   12009   .       +       .       transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "003"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     12010   12057   .       +       .       transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "004"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     12058   12178   .       +       .       transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "005"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     12179   12227   .       +       .       transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "006"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     12595   12612   .       +       .       transcripts "ENST00000518655"; exonic_part_number "007"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     12613   12697   .       +       .       transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "008"; gene_id "ENSG00000223972"
1       dexseq_prepare_annotation.py    exonic_part     12698   12721   .       +       .       transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "009"; gene_id "ENSG00000223972"
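The strand column in the lines above is a plain "+", so the "Strand must be '+', '-', or '.'" error suggests the reader was looking at some other text in column 7. One hypothesis worth ruling out, which is not confirmed anywhere in this thread, is that the alignment file is being parsed as the annotation. In the commands shown, the SAM/BAM file is given before the GTF, and if dexseq_count.py expects the flattened annotation first (check its usage message), the two would be swapped. The sketch below only illustrates what column 7 holds in each kind of file. The SAM record is a made-up example and `seventh_column` is a hypothetical helper.

```python
VALID_STRANDS = {"+", "-", "."}


def seventh_column(line):
    """Return the field a GFF parser would treat as the strand, or None if absent."""
    fields = line.rstrip("\n").split("\t", 8)
    return fields[6] if len(fields) >= 7 else None


# A record from the flattened annotation above (tabs written explicitly).
gtf_line = ('1\tdexseq_prepare_annotation.py\taggregate_gene\t11869\t14412'
            '\t.\t+\t.\tgene_id "ENSG00000223972"')

# Hypothetical SAM record: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT,
# TLEN, SEQ, QUAL. Column 7 is RNEXT, which is often "*" or "=".
sam_line = "read1\t0\t1\t11870\t50\t4M\t*\t0\t0\tACGT\tIIII"

for label, line in (("gtf", gtf_line), ("sam", sam_line)):
    strand = seventh_column(line)
    print(label, repr(strand), strand in VALID_STRANDS)
```

If column 7 of the file being read as GFF turns out to contain "*" or "=", swapping the order of the annotation and alignment arguments would be the first thing to try.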
Alejandro Reyes ★ 1.9k
@alejandro-reyes-5124

Hi again, 

Strange, I could not reproduce the error message. This is what I did:

wget ftp://ftp.ensembl.org/pub/release-83/gtf/mus_musculus/Mus_musculus.GRCm38.83.gtf.gz
gunzip Mus_musculus.GRCm38.83.gtf.gz
python /g/huber/users/reyes/Rpcks/branches/DEXSeq/inst/python_scripts/dexseq_prepare_annotation.py -r no Mus_musculus.GRCm38.83.gtf Mus_musculus.GRCm38.83.DEXSeq.gtf

Then, in Python, I ran the part of the script that uses the GFF reader:

>>> gff_file = "Mus_musculus.GRCm38.83.DEXSeq.gtf"
>>> features = HTSeq.GenomicArrayOfSets( "auto", stranded=True )
>>> for f in  HTSeq.GFF_Reader( gff_file ):
...    if f.type == "exonic_part":
...       f.name = f.attr['gene_id'] + ":" + f.attr['exonic_part_number']
...       features[f.iv] += f
...
>>>

But I did not get an error message. Could you include the code that you are using?
Alejandro
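To make concrete what the loop above relies on, here is a rough pure-Python sketch of how the ninth column of an `exonic_part` line maps to the `gene_id:exonic_part_number` name. It is not HTSeq's parser. `parse_attributes` is a hypothetical helper that assumes every attribute has the form `key "value"` and that values contain no semicolons.

```python
import re

# key, whitespace, then a double-quoted value (assumed format, see lead-in).
ATTRIBUTE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*$')


def parse_attributes(attribute_string):
    """Parse 'key "value"; key "value"' into a dict, raising ValueError on anything else."""
    attrs = {}
    for chunk in attribute_string.strip().split(";"):
        if not chunk.strip():
            continue  # tolerate a trailing semicolon
        match = ATTRIBUTE.match(chunk)
        if match is None:
            raise ValueError("cannot parse attribute: %r" % chunk)
        attrs[match.group(1)] = match.group(2)
    return attrs


# An exonic_part record from the flattened mouse annotation (tabs written explicitly).
line = ('1\tdexseq_prepare_annotation.py\texonic_part\t3073253\t3074322\t.\t+\t.\t'
        'transcripts "ENSMUST00000193812"; exonic_part_number "001"; '
        'gene_id "ENSMUSG00000102693"')

attrs = parse_attributes(line.split("\t", 8)[8])
name = attrs["gene_id"] + ":" + attrs["exonic_part_number"]
print(name)  # ENSMUSG00000102693:001
```

Text that does not follow the `key "value"` pattern, such as the trailing columns of a SAM record, makes this sketch raise a ValueError.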


The script runs on my local Ubuntu machine but not on a RedHat server. On the server, I've tried ActivePython 2.7 + HTSeq 0.6.1 (original post) and Python 2.6 + HTSeq 0.5.4 (errors below). The GFF reader on its own works fine, however.

$ wget ftp://ftp.ensembl.org/pub/release-83/gtf/mus_musculus/Mus_musculus.GRCm38.83.gtf.gz
$ gunzip Mus_musculus.GRCm38.83.gtf.gz
$ python ~/R/x86_64-redhat-linux-gnu-library/3.2/DEXSeq/python_scripts/dexseq_prepare_annotation.py -r no Mus_musculus.GRCm38.83.gtf Mus_musculus.GRCm38.83.dexseq.gtf
$ python ~/R/x86_64-redhat-linux-gnu-library/3.2/DEXSeq/python_scripts/dexseq_count.py -p no ~/doc/mouse/nih3t3/tophat_out/accepted_hits.sam Mus_musculus.GRCm38.83.dexseq.gtf out
Traceback (most recent call last):
  File "/Network/Servers/biocldap.otago.ac.nz/Volumes/BiochemXsan/student_users/chunshenlim/R/x86_64-redhat-linux-gnu-library/3.2/DEXSeq/python_scripts/dexseq_count.py", line 94, in <module>
    for f in  HTSeq.GFF_Reader( gff_file ):
  File "/usr/lib64/python2.6/site-packages/HTSeq-0.5.4p5-py2.6-linux-x86_64.egg/HTSeq/__init__.py", line 221, in __iter__
    ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
  File "/usr/lib64/python2.6/site-packages/HTSeq-0.5.4p5-py2.6-linux-x86_64.egg/HTSeq/__init__.py", line 177, in parse_GFF_attribute_string
    raise ValueError, "Failure parsing GFF attribute line"
ValueError: Failure parsing GFF attribute line

$ python
>>> import HTSeq
>>> gff_file = "Mus_musculus.GRCm38.83.dexseq.gtf"
>>> features = HTSeq.GenomicArrayOfSets( "auto", stranded=True )
>>> for f in  HTSeq.GFF_Reader( gff_file ):
...     if f.type == "exonic_part":
...         f.name = f.attr['gene_id'] + ":" + f.attr['exonic_part_number']
...         features[f.iv] += f
...
>>> quit()
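Since the behaviour differs between two machines, it may help to record exactly which interpreter and which HTSeq installation each one picks up. The snippet below is a small sketch for that. `environment_report` is a hypothetical helper, and reading `HTSeq.__version__` is guarded with `getattr` in case a given release does not expose it.

```python
import platform
import sys


def environment_report():
    """Collect interpreter and HTSeq details for comparison across machines."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
    }
    try:
        import HTSeq
        report["HTSeq"] = getattr(HTSeq, "__version__", "unknown")
        report["HTSeq_path"] = getattr(HTSeq, "__file__", "unknown")
    except Exception as error:  # missing package or a broken compiled extension
        report["HTSeq"] = "not importable (%s)" % error
    return report


for key, value in sorted(environment_report().items()):
    print("%s: %s" % (key, value))
```

Running this with the same `python` command used for dexseq_count.py on each machine shows whether both invocations resolve to the interpreter and HTSeq build you expect.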