DEXSeq: dexseq_prepare_annotation.py error
@anitha-sundararajan-6152
Hi,

I am trying to run dexseq_prepare_annotation.py on a GTF file (Ensembl version) downloaded from iGenomes. The organism is Arabidopsis thaliana. I keep getting error messages that look like this:

    Traceback (most recent call last):
      File "/home/as/R/x86_64-unknown-linux-gnu-library/3.0/DEXSeq/python_scripts/dexseq_prepare_annotation.py", line 51, in <module>
        for f in HTSeq.GFF_Reader( gtf_file ):
      File "/sw/python/2.7.1/lib/python2.7/site-packages/HTSeq-0.5.4p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 221, in __iter__
        ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
      File "/sw/python/2.7.1/lib/python2.7/site-packages/HTSeq-0.5.4p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 174, in parse_GFF_attribute_string
        raise ValueError, "The attribute string seems to contain mismatched quotes."
    ValueError: The attribute string seems to contain mismatched quotes.

The command I used is:

    /home/as/R/x86_64-unknown-linux-gnu-library/3.0/DEXSeq/python_scripts/dexseq_prepare_annotation.py genes.gtf genes.flattened.gff

I tried running the same script on other GTF files in the database (human, Drosophila), and it works fine there. Those GTF files look comparable too (at a glance, anyway). Any help will be appreciated.

A few lines from the GTF file I'm using:

    1 protein_coding exon 3631 3913 . + . exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; seqedit "false"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
    1 protein_coding CDS 3760 3913 . + 0 exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; protein_id "AT1G01010.1"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
    1 protein_coding start_codon 3760 3762 . + 0 exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
    1 protein_coding CDS 3996 4276 . + 2 exon_number "2"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; protein_id "AT1G01010.1"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";

I can send the complete file if need be. Thanks so much for your help.

Anitha Sundararajan
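The error message says that some piece of the attribute column (column 9) has unbalanced double quotes. A minimal sketch for locating such records, assuming the file is tab-separated as the GTF format requires and assuming the parser splits column 9 on every ';' before checking the quotes. The helper names naive_pieces and find_unbalanced are hypothetical, not part of HTSeq or DEXSeq:

```python
def naive_pieces(attr_str):
    # Split the attribute column on every ';', ignoring quotes entirely.
    # (Assumption: this approximates how the older parser tokenised column 9.)
    return [p.strip() for p in attr_str.split(";") if p.strip()]


def find_unbalanced(lines):
    """Yield (line_number, piece) for every naively split attribute piece
    that contains an odd number of double quotes."""
    for lineno, line in enumerate(lines, start=1):
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a complete 9-column GTF record
            continue
        for piece in naive_pieces(fields[8]):
            if piece.count('"') % 2 == 1:
                yield lineno, piece
```

Iterating over find_unbalanced(open("genes.gtf")) and printing each hit lists the offending line numbers together with the broken fragments; an empty result means every quoted value survives a plain ';' split intact.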
Simon Anders
@simon-anders-3855
Zentrum für Molekularbiologie, Universi…
Hi Anitha

On 09/04/14 19:50, Anitha Sundararajan wrote:
> ValueError: The attribute string seems to contain mismatched quotes.

The A. thaliana GTF file contains a few rather strange gene names that contain semicolons, and this confused previous versions of HTSeq. This is fixed in the current version, so please update and try again.

Simon
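To illustrate why a semicolon inside a quoted gene name breaks a parser that splits on ';', here is a hypothetical sketch of a quote-aware split. It is not HTSeq's actual code: the functions quote_safe_split and parse_attributes are made-up names, and the gene name "A;B" is an invented example rather than a real TAIR entry. The idea is simply to treat ';' as a separator only when it is outside double quotes:

```python
def quote_safe_split(attr_str, sep=";"):
    """Split attr_str on sep, except where sep sits inside double quotes."""
    pieces, current, in_quotes = [], [], False
    for ch in attr_str:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every double quote
            current.append(ch)
        elif ch == sep and not in_quotes:
            pieces.append("".join(current))  # separator outside quotes: cut here
            current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return [p.strip() for p in pieces if p.strip()]


def parse_attributes(attr_str):
    """Turn 'key "value"; key2 "value2";' into a dict (GTF-style sketch)."""
    attrs = {}
    for piece in quote_safe_split(attr_str):
        if piece.count('"') % 2:
            raise ValueError("mismatched quotes in: " + piece)
        key, _, value = piece.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs
```

With an attribute string such as 'gene_id "X"; gene_name "A;B";', a plain str.split(";") produces the fragments 'gene_name "A' and 'B"', each with a single quote, which is exactly the "mismatched quotes" situation; the quote-aware split keeps "A;B" together as one value.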