Question: DEXSeq- dexseq_prepare_annotation.py error
4.2 years ago by Anitha Sundararajan:
Hi,

I am trying to run dexseq_prepare_annotation.py on a GTF file (Ensembl version) downloaded from iGenomes. The organism is Arabidopsis thaliana. I keep getting this error:

    Traceback (most recent call last):
      File "/home/as/R/x86_64-unknown-linux-gnu-library/3.0/DEXSeq/python_scripts/dexseq_prepare_annotation.py", line 51, in <module>
        for f in HTSeq.GFF_Reader( gtf_file ):
      File "/sw/python/2.7.1/lib/python2.7/site-packages/HTSeq-0.5.4p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 221, in __iter__
        ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
      File "/sw/python/2.7.1/lib/python2.7/site-packages/HTSeq-0.5.4p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 174, in parse_GFF_attribute_string
        raise ValueError, "The attribute string seems to contain mismatched quotes."
    ValueError: The attribute string seems to contain mismatched quotes.

The command I used is:

    /home/as/R/x86_64-unknown-linux-gnu-library/3.0/DEXSeq/python_scripts/dexseq_prepare_annotation.py genes.gtf genes.flattened.gff

I ran the same script on other GTF files from the same database (human, Drosophila) and it works fine, and those GTF files look comparable (at a glance, anyway). Any help will be appreciated.

A few lines from the GTF file I'm using:

    1 protein_coding exon 3631 3913 . + . exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; seqedit "false"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
    1 protein_coding CDS 3760 3913 . + 0 exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; protein_id "AT1G01010.1"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
    1 protein_coding start_codon 3760 3762 . + 0 exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
    1 protein_coding CDS 3996 4276 . + 2 exon_number "2"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; protein_id "AT1G01010.1"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";

I can send the complete file if need be. Thanks so much for your help.

Anitha Sundararajan
4.2 years ago by Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg):
Hi Anitha,

On 09/04/14 19:50, Anitha Sundararajan wrote:
> ValueError: The attribute string seems to contain mismatched quotes.

The A. thaliana GTF file contains a few rather strange gene names that contain semicolons, and this confused previous versions of HTSeq. This is fixed in the current version, so please update and try again.

Simon
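
For anyone who wants to see which records are affected before updating, here is a minimal sketch that scans GTF lines for quoted attribute values containing a semicolon. It is not part of HTSeq or DEXSeq; the function name `find_semicolon_values` is hypothetical, and it assumes tab-separated records with the attributes in the ninth column and no escaped quotes inside values.

```python
import re

# One double-quoted attribute value; assumes values contain no escaped quotes.
QUOTED_VALUE = re.compile(r'"([^"]*)"')


def find_semicolon_values(lines):
    """Yield (line_number, value) for each quoted attribute value containing ';'.

    `lines` is any iterable of GTF text lines, e.g. an open file handle.
    """
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        # The attribute column is the ninth tab-separated field.
        for value in QUOTED_VALUE.findall(fields[8]):
            if ";" in value:
                yield lineno, value
```

Passing an open handle on `genes.gtf` to `find_semicolon_values` and printing each result should list the gene names in question. If the lines in your copy are space-separated rather than tab-separated, the column split would need adjusting.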