Anitha Sundararajan
Hi
I am trying to run dexseq_prepare_annotation.py on a gtf file (Ensembl
version) downloaded from iGenomes. The organism is Arabidopsis
thaliana. I am constantly getting error messages that look like this:
Traceback (most recent call last):
  File "/home/as/R/x86_64-unknown-linux-gnu-library/3.0/DEXSeq/python_scripts/dexseq_prepare_annotation.py", line 51, in <module>
    for f in HTSeq.GFF_Reader( gtf_file ):
  File "/sw/python/2.7.1/lib/python2.7/site-packages/HTSeq-0.5.4p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 221, in __iter__
    ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
  File "/sw/python/2.7.1/lib/python2.7/site-packages/HTSeq-0.5.4p3-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 174, in parse_GFF_attribute_string
    raise ValueError, "The attribute string seems to contain mismatched quotes."
ValueError: The attribute string seems to contain mismatched quotes.
The command I used is:
/home/as/R/x86_64-unknown-linux-gnu-library/3.0/DEXSeq/python_scripts/dexseq_prepare_annotation.py genes.gtf genes.flattened.gff
I ran the same script on other iGenomes gtf files (human, Drosophila)
and it works fine. Those gtf files look comparable to the Arabidopsis
one, at least at a glance. Any help will be appreciated.
A few lines from the gtf file I'm using:
1 protein_coding exon 3631 3913 . + . exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; seqedit "false"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
1 protein_coding CDS 3760 3913 . + 0 exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; protein_id "AT1G01010.1"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
1 protein_coding start_codon 3760 3762 . + 0 exon_number "1"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
1 protein_coding CDS 3996 4276 . + 2 exon_number "2"; gene_id "AT1G01010"; gene_name "ANAC001"; p_id "P20332"; protein_id "AT1G01010.1"; transcript_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22545";
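Since the lines above look well formed, the offending record is presumably somewhere else in the file. One way to look for it, as a rough sketch only: scan the gtf for records whose attribute column contains an odd number of double quotes. This is an assumption about what triggers the error, not a copy of the check HTSeq performs, and the function name `find_unbalanced_quote_lines` is made up for illustration.

```python
import sys


def find_unbalanced_quote_lines(lines):
    """Yield (line_number, line) for records with an odd number of
    double quotes in the attribute column (hypothetical helper)."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # The attributes are the 9th tab-separated column of a gtf record;
        # fall back to the whole line if the record has fewer columns.
        attrs = fields[8] if len(fields) >= 9 else line
        if attrs.count('"') % 2 != 0:
            yield lineno, line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_quotes.py genes.gtf
    with open(sys.argv[1]) as handle:
        for lineno, line in find_unbalanced_quote_lines(handle):
            print("line %d: %s" % (lineno, line))
```

Any line this prints would be worth inspecting by hand, for example a gene_name value that itself contains a quote character.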
I can send the complete file if need be.
Thanks so much for your help.
Anitha Sundararajan