DEXSeq prepare_annotation.py error
arom2 • 0
@arom2-8204
Last seen 7.7 years ago
United States

To Whom It May Concern,

I am currently working on a project examining differential expression in RNA-seq data with DEXSeq, and I have run into a few errors when using the dexseq_prepare_annotation.py script to process my .gtf file. My annotation file was generated by NCBI, and I initially had to change 'Parent' to 'gene_id' in the attribute column (please see below). Now I am getting another error regarding 'transcript_id', and I don't understand why. Any insight into how I can modify the format so that it meets the requirements for proper .gtf input would be greatly appreciated.

 

Thank you,

A. Romney

 

error:

(env2)arom2:~:1015 > python /vol/apps/user/stow/R-3.2.1/lib64/R/library/DEXSeq/python_scripts/dexseq_prepare_annotation.py fhet_prep.gtf Fhet_final.gff
Traceback (most recent call last):
  File "/vol/apps/user/stow/R-3.2.1/lib64/R/library/DEXSeq/python_scripts/dexseq_prepare_annotation.py", line 55, in <module>
    exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
KeyError: 'transcript_id'

Here is a sub-section of the gtf file I am using to show you how the gene_id and transcript_id are identified.

NW_012224401.1    Gnomon    exon    71785    71884    .    +    .    ID=id44;gene_id=rna2;Dbxref=GeneID:105915271,Genbank:XM_012849362.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X1;transcript_id=XM_012849362.1
NW_012224401.1    Gnomon    exon    72804    72915    .    +    .    ID=id45;gene_id=rna2;Dbxref=GeneID:105915271,Genbank:XM_012849362.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X1;transcript_id=XM_012849362.1
NW_012224401.1    Gnomon    exon    76564    76791    .    +    .    ID=id46;gene_id=rna2;Dbxref=GeneID:105915271,Genbank:XM_012849362.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X1;transcript_id=XM_012849362.1
NW_012224401.1    Gnomon    mRNA    62183    76791    .    +    .    ID=rna3;gene_id=gene2;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;Name=XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1
NW_012224401.1    Gnomon    exon    62183    62306    .    +    .    ID=id47;gene_id=rna3;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1
NW_012224401.1    Gnomon    exon    62565    62634    .    +    .    ID=id48;gene_id=rna3;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1
NW_012224401.1    Gnomon    exon    63886    64173    .    +    .    ID=id49;gene_id=rna3;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1
NW_012224401.1    Gnomon    exon    64260    64547    .    +    .    ID=id50;gene_id=rna3;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1
NW_012224401.1    Gnomon    exon    64630    64766    .    +    .    ID=id51;gene_id=rna3;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1
NW_012224401.1    Gnomon    exon    64845    65010    .    +    .    ID=id52;gene_id=rna3;Dbxref=GeneID:105915271,Genbank:XM_012849437.1;gbkey=mRNA;gene=csf1r;product=colony stimulating factor 1 receptor%2C transcript variant X2;transcript_id=XM_012849437.1

 

DEXseq gff prepare annotation gtf transcript_id • 2.0k views

Hi Romney,

I am not sure exactly what the problem with your annotation file is. But I noticed that it has lots of fields, some of them with strange characters and spaces in the attribute values. The 'attribute=value' format is GFF format; GTF format usually uses the 'attribute "value";' format.
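One way to narrow this down might be to list the exon lines that lack a gene_id or transcript_id attribute, since the traceback shows the script reading both keys from a feature. The following is only a rough sketch in plain Python (it does not use HTSeq). It assumes tab-separated columns and that only exon features matter; the helper names parse_attributes and find_incomplete_exons are made up for illustration, and the simple split on ';' would mishandle quoted values that contain semicolons.

```python
import re

# Attribute keys that the traceback shows being looked up.
REQUIRED = ("gene_id", "transcript_id")


def parse_attributes(column9):
    """Parse 'key=value;...' or 'key "value"; ...' into a dict (naive split on ';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        # Key, then '=' and/or whitespace, then the value.
        match = re.match(r'([^\s=]+)[\s=]+(.*)', field)
        if match:
            attrs[match.group(1)] = match.group(2).strip('"')
    return attrs


def find_incomplete_exons(lines):
    """Return (line_number, missing_keys) for exon lines lacking a required key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] != "exon":
            continue
        attrs = parse_attributes(columns[8])
        missing = [key for key in REQUIRED if key not in attrs]
        if missing:
            problems.append((number, missing))
    return problems
```

Passing an open file handle for fhet_prep.gtf to find_incomplete_exons would give the line numbers worth inspecting by hand. An empty result would suggest the cause lies elsewhere, for example in how the attribute values are parsed.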

I manually modified your file (though of course this could be done with a small script) to the shape below, and the DEXSeq script worked as expected!

NW_012224401.1    Gnomon    exon    71785    71884    .    +    .    gene_id rna2; transcript_id XM_012849362.1
NW_012224401.1    Gnomon    exon    72804    72915    .    +    .    gene_id rna2; transcript_id XM_012849362.1
NW_012224401.1    Gnomon    exon    76564    76791    .    +    .    gene_id rna2; transcript_id XM_012849362.1
NW_012224401.1    Gnomon    mRNA    62183    76791    .    +    .    gene_id gene2; transcript_id XM_012849437.1
NW_012224401.1    Gnomon    exon    62183    62306    .    +    .    gene_id rna3; transcript_id XM_012849437.1
NW_012224401.1    Gnomon    exon    62565    62634    .    +    .    gene_id rna3; transcript_id XM_012849437.1
NW_012224401.1    Gnomon    exon    63886    64173    .    +    .    gene_id rna3; transcript_id XM_012849437.1
NW_012224401.1    Gnomon    exon    64260    64547    .    +    .    gene_id rna3; transcript_id XM_012849437.1
NW_012224401.1    Gnomon    exon    64630    64766    .    +    .    gene_id rna3; transcript_id XM_012849437.1
NW_012224401.1    Gnomon    exon    64845    65010    .    +    .    gene_id rna3; transcript_id XM_012849437.1
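
For a whole file, the manual edit above could be done with a short script along these lines. This is a minimal sketch, not a tested converter: it assumes tab-separated columns and 'key=value' attributes separated by ';', the function names convert_attributes and convert_line are invented for illustration, and it writes the values in double quotes (the usual GTF convention) rather than unquoted as in the lines above.

```python
def convert_attributes(column9):
    """Reduce 'key=value;...' attributes to GTF-style gene_id/transcript_id.

    Returns None when either identifier is absent.
    """
    attrs = {}
    for field in column9.strip().split(";"):
        key, sep, value = field.strip().partition("=")
        if sep:
            attrs[key] = value
    if "gene_id" not in attrs or "transcript_id" not in attrs:
        return None
    return 'gene_id "{}"; transcript_id "{}";'.format(
        attrs["gene_id"], attrs["transcript_id"]
    )


def convert_line(line):
    """Convert one feature line; return None for lines that cannot be converted."""
    if line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return None
    new_attributes = convert_attributes(columns[8])
    if new_attributes is None:
        return None
    columns[8] = new_attributes
    return "\t".join(columns)
```

Writing out every non-None result of convert_line gives a file with only the two attributes kept. Lines that return None are dropped, so it is worth counting them before trusting the output. The sketch copies gene_id through unchanged, so exon lines keep whatever identifier they had (rna2, rna3 in the excerpt).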

arom2 • 0
@arom2-8204
Last seen 7.7 years ago
United States

Thanks Alejandro,

I was able to revise the NCBI annotation file I am using and can now also get it to work with the prep_annotation script.

Thanks NCBI...
