Search
Question: DEXSeq prepare annotation on Stringtie outputGTF - KeyError: 'gene_id'
0
gravatar for craigmonger
5 months ago by
craigmonger10
craigmonger10 wrote:

I am trying to use DEXSeq on the gtf output from stringtie but am getting an error when trying to create the flattened annotation file. Please find below an example of my commands and input data. Thanks in advance for any help with this issue.

The command I am using is:

python dexseq_prepare_annotation.py merged.gtf dexseq.gtf

Which errors with:

Traceback (most recent call last):
  File "dexseq_prepare_annotation.py", line 54, in <module>

    f.attr['gene_id'] = f.attr['gene_id'].replace( ":", "_" )
KeyError: 'gene_id'

An example of what the gtf output from stringtie looks like is:

NW_006854668.1  StringTie       transcript      2       301     1000    -       .       gene_id "MSTRG.195"; transcript_id "MSTRG.195.1";

NW_006854668.1  StringTie       exon    2       301     1000    -       .       gene_id "MSTRG.195"; transcript_id "MSTRG.195.1"; exon_number "1";
NW_006854862.1  StringTie       transcript      1       316     1000    -       .       gene_id "MSTRG.196"; transcript_id "MSTRG.196.1";
NW_006854862.1  StringTie       exon    1       316     1000    -       .       gene_id "MSTRG.196"; transcript_id "MSTRG.196.1"; exon_number "1";
NW_006854920.1  StringTie       transcript      6       317     1000    -       .       gene_id "MSTRG.197"; transcript_id "MSTRG.197.1";
NW_006854920.1  StringTie       exon    6       317     1000    -       .       gene_id "MSTRG.197"; transcript_id "MSTRG.197.1"; exon_number "1";
NW_006855052.1  StringTie       transcript      1       307     1000    +       .       gene_id "MSTRG.198"; transcript_id "rna3"; gene_name "LOC103159057"; ref_gene_id "gene5";
NW_006855052.1  StringTie       exon    1       107     1000    +       .       gene_id "MSTRG.198"; transcript_id "rna3"; exon_number "1"; gene_name "LOC103159057"; ref_gene_id "gene5";
NW_006855052.1  StringTie       exon    197     307     1000    +       .       gene_id "MSTRG.198"; transcript_id "rna3"; exon_number "2"; gene_name "LOC103159057"; ref_gene_id "gene5";

 

ADD COMMENTlink modified 5 months ago • written 5 months ago by craigmonger10
2
gravatar for craigmonger
5 months ago by
craigmonger10
craigmonger10 wrote:

I think this problem was caused by a GTF file not adhering to standards.

Solved this issue by using the script posted by Persistent LABS here https://www.biostars.org/p/224372/ to rewrite the gene lines into the GTF file.

This however caused some transcript lines to be duplicated because of the presence of both a "gene_id" and "ref_gene_id" attribute.

This was solved using gffread with the -E and -T flags to rewrite the GTF file (once again causing the loss of the gene and transcript lines from the GTF file, so the above mentioned script was used a second time to rewrite them).

The gtf file then worked with dexseq_prepare_annotation.

 

Hope this helps anyone who encounters a similar issue!

ADD COMMENTlink written 5 months ago by craigmonger10

Hi,

Thanks for your answers. I am have the same problem in DEXseq. 
Firstly, since I don't have the gtf file of honey bee, I use gffread to convert the gff file to the gtf file. however, I had the same problem 

Then, I followed your comment and I convert back gff file:

./gffread -E /Users/LQS/Desktop/Apis_4.5_QS.gtf -o- > /Users/LQS/Desktop/DEXseq_2nd_Apis45.gff3

After that, I stored your perl script in R studio and changed the directory of the input file.

The input file was the one strored in .txt file format and specified the directory of the output gtf file which I want to use for the dexseq_prepare_annotation

I did like this:

open(F,"/Users/LQS/Desktop/DEXseq_2nd_Apis45.txt"); # Input file open(F1,">/Users/LQS/Desktop/data.gtf"); # Output gtf file

after running: perl XXX.pl 
I used the data.gtf for the dexseq_annotation_prepare, the same error happened again:

Traceback (most recent call last): File "/pub40/acdguest/Qiushi/scripts/DEXseq_script/dexseq_prepare_annotation.py", line 54, in <module> f.attr['gene_id'] = f.attr['gene_id'].replace( ":", "_" ) KeyError: 'gene_id'

Could you please help me on this?? Many thanks in advance.

Qiushi Liu

ADD REPLYlink written 4 months ago by 5140289290
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 174 users visited in the last hour