Hi all,
How can I create a GFF file from a UCSC GTF annotation file for the human genome (hg19)?
I tried to use the Python script dexseq_prepare_annotation.py and got a lot of errors.
Any ideas?
Thanks,
Efrat
"I tried to use the Python script dexseq_prepare_annotation.py and got a lot of errors" - Can you be more specific?
Have you read the DEXSeq manual?
The gene_id attribute is used to see which exons belong to the same gene. It must be called gene_id (and not Parent as in GFF3 files, or GeneID as in some older GFF files), and it must give the same identifier to all exons from the same gene, even if they are from different transcripts of this gene. (This last requirement is not met by GTF files generated by the Table Browser function of the UCSC Genome Browser.)
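A quick way to see whether a GTF file has the problem described above is to count exon records whose gene_id is simply a copy of the transcript_id, which is how Table Browser GTF output typically looks. The following is only a minimal sketch: the function names and the two example records are hypothetical, and it assumes tab-separated GTF lines with attributes written as key "value" pairs.

```python
import re

# Matches GTF attribute pairs such as: gene_id "CFB";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the GTF attribute column (9th field) as a dict."""
    return dict(_ATTR.findall(field))


def count_gene_equals_transcript(lines):
    """Count exon records whose gene_id equals their transcript_id.

    Returns a tuple (matching_exons, total_exons).
    """
    matching = 0
    total = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only well-formed exon records are counted
        attrs = parse_attributes(cols[8])
        total += 1
        gene_id = attrs.get("gene_id")
        if gene_id is not None and gene_id == attrs.get("transcript_id"):
            matching += 1
    return matching, total


# Hypothetical records, made up for illustration only.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "NM_0001"; transcript_id "NM_0001";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "GENE1"; transcript_id "NM_0002";\n',
]
print(count_gene_equals_transcript(example))  # (1, 2)
```

To check a real file, pass an open file handle instead of the list, for example count_gene_equals_transcript(open("genes.gtf")). If nearly every exon matches, the gene_id column is probably carrying transcript identifiers rather than gene identifiers.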
Hi, thank you very much for your answer.
This is the command: python dexseq_prepare_annotation.py genes.gtf genes.gff
where genes.gtf is the GTF annotation for hg19.
This is the error I got:
File "dexseq_prepare_annotation.py", line 127, in <module>
assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"
AssertionError: <GenomicFeature: exonic_part 'CFB' at chr6_dbb_hap3: 3199308 -> 3199650 (strand '+')> starts too early
What do I need to change in the GTF before I try to convert it to GFF?
Thanks again!
Efrat
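The assertion above names a feature on chr6_dbb_hap3, an alternate haplotype sequence. One possible reading, not confirmed anywhere in this thread, is that the same gene_id occurs on more than one sequence (for example on chr6 and on a haplotype contig), so that exons from different sequences are aggregated under one gene. The sketch below only diagnoses that situation; the function name and the example records (including their coordinates) are hypothetical.

```python
import re
from collections import defaultdict

# Extracts the gene_id value from the GTF attribute column.
_GENE_ID = re.compile(r'gene_id "([^"]*)"')


def gene_ids_on_multiple_seqs(lines):
    """Map each gene_id to its (seqname, strand) pairs, keeping only
    gene_ids that occur on more than one sequence/strand combination."""
    locations = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = _GENE_ID.search(cols[8])
        if match:
            locations[match.group(1)].add((cols[0], cols[6]))
    return {gene: sorted(locs) for gene, locs in locations.items() if len(locs) > 1}


# Hypothetical records; coordinates are made up for illustration.
example = [
    'chr6\tsrc\texon\t1000\t1200\t.\t+\t.\tgene_id "CFB"; transcript_id "T1";\n',
    'chr6_dbb_hap3\tsrc\texon\t900\t1100\t.\t+\t.\tgene_id "CFB"; transcript_id "T1_dup";\n',
    'chr1\tsrc\texon\t50\t80\t.\t-\t.\tgene_id "GENE2"; transcript_id "T2";\n',
]
print(gene_ids_on_multiple_seqs(example))
# {'CFB': [('chr6', '+'), ('chr6_dbb_hap3', '+')]}
```

If a real genes.gtf reports many such gene_ids, restricting the file to the primary chromosomes before running the script is one thing to try, though whether that resolves this particular error is an assumption.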
Hi Efrat,
You could try this to convert from GTF to GFF3:
This should work provided that makeTxDbFromGFF() doesn't choke on the GTF file, which sometimes happens with some exotic GTF files.
Note that not all the attributes from the original GTF file will necessarily propagate, but the core GTF attributes (gene_id, transcript_id, exon_id) will be used to generate the core GFF3 attributes (ID, Parent, Name), hence the gene/transcript/exon hierarchical organization should be preserved. Hopefully that's all that matters from a DEXSeq point of view, but I can't tell for sure...
H.