Hello all,
I am trying to run dexseq_prepare_annotation.py, but keep receiving the same error:
File "dexseq_prepare_annotation.py", line 129
raise ValueError, "Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) )
My guess is that the problem lies in the GTF file, which I downloaded from Ensembl (hg19).
Hopefully someone can help me figure this out! Many thanks in advance.
Best,
Suzanne
Dear Alejandro,
Thank you for your help.
To test, I created a new GTF file containing only the information from chromosome 1. However, the error still appears. Does anyone have an idea what is going on?
Best,
Suzanne
Hi Suzanne, have you checked whether any gene identifiers appear on more than one chromosome?
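A minimal sketch of one way to run that check, using only the Python standard library. It assumes a standard tab-separated GTF with a `gene_id "..."` entry in the ninth (attribute) column, as Ensembl GTFs have. The function name `genes_on_multiple_chromosomes` is hypothetical and not part of DEXSeq or HTSeq.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute at the start of column 9 or after a ';'.
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id "([^"]+)"')


def genes_on_multiple_chromosomes(lines):
    """Return {gene_id: sorted list of seqnames} for IDs seen on more than one chromosome."""
    chroms_by_gene = defaultdict(set)
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            # Column 1 is the seqname (chromosome).
            chroms_by_gene[match.group(1)].add(fields[0])
    return {gene: sorted(chroms)
            for gene, chroms in chroms_by_gene.items()
            if len(chroms) > 1}
```

For example, passing an open handle such as `open("Homo_sapiens.GRCh37.87.gtf")` returns an empty dict when no gene ID spans two chromosomes; otherwise each offending ID is listed with the chromosomes it was found on.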
Hi,
I checked, and in the original file there were. As I am new to programming and didn't know an easy fix for this, I first tried making a new GTF file containing only the gene IDs on chr1. So in this new file, no gene identifier appears on multiple chromosomes (as it only contains chr1).
Still, I get the same error when running the script.
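The single-chromosome subset described above can be produced with a few lines of Python. This is a sketch, not the method used in the thread; the function name `subset_gtf_lines` and the output filename are hypothetical. Note that Ensembl GTFs name chromosomes without a `chr` prefix (e.g. `1`, `X`, `MT`), so the name to keep is assumed to be `1` here.

```python
def subset_gtf_lines(lines, keep_seqname):
    """Yield comment/header lines plus records whose first column equals keep_seqname."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
        elif line.split("\t", 1)[0] == keep_seqname:
            yield line


# Usage (not executed here; filenames are placeholders):
# with open("Homo_sapiens.GRCh37.87.gtf") as src, open("chr1_only.gtf", "w") as dst:
#     dst.writelines(subset_gtf_lines(src, "1"))
```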
Best,
Suzanne
Oh, I see, sorry I did not catch that earlier. Could you send me the link to the GTF file you are using?
I downloaded the original GTF file I was using from here:
ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens/
thanks!
I just ran the following without problems using python 2.7.10 and HTSeq 0.6.1p1:
curl -o Homo_sapiens.GRCh37.87.gtf.gz ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens//Homo_sapiens.GRCh37.87.gtf.gz
gunzip Homo_sapiens.GRCh37.87.gtf.gz
python ~/Work/Rpcks/DEXSeq/inst/python_scripts/dexseq_prepare_annotation.py Homo_sapiens.GRCh37.87.gtf Homo_sapiens.GRCh37.87.DEXSeq.gtf
There might be incompatibilities with the versions you are using. Which Python and HTSeq versions are you running?
I tried again with this file but still get the same error, so something else might indeed be the problem.
HTSeq: 0.10.0
numpy: 1.14.5
pysam: 0.14.1
Python: 3.5.5
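The version numbers listed above can be gathered in one go with a short script like the one below. It is a sketch that assumes each package exposes a `__version__` attribute (numpy and pysam do; it falls back to "unknown" otherwise). The helper name `collect_versions` is made up for this example.

```python
import importlib
import sys


def collect_versions(packages=("HTSeq", "numpy", "pysam")):
    """Return a dict mapping 'Python' and each package name to a version string."""
    versions = {"Python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Fall back gracefully if a package has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in collect_versions().items():
    print("%s: %s" % (name, version))
```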
Thanks for the details. I could reproduce the error message using Python 3.6 and HTSeq 0.10.0, so I think it is a bug in HTSeq. I'll report this to the HTSeq maintainers. Until it is fixed, I would recommend using Python 2.7.
Alejandro
P.S. I just uploaded the resulting file here: https://www.dropbox.com/s/gaxwzkvdrvtz6cb/Homo_sapiens.GRCh37.87.DEXSeq.gtf?dl=0
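A note on the traceback in the first post: line 129, as quoted, uses Python 2's `raise ValueError, "message"` form. Python 3 rejects that form with a SyntaxError while parsing the script, before any GTF data is read. That would be consistent with the error appearing for every input file and with the advice to run the script under Python 2.7. The illustration below runs on Python 3 and compares that form with the call-style `raise ValueError(...)`, which Python 2 and 3 both accept. The names `parses`, `PY2_STYLE` and `PY3_STYLE` are hypothetical and not part of DEXSeq.

```python
# Hypothetical check (not part of DEXSeq): does each form of the raise
# statement parse under the running interpreter?
PY2_STYLE = 'raise ValueError, "Same name found on two chromosomes: %s, %s" % (a, b)'
PY3_STYLE = 'raise ValueError("Same name found on two chromosomes: %s, %s" % (a, b))'


def parses(source):
    """Return True if `source` compiles, False on SyntaxError."""
    try:
        # compile() only parses; the undefined names a and b are never looked up.
        compile(source, "<check>", "exec")
    except SyntaxError:
        return False
    return True


print("Python 2 form parses:", parses(PY2_STYLE))
print("Python 3 form parses:", parses(PY3_STYLE))
```

Under Python 3 the first line reports False and the second True.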