DEXSeq: Error running dexseq_prepare_annotation.py
suzanne11 • 0

Hello all,

I am trying to run dexseq_prepare_annotation.py, but keep receiving the same error:

  File "dexseq_prepare_annotation.py", line 129
    raise ValueError, "Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) )

My guess is the problem is inside the gtf file, which was downloaded from Ensembl (hg19).

Hopefully someone can help me figure this out! Many thanks in advance.

best, 

Suzanne

 

Alejandro Reyes ★ 1.9k
Novartis Institutes for BioMedical Rese…

Hi Suzanne, 

The error message indicates that the same gene ID was found on two different chromosomes. As you suspect, this is likely an error in the annotation file. The easiest solution is to remove that gene identifier from the GTF file.
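A minimal sketch of how such identifiers could be removed, assuming a standard GTF layout in which the attribute column contains `gene_id "..."`. The function name `drop_genes`, the file names, and the placeholder ID are hypothetical and not part of DEXSeq or HTSeq.

```python
import re

# Assumption: gene IDs appear in the attribute column as gene_id "SOME_ID"
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')

def drop_genes(gtf_lines, unwanted_ids):
    """Yield GTF lines whose gene_id is not in `unwanted_ids`; comment lines pass through."""
    unwanted_ids = set(unwanted_ids)
    for line in gtf_lines:
        match = None if line.startswith("#") else GENE_ID_RE.search(line)
        if match and match.group(1) in unwanted_ids:
            continue  # skip every record that belongs to an unwanted gene
        yield line

# Hypothetical usage; the file names and the ID are placeholders:
# with open("input.gtf") as src, open("filtered.gtf", "w") as dst:
#     dst.writelines(drop_genes(src, {"GENE_ID_TO_REMOVE"}))
```

The filter streams line by line, so it should cope with a full genome annotation without loading it into memory.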

Best regards,
Alejandro


Dear Alejandro,

Thank you for your help.

To test, I created a new GTF file containing only the records from chromosome 1. However, the error keeps appearing. Does anyone have a clue what is going on?

Best,

Suzanne


Hi Suzanne, have you checked whether there are gene identifiers that appear on multiple chromosomes?
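One way to run that check is sketched below, assuming a tab-separated GTF with the chromosome in column 1 and `gene_id "..."` in column 9. The function name `genes_on_multiple_chromosomes` is made up for illustration and is not how dexseq_prepare_annotation.py itself performs its test.

```python
import re
from collections import defaultdict

# Assumption: gene IDs appear in the attribute column as gene_id "SOME_ID"
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')

def genes_on_multiple_chromosomes(gtf_lines):
    """Return {gene_id: set of chromosomes} for IDs seen on more than one chromosome."""
    chroms_by_gene = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            chroms_by_gene[match.group(1)].add(fields[0])
    return {gene: chroms for gene, chroms in chroms_by_gene.items() if len(chroms) > 1}

# Hypothetical usage; the file name is a placeholder:
# with open("annotation.gtf") as handle:
#     for gene, chroms in sorted(genes_on_multiple_chromosomes(handle).items()):
#         print(gene, ",".join(sorted(chroms)))
```

An empty result means no gene ID spans more than one chromosome in that file.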


Hi,
I checked, and in the original file there were. As I am new to programming and didn't know an easy fix for this, I first tried making a new GTF file with only the gene IDs on chr1. So in this new file, there are no gene identifiers that appear on multiple chromosomes (as it only contains chr1).
Still, I get the same error when running the script.

Best,

Suzanne


Oh I see, sorry I did not catch that earlier. Could you point me to the link to the gtf file that you are using?


The original gtf file I was using I downloaded from here:
ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens/ 

 

thanks! 


I just ran the following without problems using python 2.7.10 and HTSeq 0.6.1p1:

curl -o Homo_sapiens.GRCh37.87.gtf.gz ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
gunzip Homo_sapiens.GRCh37.87.gtf.gz
python ~/Work/Rpcks/DEXSeq/inst/python_scripts/dexseq_prepare_annotation.py Homo_sapiens.GRCh37.87.gtf Homo_sapiens.GRCh37.87.DEXSeq.gtf

There might be incompatibilities with the versions that you are using. Which Python and HTSeq versions are you running?
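A small sketch for collecting that information from inside the interpreter that actually runs the script. The helper `report_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute; if one does not, it reports "unknown".

```python
from __future__ import print_function  # keeps print() behaving the same on Python 2

import sys
from importlib import import_module

def report_versions(packages=("HTSeq", "numpy", "pysam")):
    """Return a dict with the interpreter version and the version of each package."""
    versions = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            module = import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Assumption: the package publishes its version as __version__
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

for name, version in sorted(report_versions().items()):
    print(name, version)
```

Running it with the same `python` command used for dexseq_prepare_annotation.py avoids reporting the versions of a different environment.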


I tried again with this file, but I still get the same error, so something else might indeed be the problem. My versions are:

HTSeq: 0.10.0
numpy: 1.14.5
pysam: 0.14.1
Python: 3.5.5


Thanks for the details. I could reproduce the error message using Python 3.6 and HTSeq 0.10.0. I think it is a bug in HTSeq, and I'll report it to the HTSeq maintainers. Until this is fixed, I would recommend using Python 2.7.
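A possible contributing factor, offered as a reading of the traceback in the original post and not something confirmed in this thread: line 129 uses the Python 2 form `raise ValueError, "..."`, which Python 3 rejects as a SyntaxError before any GTF file is read. That would be consistent with the error persisting on a chr1-only file. The sketch below checks how the running interpreter parses the two forms; the shortened message strings and the helper `compiles` are illustrative only.

```python
from __future__ import print_function

import sys

# Shortened versions of the statement from the traceback (message abbreviated)
PY2_STYLE = 'raise ValueError, "Same name found on two chromosomes"'
CALL_STYLE = 'raise ValueError("Same name found on two chromosomes")'

def compiles(source):
    """Return True if `source` parses under the running interpreter."""
    try:
        compile(source, "<example>", "exec")  # parse only, nothing is executed
    except SyntaxError:
        return False
    return True

print("Running Python", sys.version_info[0])
print("comma-style raise parses:", compiles(PY2_STYLE))
print("call-style raise parses:", compiles(CALL_STYLE))
```

The call-style form parses on both Python 2.7 and Python 3, while the comma-style form parses only on Python 2.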

Alejandro

P.S. I just uploaded the resulting file here: https://www.dropbox.com/s/gaxwzkvdrvtz6cb/Homo_sapiens.GRCh37.87.DEXSeq.gtf?dl=0
