Question: DEXSeq: Error running dexseq_prepare_annotation.py
suzanne11 wrote, 4 weeks ago:

Hello all,

I am trying to run dexseq_prepare_annotation.py, but I keep receiving the same error:

  File "dexseq_prepare_annotation.py", line 129

    raise ValueError, "Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) )

My guess is the problem is inside the gtf file, which was downloaded from Ensembl (hg19).

Hopefully someone can help me figure this out! Many thanks in advance.

best, 

Suzanne

 

Written 4 weeks ago by suzanne11 • modified 4 weeks ago by Alejandro Reyes
Alejandro Reyes (Dana-Farber Cancer Institute, Boston, USA) wrote, 4 weeks ago:

Hi Suzanne, 

The error message indicates that the same gene ID was found on two different chromosomes. As you suspect, this is likely an error in the annotation file. The easiest solution is to remove that gene identifier from the GTF file.

Best regards,
Alejandro

Written 4 weeks ago by Alejandro Reyes
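
For anyone unsure how to remove a gene identifier from a GTF file, here is a minimal sketch. It assumes the usual GTF layout, where the attribute column contains `gene_id "..."`; the function name `drop_gene_ids` and the file names in the usage note are made up for illustration and are not part of DEXSeq or HTSeq.

```python
import re

# Matches the gene_id attribute of a GTF record, e.g. gene_id "ENSG00000000001";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def drop_gene_ids(lines, unwanted):
    """Yield the GTF lines whose gene_id is not listed in `unwanted`.

    Comment lines (starting with '#') and lines without a gene_id
    attribute are passed through unchanged.
    """
    unwanted = set(unwanted)
    for line in lines:
        if not line.startswith("#"):
            match = GENE_ID_RE.search(line)
            if match and match.group(1) in unwanted:
                continue  # skip every record of an unwanted gene
        yield line
```

It could be used along the lines of `dst.writelines(drop_gene_ids(src, {"SOME_GENE_ID"}))`, with `src` and `dst` being the opened input and output GTF files (for example `annotation.gtf` and `annotation.filtered.gtf`, both hypothetical names).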

Dear Alejandro,

Thank you for your help.

To test, I created a new GTF file containing only information from chromosome 1. However, the error keeps appearing. Does anyone have a clue what is going on?

Best,

Suzanne

Written 4 weeks ago by suzanne11

Hi Suzanne, have you checked if there are gene identifiers that appear in multiple chromosomes?

Written 4 weeks ago by Alejandro Reyes
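
One way to check for gene identifiers that appear on multiple chromosomes is sketched below. It is only a rough check under the assumption of a standard tab-separated GTF file with nine columns and a `gene_id "..."` attribute; it does not reproduce the exact test that dexseq_prepare_annotation.py performs, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids_on_multiple_chromosomes(lines):
    """Return {gene_id: sorted chromosome names} for IDs seen on more than one chromosome."""
    chroms_by_gene = defaultdict(set)
    for line in lines:
        # Skip header comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            # Column 1 holds the chromosome (sequence) name.
            chroms_by_gene[match.group(1)].add(fields[0])
    return {
        gene: sorted(chroms)
        for gene, chroms in chroms_by_gene.items()
        if len(chroms) > 1
    }
```

Passing an opened GTF file to the function returns an empty dictionary when no gene ID is shared between chromosomes.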

Hi,
I checked, and in the original file there were. As I am new to programming and didn't know an easy fix for this, I first tried making a new GTF file with only the gene IDs on chr1. So in this new file, there are no more gene identifiers that appear on multiple chromosomes (as it only contains chr1).
Still, I get the same error when running the script.

Best,

Suzanne

Written 4 weeks ago by suzanne11

Oh I see, sorry I did not catch that earlier. Could you point me to the link to the gtf file that you are using?

Written 4 weeks ago by Alejandro Reyes

The original gtf file I was using I downloaded from here:
ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens/ 

 

thanks! 

Written 4 weeks ago by suzanne11

I just ran the following without problems using python 2.7.10 and HTSeq 0.6.1p1:

curl -o Homo_sapiens.GRCh37.87.gtf.gz ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens//Homo_sapiens.GRCh37.87.gtf.gz
gunzip Homo_sapiens.GRCh37.87.gtf.gz
python ~/Work/Rpcks/DEXSeq/inst/python_scripts/dexseq_prepare_annotation.py Homo_sapiens.GRCh37.87.gtf Homo_sapiens.GRCh37.87.DEXSeq.gtf

There might be incompatibilities with the versions that you are using. Which Python and HTSeq versions are you running?

Written 4 weeks ago by Alejandro Reyes • modified 4 weeks ago
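
A quick way to collect the versions asked about above is sketched here. It assumes each package exposes a `__version__` attribute, which is common but not guaranteed, so the helper falls back to "unknown"; the helper names are made up for this example.

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or "not installed"."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "not installed"
    return getattr(module, "__version__", "unknown")


def version_report(module_names=("HTSeq", "numpy", "pysam")):
    """Return a dict with the Python version and the version of each named module."""
    report = {"python": sys.version.split()[0]}
    for name in module_names:
        report[name] = installed_version(name)
    return report

# Example: print(version_report())
```

Run it with the same interpreter that is used to call dexseq_prepare_annotation.py, since different Python installations on one machine can have different packages.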

I tried again with this file but still get the same error, so something else might indeed be the problem. These are my versions:

HTSeq 0.10.0

numpy 1.14.5

pysam 0.14.1

Python 3.5.5

Written 4 weeks ago by suzanne11

Thanks for the details. I could reproduce the error message using Python 3.6 and HTSeq 0.10.0, and I think it is a bug in HTSeq. I'll report this to the maintainers of HTSeq. Until this is fixed, I would recommend using Python 2.7.

Alejandro

P.S. I just uploaded the resulting file here: https://www.dropbox.com/s/gaxwzkvdrvtz6cb/Homo_sapiens.GRCh37.87.DEXSeq.gtf?dl=0

Written 4 weeks ago by Alejandro Reyes • modified 4 weeks ago
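
Wherever the underlying bug lives, one detail can be checked locally. The statement quoted from line 129 uses the `raise ValueError, "..."` form, which is Python 2 syntax that a Python 3 interpreter rejects when parsing the file. Whether this is the whole explanation for the behaviour in this thread is an assumption that the thread does not confirm, but it would fit with the error persisting on a chr1-only file and with Python 2.7 working. The sketch below only tests whether the running interpreter can parse each form; the function-call form shown is the generic Python 3 spelling, not a patch taken from DEXSeq.

```python
import sys

# The statement as quoted in the question (Python 2 form).
PY2_STYLE = 'raise ValueError, "Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) )'

# The equivalent spelling that both Python 2 and Python 3 accept.
PY3_STYLE = 'raise ValueError("Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) ))'

RUNNING_PYTHON_2 = sys.version_info[0] == 2


def parses(source):
    """Return True if the running interpreter can parse `source` (nothing is executed)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True
```

Calling `parses(PY2_STYLE)` and `parses(PY3_STYLE)` under the interpreter used for the script shows which form it accepts.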