Question: DEXSeq: Error running dexseq_prepare_annotation.py
8 months ago by
suzanne110 wrote:

Hello all,

I am trying to run dexseq_prepare_annotation.py, but keep receiving the same error:

File "dexseq_prepare_annotation.py", line 129

raise ValueError, "Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) )

My guess is the problem is inside the gtf file, which was downloaded from Ensembl (hg19).

Hopefully someone can help me figure this out! Many thanks in advance.

best,

Suzanne

dexseq • 274 views
modified 8 months ago by Alejandro Reyes1.6k • written 8 months ago by suzanne110
Answer: DEXSeq: Error running dexseq_prepare_annotation.py
8 months ago by
Alejandro Reyes1.6k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.6k wrote:

Hi Suzanne,

The error message indicates that the same gene ID was found on two different chromosomes. As you suspect, this is likely an error in the annotation file. The easiest solution is to remove that gene identifier from the gtf file.

Best regards,
Alejandro

Dear Alejandro,

Thank you for your help.

To test, I created a new gtf file containing only the information from chromosome 1. However, the error keeps appearing. Does anyone have a clue what is going on?

Best,

Suzanne

Hi Suzanne, have you checked if there are gene identifiers that appear in multiple chromosomes?
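For anyone who wants to run this check without much programming experience, here is a minimal sketch in Python. It assumes a standard tab-separated GTF with a `gene_id "..."` entry in the ninth column; the function names `genes_on_multiple_chromosomes` and `drop_genes` are made up for this example and are not part of DEXSeq or HTSeq.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "ENSG00000000001";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def genes_on_multiple_chromosomes(gtf_lines):
    """Return {gene_id: [chromosomes]} for gene IDs seen on more than one chromosome."""
    chroms = defaultdict(set)
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID.search(fields[8])
        if match:
            chroms[match.group(1)].add(fields[0])
    return {gene: sorted(seen) for gene, seen in chroms.items() if len(seen) > 1}


def drop_genes(gtf_lines, gene_ids):
    """Yield every line whose gene_id is not in gene_ids (comment lines are kept)."""
    for line in gtf_lines:
        fields = line.split("\t")
        match = GENE_ID.search(fields[8]) if len(fields) >= 9 else None
        if match is None or match.group(1) not in gene_ids:
            yield line
```

To use it, open the annotation file and pass the file handle, for example `with open("Homo_sapiens.GRCh37.87.gtf") as handle: print(genes_on_multiple_chromosomes(handle))`. An empty result means no gene ID is shared between chromosomes in that file, so the cause of the error would have to be somewhere else.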

Hi,
I checked, and in the original file there were. As I am new to programming and didn't know an easy fix for this, I first tried making a new GTF file with only the gene IDs on chr1. So in this new file, no gene identifier appears on multiple chromosomes (as it only contains chr1).
Still, I get the same error when running the script.

Best,

Suzanne

Oh I see, sorry I did not catch that earlier. Could you point me to the link to the gtf file that you are using?

The original gtf file I was using I downloaded from here:
ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens/

thanks!

I just ran the following without problems using python 2.7.10 and HTSeq 0.6.1p1:

curl -o Homo_sapiens.GRCh37.87.gtf.gz ftp://ftp.ensembl.org/pub/grch37/update/gtf/homo_sapiens//Homo_sapiens.GRCh37.87.gtf.gz
gunzip Homo_sapiens.GRCh37.87.gtf.gz
python ~/Work/Rpcks/DEXSeq/inst/python_scripts/dexseq_prepare_annotation.py Homo_sapiens.GRCh37.87.gtf Homo_sapiens.GRCh37.87.DEXSeq.gtf

There might be incompatibilities with the versions that you are using. Which Python and HTSeq versions are you running?
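One way to collect those version numbers in one go is a short script like the sketch below. The helper name `report_versions` is made up for this example, and it assumes each package exposes a `__version__` attribute (HTSeq, numpy and pysam do); anything that cannot be imported is reported as "not installed". It should be run with the same interpreter that is used to run dexseq_prepare_annotation.py.

```python
import importlib
import sys


def report_versions(packages=("HTSeq", "numpy", "pysam")):
    """Return a dict with the interpreter version and each package's version."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Fall back to "unknown" if the package has no __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions().items():
    print("{}: {}".format(name, version))
```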

modified 8 months ago • written 8 months ago by Alejandro Reyes1.6k

I tried again with this file, but still the same error, so indeed something else might be the problem.
htseq: 0.10.0

numpy 1.14.5

pysam 0.14.1

python version: 3.5.5

Thanks for the details. I could reproduce the error message using python 3.6 and HTSeq 0.10.0, and I think it is a bug in HTSeq. I'll report this to the maintainers of HTSeq. Until this gets fixed, I would recommend using python 2.7.
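One possible contributing factor, offered here as an assumption and not something confirmed in this thread: the line quoted in the original error uses the Python 2 form `raise ValueError, "..."`. Python 3 rejects that form when it parses the file, before any GTF data is read, which would fit with the error persisting for a chr1-only file. The sketch below only checks whether the interpreter running it accepts each form; the helper name `parses` is made up for this example.

```python
import sys

# The two spellings of the same raise statement.
PY2_STYLE = 'raise ValueError, "Same name found on two chromosomes"'
CALL_STYLE = 'raise ValueError("Same name found on two chromosomes")'


def parses(source):
    """Return True if the running interpreter can compile the given source."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print("Python", sys.version.split()[0])
print("comma-style raise parses:", parses(PY2_STYLE))
print("call-style raise parses:", parses(CALL_STYLE))
```

If the comma-style line does not parse under your interpreter, running the script with python 2.7, as recommended above, avoids the problem.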

Alejandro

P.S. I just uploaded the resulting file here: https://www.dropbox.com/s/gaxwzkvdrvtz6cb/Homo_sapiens.GRCh37.87.DEXSeq.gtf?dl=0

modified 8 months ago • written 8 months ago by Alejandro Reyes1.6k