DEXSeq error: Count files do not correspond to the flattened annotation file
0
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 6 weeks ago
Germany

Hi,

I am running the DEXSeq (v. 1.12.1) with two different modi, one with the strand information and one without it. I just wanted to see the differences.

the run with strand information works fine, but the one without gives me the above error message.

I first ran the dexseq_prepare_annotation.py script, but modified it:

exons = HTSeq.GenomicArrayOfSets( "auto", stranded=False )

this gives me a gff file with no strand information, where the 7th column of the file is overall '.'

I than run the dexseq_count.py script with the gff file.

python dexseq_count.py -s no -f bam cDmel_NoStrandInfo_DEXSeq.gff $file

I now get the error message with the genes which are being aggreagated, due to the fact the the strand information is missing. I know I can use the -r no option, but I would like to count also these genes.

the first gene name I can't find is XLOC_000002 (these are cufflinks IDs). It is aggregated together with XLOC_001386.

The problem I have is that these genes are aggregated both in my newly created gff file

grep -Hrn "XLOC_000002" NoStrandInfo_DEXSeq.gff
NoStrandInfo_DEXSeq.gff:27:2L     dexseq_prepare_annotation_modified.py   aggregate_gene  21842   25151   .       .       .       gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:28:2L     dexseq_prepare_annotation_modified.py   exonic_part     21842   21918   .       .       .       transcripts "TCONS_00000004"; exonic_part_number "001"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:29:2L     dexseq_prepare_annotation_modified.py   exonic_part     21919   22687   .       .       .       transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "002"; gene_id "XLOC_001386+XLOC_00000 "
NoStrandInfo_DEXSeq.gff:30:2L     dexseq_prepare_annotation_modified.py   exonic_part     22688   22709   .       .       .       transcripts "TCONS_00000004"; exonic_part_number "003"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:31:2L     dexseq_prepare_annotation_modified.py   exonic_part     22743   22768   .       .       .       transcripts "TCONS_00003381"; exonic_part_number "004"; gene_id "XLOC_001386+XLOC_000002"
NoStrandInfo_DEXSeq.gff:32:2L     dexseq_prepare_annotation_modified.py   exonic_part     22769   22935   .       .       .       transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "005"; gene_id "XLOC_001386+XLOC_00000 "
...

as well as in the counts file:

 grep -Hrn "XLOC_000002" cufflinks.txt
cufflinks.txt:9237:XLOC_001386+XLOC_000002:001        0
cufflinks.txt:9238:XLOC_001386+XLOC_000002:002        18
cufflinks.txt:9239:XLOC_001386+XLOC_000002:003        1
cufflinks.txt:9240:XLOC_001386+XLOC_000002:004        1
cufflinks.txt:9241:XLOC_001386+XLOC_000002:005        1
...

So I don't understand why they can't be found as a pair. Is there a different reason for this error message?

 

thanks

Assa

 

 

 

 

 

DEXSeq counts gff • 1.6k views
ADD COMMENT
0
Entering edit mode

Why did you create a GFF without strand information?

ADD REPLY

Login before adding your answer.

Traffic: 539 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6