Hi,
I am running the DEXSeq (v. 1.12.1) with two different modi, one with the strand information and one without it. I just wanted to see the differences.
the run with strand information works fine, but the one without gives me the above error message.
I first ran the dexseq_prepare_annotation.py script, but modified it:
exons = HTSeq.GenomicArrayOfSets( "auto", stranded=False )
this gives me a gff file with no strand information, where the 7th column of the file is overall '.'
I than run the dexseq_count.py script with the gff file.
python dexseq_count.py -s no -f bam cDmel_NoStrandInfo_DEXSeq.gff $file
I now get the error message with the genes which are being aggreagated, due to the fact the the strand information is missing. I know I can use the -r no option, but I would like to count also these genes.
the first gene name I can't find is XLOC_000002 (these are cufflinks IDs). It is aggregated together with XLOC_001386.
The problem I have is that these genes are aggregated both in my newly created gff file
grep -Hrn "XLOC_000002" NoStrandInfo_DEXSeq.gff NoStrandInfo_DEXSeq.gff:27:2L dexseq_prepare_annotation_modified.py aggregate_gene 21842 25151 . . . gene_id "XLOC_001386+XLOC_000002" NoStrandInfo_DEXSeq.gff:28:2L dexseq_prepare_annotation_modified.py exonic_part 21842 21918 . . . transcripts "TCONS_00000004"; exonic_part_number "001"; gene_id "XLOC_001386+XLOC_000002" NoStrandInfo_DEXSeq.gff:29:2L dexseq_prepare_annotation_modified.py exonic_part 21919 22687 . . . transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "002"; gene_id "XLOC_001386+XLOC_00000 " NoStrandInfo_DEXSeq.gff:30:2L dexseq_prepare_annotation_modified.py exonic_part 22688 22709 . . . transcripts "TCONS_00000004"; exonic_part_number "003"; gene_id "XLOC_001386+XLOC_000002" NoStrandInfo_DEXSeq.gff:31:2L dexseq_prepare_annotation_modified.py exonic_part 22743 22768 . . . transcripts "TCONS_00003381"; exonic_part_number "004"; gene_id "XLOC_001386+XLOC_000002" NoStrandInfo_DEXSeq.gff:32:2L dexseq_prepare_annotation_modified.py exonic_part 22769 22935 . . . transcripts "TCONS_00000004+TCONS_00003381"; exonic_part_number "005"; gene_id "XLOC_001386+XLOC_00000 " ...
as well as in the counts file:
grep -Hrn "XLOC_000002" cufflinks.txt cufflinks.txt:9237:XLOC_001386+XLOC_000002:001 0 cufflinks.txt:9238:XLOC_001386+XLOC_000002:002 18 cufflinks.txt:9239:XLOC_001386+XLOC_000002:003 1 cufflinks.txt:9240:XLOC_001386+XLOC_000002:004 1 cufflinks.txt:9241:XLOC_001386+XLOC_000002:005 1 ...
So I don't understand why they can't be found as a pair. Is there a different reason for this error message?
thanks
Assa
Why did you create a GFF without strand information?