Hi,
I was wondering why dexseq gives me so many warnings. I understand I can ignore them, as it has something to do with the order of the reads in the bam file.
I'm counting a paired-end bam file, whaich was mapped with hisat2.
the workflows was as followed:
> hisat2 -p 15 --un-gz $outhisat2/$base.unpaired.hisat2.gz -x ~/genomes/Homo_sapiens/HISAT2/grch38/genome -1 $base\_1.fastq.gz -2 $base\_2.fastq.gz | samtools view -Sbhu -F 4 - | samtools sort -@10 - $outhisat2/$base.sorted > samtools index $file > python ~/R/x86_64-pc-linux-gnu-library/3.2/DEXSeq/python_scripts/dexseq_count.py -f bam -p yes -r pos -s no Homo_sapiens.GRCh38.81.txt $file $NEW_FILE.dexseq.txt 2> $NEW_FILE.warnings
I can see from looking at the file, that it is not sorted as i expected it to be (a snippet of the bam file is below).
I am running the script and get millions of warnings. Is there a reason why the sorting didn't went as planned?
Do I need to re-run the sorting without the -F4
option in samtools to make sure that I get a correctly sorted file?
Would it be wiser to run the dexseq_count script in a single-end mode to prevent the accumulation of so many warnings?
>samtools view 704G6AAXX_8.sorted.bam | head
SOLEXA7_1:8:15:18443:12235/2 401 1 10541 1 50M 12 50738 0 CCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCG DBCBDDDCDCDC@DCCCCCCDDCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:-5 ZS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1MD:Z:19C30 YT:Z:UP NH:i:5
SOLEXA7_1:8:57:6309:11195/1 369 1 10543 1 50M 9 138243671 0 GAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGG CCDCDCBCDCCCCCCCCCCDCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:-5 ZS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0NM:i:1 MD:Z:17C32 YT:Z:UP NH:i:3
SOLEXA7_1:8:68:8624:5173/1 345 1 10561 1 50M = 10561 0 AACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACG D7B@BBBDBBD@BADBDCDDCCDCDCDCCCCCCCCCCCCCCCCCCCCCCC AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0MD:Z:50 YT:Z:UP NH:i:2
SOLEXA7_1:8:107:4334:2565/2 433 1 10561 1 1S49M 7 128617866 0 GAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAAC #####A;>B=BBBDBD>@BDDC@DBBBBCCCCCCCCCCCCCCCCCCCCCC AS:i:-1 ZS:i:-1 XN:i:0 XM:i:0 XO:i:0 XG:i:0NM:i:0 MD:Z:49 YT:Z:UP NH:i:3
SOLEXA7_1:8:73:6319:19644/1 113 1 10562 255 50M 7 45773648 0 ACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGC BC@CCCCCAC@C?<CBCCCC@CCCBCC@C@CACCCCCCCBCCCCCBCCBC AS:i:0 ZS:i:-1 XN:i:0 XM:i:0 XO:i:0 XG:i:0NM:i:0 MD:Z:50 YT:Z:UP NH:i:1
SOLEXA7_1:8:57:13291:19688/1 113 1 10563 255 50M 9 138243618 0 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCA CCCCCCBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:0 ZS:i:-2 XN:i:0 XM:i:0 XO:i:0 XG:i:0NM:i:0 MD:Z:50 YT:Z:UP NH:i:1
SOLEXA7_1:8:65:4865:16987/2 177 1 10565 255 50M 3 198180758 0 CAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAAC CCCCDCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:0 ZS:i:-4 XN:i:0 XM:i:0 XO:i:0 XG:i:0NM:i:0 MD:Z:50 YT:Z:UP NH:i:3
SOLEXA7_1:8:99:3043:3519/1 369 1 11154 0 46M4S 4 118595917 0 AGCCGGGCTCGGCCGCCCCTTGCTTGCAGCCGGGCACTACAGGACCGGCT #####?3B1BB5>AAB>@DDDCACBDCDCCCCCCCDCCDCCCCCCDCCCC AS:i:-13 ZS:i:-13 XN:i:0 XM:i:2XO:i:0 XG:i:0 NM:i:2 MD:Z:7A3A34 YT:Z:UP NH:i:5
SOLEXA7_1:8:29:15490:9805/1 329 1 11474 1 50M = 11474 0 AACACCCGGAGCATATGCTGTTTGGTCTCAGTAGACTCCTAAATATGGGA CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0MD:Z:50 YT:Z:UP NH:i:4
SOLEXA7_1:8:24:17498:6074/2 163 1 11641 1 50M = 11768 177 GTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCT CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDCCCCCCCCCCDC@CCCC AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0MD:Z:50 YS:i:0 YT:Z:CP NH:i:5
Hi Assa,
Can you include the warning that you are getting?
Alejandro
Hi Alejandro,
here is one row of my warnings file:
I have ran the same command with the option
'-p yes'
and with'-p no'
. in the paired-end mode I get ~109.5 million warnings, in the single-end mode I get only 540 warnings.Even worse is, when I check the output of the PE mode, I get no reads attached to any exon. the 338906 exon found are ambigious and the rest of the list consists of
'0'
.Here is also the last few lines of the count output file to show the differences in the results of the two counting modes
=> 43019516 reads were attached to an exon.