Hi,
I got two error messages while running dexseq_count.py:
"Current read position is smaller than previous reads, is your alignment file properly sorted by position?"
"A chromosome that had finished to be processed before was found again in the alignment file, is your alignment file properly sorted by position?"
These are the commands I used.
For dexseq_prepare_annotation.py, I used the human GTF from Ensembl.
python dexseq_prepare_annotation.py /projects/b1036/REFERENCE/Homo_sapiens.GRCh38.84.gtf /projects/b1036/wung/Homo_sapiens.GRCh38.84.DEXSeq.gff
Then I generated sorted SAM files for the samples using STAR Aligner. For example,
STAR --runThreadN 22 --genomeDir /some/output/dir --readFilesCommand zcat --readFilesIn 15_004_R1.fastq.gz 15_004_R2.fastq.gz --outFileNamePrefix $OUTPUTDIR/15_004_sorted_ --outSAMtype SAM SortedByCoordinate
I repeated these steps for 6 RNA samples: 9_004, 11_006, 15_004, 10_005, 12_012, 16_005.
Then I finally ran dexseq_count.py for these 6 samples, using the GFF file created by dexseq_prepare_annotation.py and the SAM file generated by STAR:
python dexseq_count.py Homo_sapiens.GRCh38.84.gtf -p yes -r pos 15_004_sorted_Aligned.out.sam 15_004_count.txt
And similarly for other samples (9_004, 11_006, ...)
However, I get these error messages:
"Current read position is smaller than previous reads, is your alignment file properly sorted by position?"
"A chromosome that had finished to be processed before was found again in the alignment file, is your alignment file properly sorted by position?"
At first I thought this meant the SAM files were not sorted, but I asked STAR for SAM output sorted by coordinate (--outSAMtype SAM SortedByCoordinate), so I am lost.
Are there any mistakes in the steps I took?
Thank you so much in advance,
Wung Jae Lee
Hi Wung,
Did you manually check if your files were actually sorted by position?
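One way to do that check is sketched below. It is only a rough illustration, not part of DEXSeq or STAR: the function name first_unsorted_record is made up for this example, and it assumes a plain-text, tab-separated SAM file in which column 3 is the reference name and column 4 is the 1-based position. It looks for the same two conditions the error messages describe.

```python
def first_unsorted_record(sam_lines):
    """Return (line_number, reason) for the first record that breaks
    coordinate order, or None if every record is in order.

    Hypothetical helper for illustration; sam_lines is any iterable of
    SAM text lines, for example an open file handle.
    """
    finished = set()       # chromosomes we have already moved past
    current_chrom = None
    last_pos = 0
    for line_number, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue       # header or blank line
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[2], int(fields[3])
        if chrom == "*":
            continue       # read without a reference has no position
        if chrom != current_chrom:
            if chrom in finished:
                return line_number, "chromosome %s appears again" % chrom
            if current_chrom is not None:
                finished.add(current_chrom)
            current_chrom, last_pos = chrom, pos
        elif pos < last_pos:
            return line_number, "position %d after %d on %s" % (pos, last_pos, chrom)
        else:
            last_pos = pos
    return None
```

For example, opening 15_004_sorted_Aligned.out.sam with open() and passing the handle to first_unsorted_record should return None for a file that really is sorted by position, and otherwise the line number of the first offending record. It is also worth looking at the @HD header line, which in a coordinate-sorted file normally carries SO:coordinate.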
Alejandro
I can't exclude the possibility of a bug in STAR. What happens if you sort your SAM files with samtools instead of relying on STAR's SortedByCoordinate option?
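A minimal sketch of that re-sorting step, assuming a samtools 1.x installation is on the PATH. The helper names and the output file name 15_004_resorted.sam are made up for this example. samtools sort orders by coordinate by default, and -O sam keeps the output as text SAM so it can be passed to dexseq_count.py in the same way as before.

```python
import subprocess


def samtools_sort_command(unsorted_sam, sorted_sam):
    """Build the samtools call that writes a coordinate-sorted SAM file.

    Hypothetical helper; assumes samtools 1.x, where 'sort' accepts
    -O for the output format and -o for the output path.
    """
    return ["samtools", "sort", "-O", "sam", "-o", sorted_sam, unsorted_sam]


def resort_by_position(unsorted_sam, sorted_sam):
    """Run the sort and raise CalledProcessError if samtools fails."""
    subprocess.run(samtools_sort_command(unsorted_sam, sorted_sam), check=True)
```

Calling resort_by_position("15_004_sorted_Aligned.out.sam", "15_004_resorted.sam") and then giving the new file to dexseq_count.py with -r pos would show whether the errors come from the ordering of STAR's output.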
H.