I'm having trouble mapping from read coordinates to reference coordinates using GenomicAlignments::mapFromAlignments
when my reads contain soft-clipped bases (CIGAR = "S"). Please see the example below:
> first.ref.locs <- mapFromAlignments(first.ranges, GenomicAlignments::first(bam))
> cigar(GenomicAlignments::first(bam))[26]
[1] "27S73="
> mcols(GenomicAlignments::first(bam))$pos[26]
[1] 1
> first.ref.locs[1]
IRanges object with 1 range and 0 metadata columns:
start end width
<integer> <integer> <integer>
K00284:219:HGGG3BBXY:2:1101:18040:1209 69 69 1
> first.ranges[1]
IRanges object with 1 range and 0 metadata columns:
start end width
<integer> <integer> <integer>
K00284:219:HGGG3BBXY:2:1101:18040:1209 69 69 1
The first.ranges
object are the ranges I'm trying to map, and first.ref.locs
is the result of the mapping. So I'm trying to map the 69th base to the reference. The output is 69, but this is not correct. Because of the soft clip, the 28th base of the read is the 1st base of the reference. Therefore the 69th base of the read should be the 42nd base of the reference, but that's not what mapFromAlignments
outputs.
Simple example:
12345678901
ref ACGTACGTTAA
read GTTTCAGACGTACGTTAA
123456789012345678
In this case the CIGAR string would be 7S11=
, pos
would be 1. If I tried to map position 8 on the read to the reference, mapFromAlignments
would currently return 8. Why is this the desired behavior?
Is it possible that the pos
is the problem here? I aligned these reads with minimap2
. Should the pos
actually be negative when the read begins before the reference?
Thanks for any guidance you can provide, this question has been stumping me for a long time!
I wrote this up before seeing your answer and I have the adjustment for the soft clip coming after the
mapFromAlignments
call rather than before. From spot-checking some reads it seems to give the correct answer but I think it could give the wrong result if there's an indel between thepos
and the mapped location (x
), so I'll change it.Thanks a lot for your detailed answer!