mapToAlignments_issue
1
0
Entering edit mode
@davidporubsky-15691
Last seen 22 days ago
United States

Dear Bioc-team,

I'm trying to map genome coordinates back to local (contig/read) coordinates, however, I have an issue to do this properly in cases when alignment between contig and the genome (reference) is in minus (reverse) orientation. Is there a way to use function GenomicAlignments::mapToAlignments in strand aware fashion such that reverse alignments will be mapped in minus orientation?

Is there an example code on how to do this properly?

David

GenomicAlignments • 231 views
1
Entering edit mode

Do you have example code showing what you are trying to do and how it doesn't work the way you expect? Ideally a self-contained example so others can reproduce themselves?

0
Entering edit mode

Example alignment between query (Q) and target (T) sequences in PAF format.

paf.aln <- dplyr::tibble( q.name='Q', q.len=100, q.start=1, q.end=100, strand='*', t.name='T', t.len=1000, t.start=700, t.end=800, n.match=100, aln.len=100, mapq=60, cg='100=')

Target region to be lifted to the query

lift.gr <- as('T:701-710', 'GRanges')

Setting GAlignments object in positive orientation (using PAF alignment)

plus.alignment <- GenomicAlignments::GAlignments(seqnames = paf.aln$t.name, pos = as.integer(paf.aln$t.start + 1), cigar = paf.aln$cg, strand=GenomicRanges::strand('+'), names = paf.aln$q.name)

Lifting target region to query coordinates

plus.gr <- GenomicAlignments::mapToAlignments(x = lift.gr, alignments = plus.alignment)

Setting GAlignments object in negative orientation (using PAF alignment)

minus.alignment <- GenomicAlignments::GAlignments(seqnames = paf.aln$t.name, pos = as.integer(paf.aln$t.start + 1), cigar = paf.aln$cg, strand=GenomicRanges::strand('-'), names = paf.aln$q.name)

Lifting target region to query coordinates

minus.gr <- GenomicAlignments::mapToAlignments(x = lift.gr, alignments = minus.alignment)

Here both ranges lifted to query coordinates are the same position 1-10 of the query. I would expect in case of reverse alignment lifted ranges to correspond to position 91-100 of the query sequence. Am I doing something wrong here?

0
Entering edit mode

Any help here from the developers? I don't think this is a correct behavior from the 'mapToAlignments' function. I think it is important to make this mapping in strand aware fashion to obtain correct coordinates.

2
Entering edit mode
@herve-pages-1542
Last seen 4 hours ago
Seattle, WA, United States

I would expect in case of reverse alignment lifted ranges to correspond to position 91-100 of the query sequence.

Nope. Local coordinates are _always_ w.r.t. the leftmost mapped position, whatever the strand. This is consistent with the SAM Format Specification:

All mapped segments in alignment lines are represented on the forward genomic strand. For segments that have been mapped to the reverse strand, the recorded SEQ is reverse complemented from the original unmapped sequence and CIGAR, QUAL, and strand-sensitive optional fields are reversed and thus recorded consistently with the sequence bases as represented.

In other words, local coordinates are reported with respect to the SEQ that is recorded in the SAM/BAM file. For queries that are mapped to the reverse strand, this means that local coordinates are reported with respect to the _reverse complement_ of the original unmapped sequence.

You can "revert" those local coordinates with something like this:

reverse_mapped_coordinates <- function(mapped, alignments)
{
stopifnot(is(mapped, "GRanges"), is(alignments, "GAlignments"))
alignments_hits <- mcols(mapped)\$alignmentsHits
alignments2 <- alignments[alignments_hits]  # parallel to 'mapped'
mapped_to_minus <- which(strand(alignments2) == "-")
bounds <- IRanges(1L, qwidth(alignments2)[mapped_to_minus])
ranges(mapped)[mapped_to_minus] <-
reflect(ranges(mapped)[mapped_to_minus], bounds)
mapped
}


Then:

library(GenomicAlignments)

x <- GRanges("T:701-710")

read1 <- GAlignments(seqnames="T", pos=701L, cigar="100M", strand=strand("+"), names="Q")
read2 <- GAlignments(seqnames="T", pos=701L, cigar="100M", strand=strand("-"), names="Q")

# GRanges object with 2 ranges and 2 metadata columns:
#       seqnames    ranges strand |     xHits alignmentsHits
#          <Rle> <IRanges>  <Rle> | <integer>      <integer>
#   [1]        Q      1-10      * |         1              1
#   [2]        Q    91-100      * |         1              2
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths


Hope this helps,

H.