Question

Gviz: Filtering AlignmentsTrack by read names

0

Entering edit mode

anikajohn • 0

@e0dc22d0

Last seen 3.3 years ago

Switzerland

Hi,

I am new to Gviz and it's lots of fun. Right now I am looking into visualising read alignments. Currently, using the code below, all reads of the bam file are displayed. However, I'd like to only display the alignment of one specific read (it's supposed to become a functionality within an app). Is there a way to filter specific reads by their id when using AlignmentsTrack. Or alternatively, is there a possibility to use GAlignments object with AlignmentsTrack (I can filter these easily).

alTrack <- AlignmentsTrack(alignment.bam, showIndels=TRUE, name="Indels")

plotTracks(alTrack,
           from = 1, 
           to = 12000, 
           chromosome = "ABC")

Thanks already for any advice!

Gvi Gviz • 1.9k views

ADD COMMENT • link updated 3.3 years ago by James W. MacDonald 68k • written 3.3 years ago by anikajohn • 0

0

Entering edit mode

I have some better reproducible example here:

library(Gviz)
library(GenomicAlignments)

###### How it works right now #######

afrom <- 2960000
ato <- 3160000
alTrack <- AlignmentsTrack(
  system.file(package = "Gviz", "extdata", "gapped.bam"),
  isPaired = TRUE)

#all reads are plotted
plotTracks(alTrack,from = afrom, to = ato, 
           chromosome = "chr12")

###### How I want it to work #######

#1. Filter the bam file, so that left with only one read
bamFile <- system.file(package = "Gviz", "extdata", "gapped.bam")
bamObject <- readGAlignments(bamFile)[1:5]
#give reads an example name, want to filter by it later
names(bamObject) <- c("read1","read2","read3","read4","read5")
bamObject <- bamObject[names(bamObject) == "read3"]

#2. display the alignment of only the one selected read (PSEUDO CODE)
alTrack <- AlignmentsTrack(
  bamObject,
  isPaired = TRUE)

plotTracks(alTrack,from = afrom, to = ato, 
           chromosome = "chr12")

Maybe that makes more obvious what I mean.

-Cheers

ADD REPLY • link 3.3 years ago anikajohn • 0

score 0 · Answer 1 · 2022-07-29

This is explained in the help page for AlignmentsTracks. As per most R help, everything you need is there, but it was written by someone who knew exactly how the machine works, which is not always the best situation. Here's the start of the relevant section:

 range: An optional meta argument to handle the different input
          types. If the 'range' argument is missing, all the relevant
          information to create the object has to be provided as
          individual function arguments (see below).

          The different input options for 'range' are:

A 'character' string: the path to a 'BAM' file containing the read
              alignments. To be precise, this will result in the
              instantiation of a 'ReferenceAlignmentsTrack' object, but
              for the user this implementation detail should be of no
              concern.

A 'GRanges' object: the genomic ranges of the individual reads as well
              as the optional additional metadata columns 'id',
              'cigar', 'mapq', 'flag', 'isize', 'groupid', 'status',
              'md' and 'seqs' (see description of the individual
              function parameters below for details). Calling the
              constructor on a 'GRanges' object without further
              arguments, e.g. 'AlignmentsTrack(range=obj)' is
              equivalent to calling the coerce method 'as(obj,
              "AlignmentsTrack")'.

Where the super-relevant part is the explanation of the GRanges part. But how would one get such a thing? You already tried the obvious. But as you can see that won't work because it doesn't return a GRanges object. But you can read in everything you need using scanBam and make the GRanges yourself.

> bfl <- BamFile(bamFile, asMates = TRUE)

## scanBam returns a list for each range in 'which'. We only used one range, so just get that list item
> z <- scanBam(bfl, param = ScanBamParam(which = GRanges("chr12:2960000-3160000"), what = scanBamWhat(), flag = scanBamFlag(isUnmappedQuery = FALSE)))[[1]]

## Now turn that into a GRanges object
> gr <- GRanges(z$rname, IRanges(z$pos, width = z$qwidth), z$strand, 
                id = z$qname, cigar = z$cigar, mapq = z$mapq, flag = z$flag, isize = z$isize, 
                groupid = z$groupid, status = z$mate_status)
> gr
GRanges object with 2244 ranges and 7 metadata columns:
         seqnames          ranges strand |                     id       cigar
            <Rle>       <IRanges>  <Rle> |            <character> <character>
     [1]    chr12 2968078-2968127      + | HWUSI-EAS1794_0001_F..         50M
     [2]    chr12 2968219-2968268      - | HWUSI-EAS1794_0001_F..         50M
     [3]    chr12 2986702-2986751      + | HWUSI-EAS1794_0001_F..         50M
     [4]    chr12 2986841-2986890      - | HWUSI-EAS1794_0001_F..         50M
     [5]    chr12 2967685-2967734      + | HWUSI-EAS1794_0001_F..         50M
     ...      ...             ...    ... .                    ...         ...
  [2240]    chr12 3152224-3152273      - | HWUSI-EAS1794_0001_F..         50M
  [2241]    chr12 2997921-2997970      - | HWUSI-EAS1794_0001_F..         50M
  [2242]    chr12 2973632-2973681      + | HWUSI-EAS1794_0001_F..  30M187N20M
  [2243]    chr12 2981406-2981455      + | HWUSI-EAS1794_0001_F..  8M1729N42M
  [2244]    chr12 2975582-2975631      + | HWUSI-EAS1794_0001_F..         50M
              mapq      flag     isize   groupid   status
         <integer> <integer> <integer> <integer> <factor>
     [1]       255        99         0         1    mated
     [2]       255       147         0         1    mated
     [3]       255        99         0         2    mated
     [4]       255       147         0         2    mated
     [5]       255        99         0         3    mated
     ...       ...       ...       ...       ...      ...
  [2240]       255       113         0      1166  unmated
  [2241]       255       177         0      1167  unmated
  [2242]       255       137      <NA>      1168  unmated
  [2243]       255        73      <NA>      1169  unmated
  [2244]       255       137      <NA>      1170  unmated
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Now to plot only one read, you need to know which read you want to plot. I have no idea how you would know that, but here's one random read.

alTrack2 <- AlignmentsTrack(gr[gr$id %in% gr$id[400]])

## and now you can plot that one read by doing

plotTracks(alTrack2, from = afrom, to = ato, chromosome = "chr12")