Search
Question: Editing of GAlignments
1
3.8 years ago by
gen30
United States
gen30 wrote:

Hi. say I have a GAlignments Object of several alignments and 5 metadata columns.


GAlignments object with 3 alignments and 5 metadata columns:
seqnames strand       cigar    qwidth     start       end     width
<Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
[1]     seq1      +         35M        35       970      1004        35
[2]     seq1      +         35M        35       971      1005        35
[3]     seq1      +         35M        35       972      1006        35
njunc |    rname   strand       pos    qwidth
<integer> | <factor> <factor> <integer> <integer>
[1]         0 |     seq1        +       970        35
[2]         0 |     seq1        +       971        35
[3]         0 |     seq1        +       972        35
seq
<DNAStringSet>
[1] TATTAGGAAATGCTTTACTGTCATAACTATGAAGA
[2] ATTAGGAAATGCTTTACTGTCATAACTATGAAGAG
[3] TTAGGAAATGCTTTACTGTCATAACTATGAAGAGA
-------
seqinfo: 2 sequences from an unspecified genome


 >     Is there a way to delete a single read from the object? I know I can subset it away by something like bamobject2<-bamobject[1:10] to get rid of bamobject[11] for example but I was wondering if there was a way to directly delete.   I can access the start of a single read through start(bamobject[1]) but how would you be able to change the actual value?   Thanks

modified 3.8 years ago by Martin Morgan ♦♦ 22k • written 3.8 years ago by gen30
0
3.8 years ago by
Martin Morgan ♦♦ 22k
United States
Martin Morgan ♦♦ 22k wrote:

I'm not exactly sure what you are hoping to do with 'delete a single read from the object'; the R way of doing things is what you've already shown, which is to subset and re-assign; negative subscripts are allowed, e.g., bamobject[-11]

For editing GAlignments objects, I'm not 100% sure about this but I believe the intention is that these are primarily 'read-only' (the aligner has given you 'the truth'). Nonetheless, you can extract and manipulate the GRanges, then re-create the GAlignments, e.g., after running example(GAlignments) I have an object gal; to shift start coordinates by 1, illustrating the 'round trip' from GAlignments to GAlignments, is

as(shift(granges(gal, use.mcols=TRUE), 1), "GAlignments")

1

Note that an important difference between a GAlignments and a GRanges object is that the former represents alignments (i.e. ranges + CIGARs), whereas the latter only represents ranges. More precisely a range on the reference sequence is generally not enough to describe an alignment: another important aspect of an alignment is its geometry. In the BAM/SAM world, and for GAlignments objects, the geometry of an alignment is described by its CIGAR. So roughly speaking, a GAlignments object can be seen as a GRanges object + the additional CIGAR information for each range. Now when you coerce from GAlignments to GRanges, you loose the CIGAR. In other words you just en up with a single range per alignment and you've lost its exact geometry. When you coerce from GRanges back to GAlignments, that information cannot be restored. So what this coercion does is assign a "fake" CIGAR to each alignment that describes the simplest type of alignment, that is, an alignment with no indels (I or D) or skipped regions (N) or soft or hard clipping (S or H). All this to say that there isn't really much value in going from GRanges back to GAlignments: the original CIGARs are lost anyway and the GAlignments object you end up with doesn't contain more information than the GRanges object.

H.

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by Hervé Pagès ♦♦ 13k

Any suggestions then for the original question of editing the GAlignments, which I guess is related to removing reads from coverage() calculation. ?

ADD REPLYlink written 3.8 years ago by Martin Morgan ♦♦ 22k

So yes reads can be removed from a GAlignments object by just subsetting the object, but core components of a GAlignments object like the start, end, width, and cigar cannot be modified. This is because the components describing the range of an alignment (stat, end, width) are tightly coupled to its cigar. For example modifying the start would also require to modify the cigar. Even an operation as simple as shift() that would preserve the width of an alignment would "break" the cigar in the sense that the original cigar becomes meaningless (not to say wrong) when the alignment is moved to the new position. The only operations we support that alter the start, end, width, and cigar of a GAlignments object are narrow() and qnarrow() (see ?GenomicAlignments::narrow). These are specific intra-range transformations where we actually know what to do with the cigar, that is, we know how to alter the start/end/width and the cigar together so that the modified cigar is an accurate description of the geometry of the narrowed alignment. Maybe we could support other specific intra-range transformations like that. Maybe the OP has a use case where s/he needs a specific intra-range transformation that we could support. But IMO, we should probably avoid having start, end, width, and cigar setters in order to support generic editing of a GAlignments object.

Finally note that the user can always use the GAlignments() constructor function to make a GAlignments object with the content of her choice. This actually provides an easy workaround for the lack of  start, end, width, and cigar setters.

H.

ADD REPLYlink written 3.8 years ago by Hervé Pagès ♦♦ 13k