Coverage function assigns wrong weights for gapped alignments
2
1
Entering edit mode
rasi1983 ▴ 10
@rasi1983-7454
Last seen 6.1 years ago
United States

I am calculating weighted coverage for two reads one of which has a gap. I find that the coverage weights are assigned wrong after the gapped region due to the GAlignment being converted to GRangesList with multiple GRanges. Is there a way to calculate the right weighted coverage?

I can correct this manually by expanding the assigned weights by hand as shown below, but the lapply step seems to be very slow when working with millions of reads.

> library(GenomicAlignments)

> reads <- GAlignments(
seqnames=Rle(rep("chr1",2)),
pos=as.integer(c(2,40)),
strand=strand(rep("+",2)),
cigar=c("8M2N12M", "10M"),
weight=c(0.1, 0.2))

> reads
GAlignments object with 2 alignments and 1 metadata column:
      seqnames strand       cigar    qwidth     start       end     width
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
  [1]     chr1      +     8M2N12M        20         2        23        22
  [2]     chr1      +         10M        10        40        49        10
          njunc |    weight
      <integer> | <numeric>
  [1]         1 |       0.1
  [2]         0 |       0.2
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> coverage(reads)
RleList of length 1
$chr1
integer-Rle of length 49 with 6 runs
  Lengths:  1  8  2 12 16 10
  Values :  0  1  0  1  0  1


> coverage(reads, weight=mcols(reads)$weight)
RleList of length 1
$chr1
numeric-Rle of length 49 with 6 runs
  Lengths:   1   8   2  12  16  10
  Values :   0 0.1   0 0.2   0 0.1

Warning message:
In .recycle(arg, length(x), arg.label, "x") :
  'weight' length is not a divisor of 'x' length

#### correct above problem by manually assigning weights
> shape <- lapply(grlist(reads), function(x) rep(1, length(x)))
> weights <- unlist(mapply("*", shape, mcols(reads)$weight)
> coverage(reads, weight=weights)
RleList of length 1
$chr1
numeric-Rle of length 49 with 6 runs
  Lengths:   1   8   2  12  16  10
  Values :   0 0.1   0 0.1   0 0.2

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils    
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.10.0   Rsamtools_1.26.1          
 [3] Biostrings_2.42.1          XVector_0.14.0            
 [5] SummarizedExperiment_1.4.0 Biobase_2.34.0            
 [7] GenomicRanges_1.26.3       GenomeInfoDb_1.10.3       
 [9] IRanges_2.8.1              S4Vectors_0.12.1          
[11] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
[1] lattice_0.20-34    bitops_1.0-6       grid_3.3.2         zlibbioc_1.20.0   
[5] Matrix_1.2-8       BiocParallel_1.8.1 tools_3.3.2        RCurl_1.95-4.8    
[9] compiler_3.3.2
genomicalignments splicing coverage • 1.5k views
ADD COMMENT
0
Entering edit mode
@haroldpimentel-9160
Last seen 4.4 years ago

I came across the same issue. It still appears to not have been fixed.

Here is a adjustment to your solution that should be significantly faster:

    cig = cigar(reads)
    n_matches = lengths(regmatches(cig, gregexpr("M", cig)))
    shape = lapply(n_matches, function(y) rep(1, y))
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.3 years ago
United States

Seems like a bug in coverage,GRangesList(), because it should automatically expand the weights. For a relatively simple speed up, consider:

grl <- grglist(reads)
weight <- mcols(reads)$weight[togroup(PartitioningByEnd(grl))]
coverage(grl, weight)
ADD COMMENT

Login before adding your answer.

Traffic: 806 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6