Question: Coverage function assigns wrong weights for gapped alignments
0
gravatar for rasi1983
2.5 years ago by
rasi19830
United States
rasi19830 wrote:

I am calculating weighted coverage for two reads one of which has a gap. I find that the coverage weights are assigned wrong after the gapped region due to the GAlignment being converted to GRangesList with multiple GRanges. Is there a way to calculate the right weighted coverage?

I can correct this manually by expanding the assigned weights by hand as shown below, but the lapply step seems to be very slow when working with millions of reads.

> library(GenomicAlignments)

> reads <- GAlignments(
seqnames=Rle(rep("chr1",2)),
pos=as.integer(c(2,40)),
strand=strand(rep("+",2)),
cigar=c("8M2N12M", "10M"),
weight=c(0.1, 0.2))

> reads
GAlignments object with 2 alignments and 1 metadata column:
      seqnames strand       cigar    qwidth     start       end     width
         <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
  [1]     chr1      +     8M2N12M        20         2        23        22
  [2]     chr1      +         10M        10        40        49        10
          njunc |    weight
      <integer> | <numeric>
  [1]         1 |       0.1
  [2]         0 |       0.2
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> coverage(reads)
RleList of length 1
$chr1
integer-Rle of length 49 with 6 runs
  Lengths:  1  8  2 12 16 10
  Values :  0  1  0  1  0  1


> coverage(reads, weight=mcols(reads)$weight)
RleList of length 1
$chr1
numeric-Rle of length 49 with 6 runs
  Lengths:   1   8   2  12  16  10
  Values :   0 0.1   0 0.2   0 0.1

Warning message:
In .recycle(arg, length(x), arg.label, "x") :
  'weight' length is not a divisor of 'x' length

#### correct above problem by manually assigning weights
> shape <- lapply(grlist(reads), function(x) rep(1, length(x)))
> weights <- unlist(mapply("*", shape, mcols(reads)$weight)
> coverage(reads, weight=weights)
RleList of length 1
$chr1
numeric-Rle of length 49 with 6 runs
  Lengths:   1   8   2  12  16  10
  Values :   0 0.1   0 0.1   0 0.2

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils    
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.10.0   Rsamtools_1.26.1          
 [3] Biostrings_2.42.1          XVector_0.14.0            
 [5] SummarizedExperiment_1.4.0 Biobase_2.34.0            
 [7] GenomicRanges_1.26.3       GenomeInfoDb_1.10.3       
 [9] IRanges_2.8.1              S4Vectors_0.12.1          
[11] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
[1] lattice_0.20-34    bitops_1.0-6       grid_3.3.2         zlibbioc_1.20.0   
[5] Matrix_1.2-8       BiocParallel_1.8.1 tools_3.3.2        RCurl_1.95-4.8    
[9] compiler_3.3.2
ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by rasi19830
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 137 users visited in the last hour