Question: Avoid rotating vector when doing arithmetic on different length IRanges RleList entries
0
gravatar for endrebak85
16 months ago by
endrebak8530
github.com/endrebak/
endrebak8530 wrote:


I want to deduct the coverage in one RleList from that of another. A - B. But the lengths of the coverages
aren't going to be exactly the same, which means that the vectors will be rotated. This is not what I want. For example, if the Rle in chr1 in A is longer than that in B I want to extend chr1 B with the length difference from A - B and value 0 to avoid rotation.

 

How do I do that?

Given RleLists chip and background I have tried stuff like

library(GenomicRanges)
library(data.table)

fc = "/mnt/scratch/endrebak/pyranges_benchmark/data/download/chip_1000000.bed.gz"
fb = "/mnt/scratch/endrebak/pyranges_benchmark/data/download/input_1000000.bed.gz"

print("Reading data table")
cmd = paste0("zcat ", fc, " | cut -f 1-3,6")
cmd_bg= paste0("zcat ", fb, " | cut -f 1-3,6")

chip_df = fread(cmd, header=FALSE, col.names=c("Chromosome", "Start", "End", "Strand"), stringsAsFactors=TRUE)
input_df = fread(cmd_bg, header=FALSE, col.names=c("Chromosome", "Start", "End", "Strand"), stringsAsFactors=TRUE)

print("creating granges")
chip = GRanges(seqnames = chip_df$Chromosome, ranges = IRanges(start = chip_df$Start, end = chip_df$End), strand=chip_df$Strand)
input = GRanges(seqnames = input_df$Chromosome, ranges = IRanges(start = input_df$Start, end = input_df$End), strand=input_df$Strand)

print("computing coverage")
chip = coverage(chip)
background = coverage(input)

for (n in names(chip)){

  sumc = sum(runLength(chip[n]))
  sumb = sum(runLength(background[n]))

  # to avoid rotation
  if (sumc > sumb){
    print(paste0("Extending bg for chromosome ", n, " with ", sumc - sumb, " nucleotides"))
    background[n] = c(background[n], Rle(0, sumc - sumb))
  } else if (sumb > sumc){
    print(paste0("Extending chip for chromosome ", n, " with ", sumb - sumc, " nucleotides"))
    chip[n] = c(chip[n], Rle(0, sumb - sumc))
  }

}

print(paste0("length chr1 chip ", sum(runLength(chip["chr1"]))))
print(paste0("length chr1 bg ", sum(runLength(background["chr1"]))))

However, this for-loop does not update the RleList. The final print-statements say

[1] "length chr1 chip 249240672"

[1] "length chr1 bg 249228314"

How can I achieve what I want?

 

iranges s4vectors rle rlelist • 222 views
ADD COMMENTlink modified 16 months ago • written 16 months ago by endrebak8530
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 251 users visited in the last hour