Avoid rotating vector when doing arithmetic on different length IRanges RleList entries
Entering edit mode
endrebak85 ▴ 30
Last seen 4.1 years ago

I want to deduct the coverage in one RleList from that of another. A - B. But the lengths of the coverages
aren't going to be exactly the same, which means that the vectors will be rotated. This is not what I want. For example, if the Rle in chr1 in A is longer than that in B I want to extend chr1 B with the length difference from A - B and value 0 to avoid rotation.


How do I do that?

Given RleLists chip and background I have tried stuff like


fc = "/mnt/scratch/endrebak/pyranges_benchmark/data/download/chip_1000000.bed.gz"
fb = "/mnt/scratch/endrebak/pyranges_benchmark/data/download/input_1000000.bed.gz"

print("Reading data table")
cmd = paste0("zcat ", fc, " | cut -f 1-3,6")
cmd_bg= paste0("zcat ", fb, " | cut -f 1-3,6")

chip_df = fread(cmd, header=FALSE, col.names=c("Chromosome", "Start", "End", "Strand"), stringsAsFactors=TRUE)
input_df = fread(cmd_bg, header=FALSE, col.names=c("Chromosome", "Start", "End", "Strand"), stringsAsFactors=TRUE)

print("creating granges")
chip = GRanges(seqnames = chip_df$Chromosome, ranges = IRanges(start = chip_df$Start, end = chip_df$End), strand=chip_df$Strand)
input = GRanges(seqnames = input_df$Chromosome, ranges = IRanges(start = input_df$Start, end = input_df$End), strand=input_df$Strand)

print("computing coverage")
chip = coverage(chip)
background = coverage(input)

for (n in names(chip)){

  sumc = sum(runLength(chip[n]))
  sumb = sum(runLength(background[n]))

  # to avoid rotation
  if (sumc > sumb){
    print(paste0("Extending bg for chromosome ", n, " with ", sumc - sumb, " nucleotides"))
    background[n] = c(background[n], Rle(0, sumc - sumb))
  } else if (sumb > sumc){
    print(paste0("Extending chip for chromosome ", n, " with ", sumb - sumc, " nucleotides"))
    chip[n] = c(chip[n], Rle(0, sumb - sumc))


print(paste0("length chr1 chip ", sum(runLength(chip["chr1"]))))
print(paste0("length chr1 bg ", sum(runLength(background["chr1"]))))

However, this for-loop does not update the RleList. The final print-statements say

[1] "length chr1 chip 249240672"

[1] "length chr1 bg 249228314"

How can I achieve what I want?


iranges s4vectors rle rlelist • 771 views

Login before adding your answer.

Traffic: 758 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6