Question

csaw filterWindows function results in a warning: ("Zero library size detected.")

0

Entering edit mode

chriad ▴ 10

@chriad-10721

Last seen 6.6 years ago

I have a summarizedExperiment with ChIP and input counts and I'd like to use the filterWindows function.

> s641
class: RangedSummarizedExperiment 
dim: 10030 2 
metadata(48): spacing width ... shift final.ext
assays(1): counts
rownames: NULL
rowData names(1): mappableFractionPerRange
colnames(2): 641_H3K9me3_1 641_input_1
colData names(8): bam.files totals ... channel rep

I try to use filterWindows as described in the manual but it results in an error!

filter.stat <- filterWindows(data = s641[,1],
                             background = s641[,2],
                             type='control',
                             prior.count=5,
                             norm.fac = list(s641[,1], s641[,2]))

Error in if (any(lib.size == 0L)) warning("Zero library size detected.") : 
  missing value where TRUE/FALSE needed

> colData(s641)$totals
[1]  9787604 24606244

Can someone help me?

csaw • 1.8k views

ADD COMMENT • link updated 7.9 years ago by Aaron Lun ★ 28k • written 7.9 years ago by chriad ▴ 10

score 1 · Accepted Answer · 2016-05-17

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 19 hours ago

The city by the bay

I assume you're using fairly small windows here (i.e., less than 2 kbp). This is not suitable for normalization against composition biases, because many windows are likely to have zero counts, particularly in the smaller library. I suspect that the error is due to the fact that the computed ChIP-to-control normalization factor is NA, after computing log-ratios for a majority of windows that have zeros in both libraries. This leads to effective library sizes of NA after undoing the log-transformation, which results in the observed error.

If you read the filterWindows documentation, we recommend counting into large bins (>10 kbp) and using those counts as norm.facs. This is more stable as the chance of getting zeroes is reduced for large bins, which should give you more sensible results as the normalization factor is unlikely to be NA.

ADD COMMENT • link 7.9 years ago Aaron Lun ★ 28k

0

Entering edit mode

I use filter=0 in windowCounts, because I had to parallelize and filter!=0 would result in SummarizedExperiment instances with different row counts that couldn't be joined afterwards:

WIDTH=10000
se <- mclapply(X=Rsamtools::path(bfl), FUN=windowCounts, param=param.csaw, bin=T, width=WIDTH, filter=0, mc.cores = 6)
SE <- do.call(cbind, se)

If I understand correctly I should now filter out those bins that have an average(Log)CPM below a certain threshold, to "emulate" the filter argument in windowCounts?

I used the following 'workaround':

s641 <- normalize(s641)
filter.stat <- filterWindows(s641[,1],
                             s641[,2],
                             type='control',
                             prior.count=5,
                             norm.fac = colData(s641)$norm.factors[2]/colData(s641)$norm.factors[1])

which works!?

ADD REPLY • link 7.9 years ago chriad ▴ 10

0

Entering edit mode

I would be surprised if you really "had" to parallelize. Sure, it'll probably provide a speed-up, but you'd probably be saving 5-10 minutes at best (assuming your server's IO is decent). It's not like you're doing this step hundreds of times a day; go make yourself some tea or something while you wait. Nonetheless, I'll probably add a parallelization option in the next release, so you won't have to hack around with filter=0.
If you're using the bins for your DB analysis (assuming H3K9me3 is a fairly broad mark), then yes, you'll have to filter out low-abundance bins based on their average log-CPMs. However, this procedure is not meant to emulate the filter argument in windowCounts. The only purpose of the filter argument is to avoid running out of RAM, by tossing out near-empty windows while reads are being counted from the BAM files. Proper filtering is done using filterWindows to account for different library sizes, log-enrichment over background, etc.
Your "workaround" code is actually another supported way to do it, so it's quite valid. I suspect the reason why it wasn't working before was because you have too many bins in s641 that have zero counts in both libraries. (In fact, this is the only way I can reproduce the error message in your original post.) I don't know how you got from SE to s641, but check how many bins have all-zero counts. These shouldn't be present from a standard windowCounts call.

ADD REPLY • link 7.9 years ago Aaron Lun ★ 28k