Question

window counts in csaw

0

Entering edit mode

nobishvarghese • 0

@nobishvarghese-10577

Last seen 9.1 years ago

All,

I have a question on window counts generated by csaw.Could you share your thought on this?

I have MeDIP data on 3 conditions with 4 replicates each and matching input samples .So, i have 24 samples and it is paired end data.

I have found that windowCounts function in csaw generate differing number of windows when different list of bam files are supplied. For instance,

first.param      <- readParam(max.frag=650, pe="first",dedup=FALSE)
test1_data_first <- windowCounts("SL01_sorted.bam", param=first.param,width=150)
test2_data_first <- windowCounts("SL02_sorted.bam", param=first.param,width=150)
test_data_first  <- windowCounts(c("SL01_sorted.bam","SL02_sorted.bam"), param=first.param,width=150)

gives the below output

> test1_data_first class: RangedSummarizedExperiment dim: 73578 1 metadata(4): spacing width shift final.ext assays(1): counts rownames: NULL rowRanges metadata column names(0): colnames: NULL colData names(4): bam.files totals ext param

> test2_data_first class: RangedSummarizedExperiment dim: 56972 1 metadata(4): spacing width shift final.ext assays(1): counts rownames: NULL rowRanges metadata column names(0): colnames: NULL colData names(4): bam.files totals ext param

> test_data_first class: RangedSummarizedExperiment dim: 464841 2 metadata(4): spacing width shift final.ext assays(1): counts rownames: NULL rowRanges metadata column names(0): colnames: NULL colData names(4): bam.files totals ext param

If this is the case, which would be the ideal way to perform windowCounts? Generating window counts for all samples together or generating window counts for the two groups being compared seperately (group1 and group2, group1 and group3, group2 and group3)?

> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] csaw_1.4.1                 SummarizedExperiment_1.0.2
[3] Biobase_2.30.0             GenomicRanges_1.22.4
[5] GenomeInfoDb_1.6.3         IRanges_2.4.8
[7] S4Vectors_0.8.11           BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.32.3    XVector_0.10.0          edgeR_3.12.0
[4] zlibbioc_1.16.0         GenomicAlignments_1.6.3 BiocParallel_1.4.3
[7] tools_3.2.4             DBI_0.3.1               lambda.r_1.1.7
[10] futile.logger_1.4.1     rtracklayer_1.30.4      futile.options_1.0.0
[13] bitops_1.0-6            biomaRt_2.26.1          RCurl_1.95-4.8
[16] RSQLite_1.0.0           limma_3.26.9            GenomicFeatures_1.22.13
[19] Rsamtools_1.22.0        Biostrings_2.38.4       XML_3.98-1.4
>

csaw counts • 1.7k views

ADD COMMENT • link updated 9.5 years ago by Aaron Lun ★ 29k • written 9.5 years ago by nobishvarghese • 0

score 3 · Accepted Answer · 2016-05-17

If you're going to do differential analyses on all of them together, why not count them all together? This would be the simplest and easiest method by far. You'll end up with a count for each sample in each window, which is what you'll need to perform DB comparisons between samples anyway. If you really want to count them separately, you can set filter=0 in windowCounts. This turns off the filtering that causes you to get different numbers of windows between windowCounts calls, such that you can cbind the objects that you get out. However, this will return counts for every window in the genome, which will chew through your RAM quite quickly.

Separate counting for each group is not recommended. This is for practical reasons, as it's hard to match up windows between RangedSummarizedExperiment objects, especially for DB sites where the window might be missing in a group; and statistical reasons, as you'll end up enriching for windows where the null hypothesis is more likely to be rejected (possibly spuriously), which biases the downstream DB analysis between groups.