Question: window counts in csaw
0
gravatar for nobishvarghese
2.9 years ago by
nobishvarghese0 wrote:

All,

I have a question on window counts generated by csaw.Could you share your thought on this?

I  have MeDIP data on 3 conditions with 4 replicates each and matching input samples .So, i have 24 samples and it is paired end data.

I have found that windowCounts function in csaw generate differing number of windows when different list of bam files are supplied. For instance,

first.param      <- readParam(max.frag=650, pe="first",dedup=FALSE)
test1_data_first <- windowCounts("SL01_sorted.bam", param=first.param,width=150)
test2_data_first <- windowCounts("SL02_sorted.bam", param=first.param,width=150)
test_data_first  <- windowCounts(c("SL01_sorted.bam","SL02_sorted.bam"), param=first.param,width=150)

gives the below output

> test1_data_first
class: RangedSummarizedExperiment
dim: 73578 1
metadata(4): spacing width shift final.ext
assays(1): counts
rownames: NULL
rowRanges metadata column names(0):
colnames: NULL
colData names(4): bam.files totals ext param

> test2_data_first
class: RangedSummarizedExperiment
dim: 56972 1
metadata(4): spacing width shift final.ext
assays(1): counts
rownames: NULL
rowRanges metadata column names(0):
colnames: NULL
colData names(4): bam.files totals ext param


> test_data_first
class: RangedSummarizedExperiment
dim: 464841 2
metadata(4): spacing width shift final.ext
assays(1): counts
rownames: NULL
rowRanges metadata column names(0):
colnames: NULL
colData names(4): bam.files totals ext param

 

If this is the case, which would be the ideal way to perform windowCounts? Generating window counts for all samples together or generating window counts for the two groups being compared seperately (group1 and group2, group1 and group3, group2 and group3)?

 

> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] csaw_1.4.1                 SummarizedExperiment_1.0.2
[3] Biobase_2.30.0             GenomicRanges_1.22.4      
[5] GenomeInfoDb_1.6.3         IRanges_2.4.8             
[7] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3    XVector_0.10.0          edgeR_3.12.0           
 [4] zlibbioc_1.16.0         GenomicAlignments_1.6.3 BiocParallel_1.4.3     
 [7] tools_3.2.4             DBI_0.3.1               lambda.r_1.1.7         
[10] futile.logger_1.4.1     rtracklayer_1.30.4      futile.options_1.0.0   
[13] bitops_1.0-6            biomaRt_2.26.1          RCurl_1.95-4.8         
[16] RSQLite_1.0.0           limma_3.26.9            GenomicFeatures_1.22.13
[19] Rsamtools_1.22.0        Biostrings_2.38.4       XML_3.98-1.4           
>

 

 

counts csaw • 472 views
ADD COMMENTlink modified 2.9 years ago by Aaron Lun23k • written 2.9 years ago by nobishvarghese0
Answer: window counts in csaw
3
gravatar for Aaron Lun
2.9 years ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

If you're going to do differential analyses on all of them together, why not count them all together? This would be the simplest and easiest method by far. You'll end up with a count for each sample in each window, which is what you'll need to perform DB comparisons between samples anyway. If you really want to count them separately, you can set filter=0 in windowCounts. This turns off the filtering that causes you to get different numbers of windows between windowCounts calls, such that you can cbind the objects that you get out. However, this will return counts for every window in the genome, which will chew through your RAM quite quickly.

Separate counting for each group is not recommended. This is for practical reasons, as it's hard to match up windows between RangedSummarizedExperiment objects, especially for DB sites where the window might be missing in a group; and statistical reasons, as you'll end up enriching for windows where the null hypothesis is more likely to be rejected (possibly spuriously), which biases the downstream DB analysis between groups.

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by Aaron Lun23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 133 users visited in the last hour