window counts in csaw
1
0
Entering edit mode
@nobishvarghese-10577
Last seen 8.0 years ago

All,

I have a question on window counts generated by csaw.Could you share your thought on this?

I  have MeDIP data on 3 conditions with 4 replicates each and matching input samples .So, i have 24 samples and it is paired end data.

I have found that windowCounts function in csaw generate differing number of windows when different list of bam files are supplied. For instance,

first.param      <- readParam(max.frag=650, pe="first",dedup=FALSE)
test1_data_first <- windowCounts("SL01_sorted.bam", param=first.param,width=150)
test2_data_first <- windowCounts("SL02_sorted.bam", param=first.param,width=150)
test_data_first  <- windowCounts(c("SL01_sorted.bam","SL02_sorted.bam"), param=first.param,width=150)

gives the below output

> test1_data_first
class: RangedSummarizedExperiment
dim: 73578 1
metadata(4): spacing width shift final.ext
assays(1): counts
rownames: NULL
rowRanges metadata column names(0):
colnames: NULL
colData names(4): bam.files totals ext param

> test2_data_first
class: RangedSummarizedExperiment
dim: 56972 1
metadata(4): spacing width shift final.ext
assays(1): counts
rownames: NULL
rowRanges metadata column names(0):
colnames: NULL
colData names(4): bam.files totals ext param


> test_data_first
class: RangedSummarizedExperiment
dim: 464841 2
metadata(4): spacing width shift final.ext
assays(1): counts
rownames: NULL
rowRanges metadata column names(0):
colnames: NULL
colData names(4): bam.files totals ext param

 

If this is the case, which would be the ideal way to perform windowCounts? Generating window counts for all samples together or generating window counts for the two groups being compared seperately (group1 and group2, group1 and group3, group2 and group3)?

 

> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] csaw_1.4.1                 SummarizedExperiment_1.0.2
[3] Biobase_2.30.0             GenomicRanges_1.22.4      
[5] GenomeInfoDb_1.6.3         IRanges_2.4.8             
[7] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.3    XVector_0.10.0          edgeR_3.12.0           
 [4] zlibbioc_1.16.0         GenomicAlignments_1.6.3 BiocParallel_1.4.3     
 [7] tools_3.2.4             DBI_0.3.1               lambda.r_1.1.7         
[10] futile.logger_1.4.1     rtracklayer_1.30.4      futile.options_1.0.0   
[13] bitops_1.0-6            biomaRt_2.26.1          RCurl_1.95-4.8         
[16] RSQLite_1.0.0           limma_3.26.9            GenomicFeatures_1.22.13
[19] Rsamtools_1.22.0        Biostrings_2.38.4       XML_3.98-1.4           
>

 

 

csaw counts • 1.5k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 16 hours ago
The city by the bay

If you're going to do differential analyses on all of them together, why not count them all together? This would be the simplest and easiest method by far. You'll end up with a count for each sample in each window, which is what you'll need to perform DB comparisons between samples anyway. If you really want to count them separately, you can set filter=0 in windowCounts. This turns off the filtering that causes you to get different numbers of windows between windowCounts calls, such that you can cbind the objects that you get out. However, this will return counts for every window in the genome, which will chew through your RAM quite quickly.

Separate counting for each group is not recommended. This is for practical reasons, as it's hard to match up windows between RangedSummarizedExperiment objects, especially for DB sites where the window might be missing in a group; and statistical reasons, as you'll end up enriching for windows where the null hypothesis is more likely to be rejected (possibly spuriously), which biases the downstream DB analysis between groups.

ADD COMMENT

Login before adding your answer.

Traffic: 630 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6