Hi-
The easiest way to get the full matrix of counts is using the dba.peakset()
function. This returns a GRanges
object by default, but you can tell it to return a dataframe:
> counts <- dba.peakset(DBA, bRetrieve=T, DataType=DBA_DATA_FRAME)
If you actually want raw read counts, you have to change the default score (which is normalized counts after subtracting the control reads if present). you can change between scoring methods using dba.count()
with peaks=NULL
. For example:
> DBA <- dba.count(DBA, peaks=NULL, score=DBA_SCORE_READS)
> counts <- dba.peakset(DBA, bRetrieve=T, DataType=DBA_DATA_FRAME)
Also, if you just want the counts and not the actual site co-ordinates, you can strip them off:
> counts <- counts[,-(1:3)]
Hope this answers your question.
Cheers-
Rory
Hi Rory. I am using DiffBind for counting reads.
But I get the same count table even I specify score differently:
In this case, the only thing you are changing is the inclusion of Control reads. The most likely explanation is that the Control reads aren't being properly specified.
If you could send me a copy of your object (
H3K27ac_InputSubtract
), I can have a look. You could also post the results of:One thing worth knowing is that you don't need to re-count all the data to change the score. You could perform the second operation very quickly by setting
peaks=NULL
as follows:Cheers-
Rory
I do see some negative numbers after exporting the count table, so the control reads
are subtracted.
You can try emailing me the object and I can try to debug it. I may not be able to debug it exactly as you are using an older version of
DiffBind
that is no longer supported, but I'll see what I can do.-R
Thanks, Let me upgrade DiffbBind and come back to you later.
Hi Rory,
I upgraded DiffBind to the newest version, and I still get the same count table (input subtracted):
H3K27ac_InputSubtract<- dba.count(H3K27ac, minOverlap=3, fragmentSize = 200, bParallel = T, score = "DBA_SCORE_READS_MINUS") ## faster H3K27ac_noInputSubtract<- dba.count(H3K27ac_InputSubtract, peaks=NULL, score = "DBA_SCORE_READS") dba.peakset (H3K27ac_InputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_InputSubtracted_counts.txt") dba.peakset (H3K27ac_noInputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_noInputSubtracted_counts.txt") > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DiffBind_1.16.0 RSQLite_1.0.0 [3] DBI_0.3.1 locfit_1.5-9.1 [5] GenomicAlignments_1.6.1 Rsamtools_1.22.0 [7] Biostrings_2.38.2 XVector_0.10.0 [9] limma_3.26.3 SummarizedExperiment_1.0.1 [11] Biobase_2.30.0 GenomicRanges_1.22.1 [13] GenomeInfoDb_1.6.1 IRanges_2.4.5 [15] S4Vectors_0.8.4 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] Rcpp_0.12.2 lattice_0.20-33 GO.db_3.2.2 [4] gtools_3.5.0 digest_0.6.8 plyr_1.8.3 [7] BatchJobs_1.6 futile.options_1.0.0 ShortRead_1.28.0 [10] ggplot2_1.0.1 gplots_2.17.0 zlibbioc_1.16.0 [13] GenomicFeatures_1.22.6 annotate_1.48.0 gdata_2.17.0 [16] Matrix_1.2-3 checkmate_1.6.3 systemPipeR_1.4.7 [19] proto_0.3-10 GOstats_2.36.0 splines_3.2.2 [22] BiocParallel_1.4.0 stringr_1.0.0 pheatmap_1.0.7 [25] RCurl_1.95-4.7 biomaRt_2.26.1 munsell_0.4.2 [28] sendmailR_1.2-1 rtracklayer_1.30.1 base64enc_0.1-3 [31] BBmisc_1.9 fail_1.3 edgeR_3.12.0 [34] XML_3.98-1.3 AnnotationForge_1.12.1 MASS_7.3-45 [37] bitops_1.0-6 grid_3.2.2 RBGL_1.46.0 [40] xtable_1.8-0 GSEABase_1.32.0 gtable_0.1.2 [43] magrittr_1.5 scales_0.3.0 graph_1.48.0 [46] KernSmooth_2.23-15 amap_0.8-14 stringi_1.0-1 [49] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.52.0 [52] latticeExtra_0.6-26 futile.logger_1.4.1 brew_1.0-6 [55] rjson_0.2.15 lambda.r_1.1.7 RColorBrewer_1.1-2 [58] tools_3.2.2 Category_2.36.0 survival_2.38-3 [61] AnnotationDbi_1.32.2 colorspace_1.2-6 caTools_1.17.1
Could you please just check your own data set to see if you can reproduce it? I am not sure how I can send my object to you. Thanks!
Ming
Hi Rory,
I upgraded DiffBind to the newest version, and I still get the same count table (input subtracted):
H3K27ac_InputSubtract<- dba.count(H3K27ac, minOverlap=3, fragmentSize = 200, bParallel = T, score = "DBA_SCORE_READS_MINUS") ## faster H3K27ac_noInputSubtract<- dba.count(H3K27ac_InputSubtract, peaks=NULL, score = "DBA_SCORE_READS") dba.peakset (H3K27ac_InputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_InputSubtracted_counts.txt") dba.peakset (H3K27ac_noInputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_noInputSubtracted_counts.txt") > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DiffBind_1.16.0 RSQLite_1.0.0 [3] DBI_0.3.1 locfit_1.5-9.1 [5] GenomicAlignments_1.6.1 Rsamtools_1.22.0 [7] Biostrings_2.38.2 XVector_0.10.0 [9] limma_3.26.3 SummarizedExperiment_1.0.1 [11] Biobase_2.30.0 GenomicRanges_1.22.1 [13] GenomeInfoDb_1.6.1 IRanges_2.4.5 [15] S4Vectors_0.8.4 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] Rcpp_0.12.2 lattice_0.20-33 GO.db_3.2.2 [4] gtools_3.5.0 digest_0.6.8 plyr_1.8.3 [7] BatchJobs_1.6 futile.options_1.0.0 ShortRead_1.28.0 [10] ggplot2_1.0.1 gplots_2.17.0 zlibbioc_1.16.0 [13] GenomicFeatures_1.22.6 annotate_1.48.0 gdata_2.17.0 [16] Matrix_1.2-3 checkmate_1.6.3 systemPipeR_1.4.7 [19] proto_0.3-10 GOstats_2.36.0 splines_3.2.2 [22] BiocParallel_1.4.0 stringr_1.0.0 pheatmap_1.0.7 [25] RCurl_1.95-4.7 biomaRt_2.26.1 munsell_0.4.2 [28] sendmailR_1.2-1 rtracklayer_1.30.1 base64enc_0.1-3 [31] BBmisc_1.9 fail_1.3 edgeR_3.12.0 [34] XML_3.98-1.3 AnnotationForge_1.12.1 MASS_7.3-45 [37] bitops_1.0-6 grid_3.2.2 RBGL_1.46.0 [40] xtable_1.8-0 GSEABase_1.32.0 gtable_0.1.2 [43] magrittr_1.5 scales_0.3.0 graph_1.48.0 [46] KernSmooth_2.23-15 amap_0.8-14 stringi_1.0-1 [49] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.52.0 [52] latticeExtra_0.6-26 futile.logger_1.4.1 brew_1.0-6 [55] rjson_0.2.15 lambda.r_1.1.7 RColorBrewer_1.1-2 [58] tools_3.2.2 Category_2.36.0 survival_2.38-3 [61] AnnotationDbi_1.32.2 colorspace_1.2-6 caTools_1.17.1
Could you please just check your own data set to see if you can reproduce it? I am not sure how I can send my object to you. Thanks!
Ming
You didn't show that the counts are still different above, I assume you have checked?
I can not reproduce this with any of my own datasets -- I have tried on a variety of previous and current versions.
Could you send me a link to your DBA object
H3K27ac_InputSubtract
? Easiest way is usually Dropbox or some other link. Send the link to my email (rory.stark@cruk.cam.ac.uk).Cheers-
Rory
Yes, I checked, the tables are the same (counts after subtracting)
I just sent you the object. Thanks very much!
I can't reproduce this with the object you sent (I tried on three different version of DiffBind).
Here is the script I ran:
Can you run the same commands and see if you get the same results?
-R
Thanks Rory! I found out the problem... I specified
score=DBA_SCORE_READS to score="DBA_SCORE_READS"
I guess the quote is the evil...