Hi-
The easiest way to get the full matrix of counts is using the dba.peakset() function. This returns a GRanges object by default, but you can tell it to return a dataframe:
> counts <- dba.peakset(DBA, bRetrieve=T, DataType=DBA_DATA_FRAME)
If you actually want raw read counts, you have to change the default score (which is normalized counts after subtracting the control reads if present). you can change between scoring methods using dba.count() with peaks=NULL. For example:
> DBA <- dba.count(DBA, peaks=NULL, score=DBA_SCORE_READS)
> counts <- dba.peakset(DBA, bRetrieve=T, DataType=DBA_DATA_FRAME)
Also, if you just want the counts and not the actual site co-ordinates, you can strip them off:
> counts <- counts[,-(1:3)]
Hope this answers your question.
Cheers-
Rory
Hi Rory. I am using DiffBind for counting reads.
But I get the same count table even I specify score differently:
library(DiffBind) samples<- read.csv("H3K27ac.csv") H3K27ac<- dba(sampleSheet="H3K27ac.csv") ## on HPC, I want to make it Parallel H3K27ac_InputSubtract<- dba.count(H3K27ac, minOverlap=3, fragmentSize = 200, bParallel = T, score = "DBA_SCORE_READS_MINUS") H3K27ac_noInputSubtract<- dba.count(H3K27ac, minOverlap=3, fragmentSize = 200, bParallel = T, score = "DBA_SCORE_READS") dba.peakset (H3K27ac_InputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_InputSubtracted_counts.txt") dba.peakset (H3K27ac_noInputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_noInputSubtracted_counts.txt") R version 3.1.0 (2014-04-10) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DiffBind_1.12.3 GenomicAlignments_1.2.2 Rsamtools_1.18.3 [4] Biostrings_2.34.1 XVector_0.6.0 limma_3.22.7 [7] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1 [10] S4Vectors_0.4.0 BiocGenerics_0.12.1 loaded via a namespace (and not attached): [1] amap_0.8-14 base64enc_0.1-3 BatchJobs_1.6 BBmisc_1.9 [5] BiocParallel_1.0.3 bitops_1.0-6 brew_1.0-6 caTools_1.17.1 [9] checkmate_1.6.3 codetools_0.2-14 DBI_0.3.1 digest_0.6.8 [13] edgeR_3.8.6 fail_1.3 foreach_1.4.3 gdata_2.17.0 [17] gplots_2.17.0 grid_3.1.0 gtools_3.5.0 iterators_1.0.8 [21] KernSmooth_2.23-15 lattice_0.20-33 magrittr_1.5 RColorBrewer_1.1-2 [25] RSQLite_1.0.0 sendmailR_1.2-1 stringi_1.0-1 stringr_1.0.0 [29] tools_3.1.0 zlibbioc_1.12.0In this case, the only thing you are changing is the inclusion of Control reads. The most likely explanation is that the Control reads aren't being properly specified.
If you could send me a copy of your object (
H3K27ac_InputSubtract), I can have a look. You could also post the results of:One thing worth knowing is that you don't need to re-count all the data to change the score. You could perform the second operation very quickly by setting
peaks=NULLas follows:Cheers-
Rory
H3K27ac_InputSubtract$peaks[[1]][1:20,] Chr Start End Score RPKM Reads cRPKM cReads 1 chr1 752472 759042 86 1.0705038 220 0.7188937 134 2 chr1 845566 847753 -10 0.6285655 43 0.8541857 53 3 chr1 850918 861812 24 1.0124244 345 1.0385870 321 4 chr1 946879 951106 144 1.5731205 208 0.5336702 64 5 chr1 953846 963471 80 1.0097265 304 0.8202996 224 6 chr1 966517 978024 27 0.9418212 339 0.9556915 312 7 chr1 993603 1000013 -34 0.8179311 164 1.0887606 198 8 chr1 1002011 1005956 -24 0.7698524 95 1.0632250 119 9 chr1 1051185 1054117 44 1.1230631 103 0.7092728 59 10 chr1 1056673 1059141 53 1.4248804 110 0.8140572 57 11 chr1 1062761 1074546 136 0.9711456 358 0.6639703 222 12 chr1 1091609 1097213 157 1.7912756 314 0.9874765 157 13 chr1 1107283 1109601 -2 0.6895845 50 0.7907062 52 14 chr1 1114381 1122110 41 1.0257919 248 0.9440006 207 15 chr1 1133564 1149704 26 0.7823921 395 0.8058386 369 16 chr1 1162858 1177059 84 1.0377982 461 0.9357238 377 17 chr1 1216491 1224276 21 0.9855610 240 0.9915411 219 18 chr1 1225426 1249015 406 1.4352162 1059 0.9757283 653 19 chr1 1254386 1261492 175 1.6196016 360 0.9176388 185 20 chr1 1278700 1285390 143 1.4861586 311 0.8851327 168I do see some negative numbers after exporting the count table, so the control reads
are subtracted.
You can try emailing me the object and I can try to debug it. I may not be able to debug it exactly as you are using an older version of
DiffBindthat is no longer supported, but I'll see what I can do.-R
Thanks, Let me upgrade DiffbBind and come back to you later.
Hi Rory,
I upgraded DiffBind to the newest version, and I still get the same count table (input subtracted):
H3K27ac_InputSubtract<- dba.count(H3K27ac, minOverlap=3, fragmentSize = 200, bParallel = T, score = "DBA_SCORE_READS_MINUS") ## faster H3K27ac_noInputSubtract<- dba.count(H3K27ac_InputSubtract, peaks=NULL, score = "DBA_SCORE_READS") dba.peakset (H3K27ac_InputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_InputSubtracted_counts.txt") dba.peakset (H3K27ac_noInputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_noInputSubtracted_counts.txt") > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DiffBind_1.16.0 RSQLite_1.0.0 [3] DBI_0.3.1 locfit_1.5-9.1 [5] GenomicAlignments_1.6.1 Rsamtools_1.22.0 [7] Biostrings_2.38.2 XVector_0.10.0 [9] limma_3.26.3 SummarizedExperiment_1.0.1 [11] Biobase_2.30.0 GenomicRanges_1.22.1 [13] GenomeInfoDb_1.6.1 IRanges_2.4.5 [15] S4Vectors_0.8.4 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] Rcpp_0.12.2 lattice_0.20-33 GO.db_3.2.2 [4] gtools_3.5.0 digest_0.6.8 plyr_1.8.3 [7] BatchJobs_1.6 futile.options_1.0.0 ShortRead_1.28.0 [10] ggplot2_1.0.1 gplots_2.17.0 zlibbioc_1.16.0 [13] GenomicFeatures_1.22.6 annotate_1.48.0 gdata_2.17.0 [16] Matrix_1.2-3 checkmate_1.6.3 systemPipeR_1.4.7 [19] proto_0.3-10 GOstats_2.36.0 splines_3.2.2 [22] BiocParallel_1.4.0 stringr_1.0.0 pheatmap_1.0.7 [25] RCurl_1.95-4.7 biomaRt_2.26.1 munsell_0.4.2 [28] sendmailR_1.2-1 rtracklayer_1.30.1 base64enc_0.1-3 [31] BBmisc_1.9 fail_1.3 edgeR_3.12.0 [34] XML_3.98-1.3 AnnotationForge_1.12.1 MASS_7.3-45 [37] bitops_1.0-6 grid_3.2.2 RBGL_1.46.0 [40] xtable_1.8-0 GSEABase_1.32.0 gtable_0.1.2 [43] magrittr_1.5 scales_0.3.0 graph_1.48.0 [46] KernSmooth_2.23-15 amap_0.8-14 stringi_1.0-1 [49] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.52.0 [52] latticeExtra_0.6-26 futile.logger_1.4.1 brew_1.0-6 [55] rjson_0.2.15 lambda.r_1.1.7 RColorBrewer_1.1-2 [58] tools_3.2.2 Category_2.36.0 survival_2.38-3 [61] AnnotationDbi_1.32.2 colorspace_1.2-6 caTools_1.17.1Could you please just check your own data set to see if you can reproduce it? I am not sure how I can send my object to you. Thanks!
Ming
Hi Rory,
I upgraded DiffBind to the newest version, and I still get the same count table (input subtracted):
H3K27ac_InputSubtract<- dba.count(H3K27ac, minOverlap=3, fragmentSize = 200, bParallel = T, score = "DBA_SCORE_READS_MINUS") ## faster H3K27ac_noInputSubtract<- dba.count(H3K27ac_InputSubtract, peaks=NULL, score = "DBA_SCORE_READS") dba.peakset (H3K27ac_InputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_InputSubtracted_counts.txt") dba.peakset (H3K27ac_noInputSubtract, bRetrieve=TRUE, DataType=DBA_DATA_FRAME, writeFile = "H3K27ac_noInputSubtracted_counts.txt") > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server release 6.5 (Santiago) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DiffBind_1.16.0 RSQLite_1.0.0 [3] DBI_0.3.1 locfit_1.5-9.1 [5] GenomicAlignments_1.6.1 Rsamtools_1.22.0 [7] Biostrings_2.38.2 XVector_0.10.0 [9] limma_3.26.3 SummarizedExperiment_1.0.1 [11] Biobase_2.30.0 GenomicRanges_1.22.1 [13] GenomeInfoDb_1.6.1 IRanges_2.4.5 [15] S4Vectors_0.8.4 BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] Rcpp_0.12.2 lattice_0.20-33 GO.db_3.2.2 [4] gtools_3.5.0 digest_0.6.8 plyr_1.8.3 [7] BatchJobs_1.6 futile.options_1.0.0 ShortRead_1.28.0 [10] ggplot2_1.0.1 gplots_2.17.0 zlibbioc_1.16.0 [13] GenomicFeatures_1.22.6 annotate_1.48.0 gdata_2.17.0 [16] Matrix_1.2-3 checkmate_1.6.3 systemPipeR_1.4.7 [19] proto_0.3-10 GOstats_2.36.0 splines_3.2.2 [22] BiocParallel_1.4.0 stringr_1.0.0 pheatmap_1.0.7 [25] RCurl_1.95-4.7 biomaRt_2.26.1 munsell_0.4.2 [28] sendmailR_1.2-1 rtracklayer_1.30.1 base64enc_0.1-3 [31] BBmisc_1.9 fail_1.3 edgeR_3.12.0 [34] XML_3.98-1.3 AnnotationForge_1.12.1 MASS_7.3-45 [37] bitops_1.0-6 grid_3.2.2 RBGL_1.46.0 [40] xtable_1.8-0 GSEABase_1.32.0 gtable_0.1.2 [43] magrittr_1.5 scales_0.3.0 graph_1.48.0 [46] KernSmooth_2.23-15 amap_0.8-14 stringi_1.0-1 [49] hwriter_1.3.2 reshape2_1.4.1 genefilter_1.52.0 [52] latticeExtra_0.6-26 futile.logger_1.4.1 brew_1.0-6 [55] rjson_0.2.15 lambda.r_1.1.7 RColorBrewer_1.1-2 [58] tools_3.2.2 Category_2.36.0 survival_2.38-3 [61] AnnotationDbi_1.32.2 colorspace_1.2-6 caTools_1.17.1Could you please just check your own data set to see if you can reproduce it? I am not sure how I can send my object to you. Thanks!
Ming
You didn't show that the counts are still different above, I assume you have checked?
I can not reproduce this with any of my own datasets -- I have tried on a variety of previous and current versions.
Could you send me a link to your DBA object
H3K27ac_InputSubtract? Easiest way is usually Dropbox or some other link. Send the link to my email (rory.stark@cruk.cam.ac.uk).Cheers-
Rory
Yes, I checked, the tables are the same (counts after subtracting)
I just sent you the object. Thanks very much!
I can't reproduce this with the object you sent (I tried on three different version of DiffBind).
Here is the script I ran:
> library(DiffBind) > load('H3K27ac_InputSubtract.rda') > sessionInfo()$otherPkgs$DiffBind$Version [1] "1.16.0" > reads_minus <- dba.peakset(H3K27ac_InputSubtract,bRetrieve=T, writeFile="H3K27ac_InputSubtract.txt", DataType=DBA_DATA_FRAME) > reads_minus[1:10,1:6] CHR START END TCGA-08-0352 TCGA-08-0359 TCGA-27-1833 1 chr1 752472 759042 86 129 16 2 chr1 845566 847753 -10 37 16 3 chr1 850918 861812 24 159 74 4 chr1 946879 951106 144 209 46 5 chr1 953846 963471 80 305 18 6 chr1 966517 978024 27 236 23 7 chr1 993603 1000013 -34 82 22 8 chr1 1002011 1005956 -24 61 12 9 chr1 1051185 1054117 44 41 16 10 chr1 1056673 1059141 53 105 15 > H3K27ac_noInputSubtract <- dba.count(H3K27ac_InputSubtract, peaks=NULL, score=DBA_SCORE_READS) > reads_only <- dba.peakset(H3K27ac_noInputSubtract,bRetrieve=T, writeFile="H3K27ac_noInputSubtract.txt", DataType=DBA_DATA_FRAME) > reads_only[1:10,1:6] CHR START END TCGA-08-0352 TCGA-08-0359 TCGA-27-1833 1 chr1 752472 759042 220 224 41 2 chr1 845566 847753 43 90 29 3 chr1 850918 861812 345 491 145 4 chr1 946879 951106 208 302 58 5 chr1 953846 963471 304 512 75 6 chr1 966517 978024 339 555 113 7 chr1 993603 1000013 164 223 63 8 chr1 1002011 1005956 95 171 49 9 chr1 1051185 1054117 103 114 31 10 chr1 1056673 1059141 110 176 29 > reads.minus <- read.table("H3K27ac_InputSubtract.txt") > reads_minus[1:10,1:6] CHR START END TCGA-08-0352 TCGA-08-0359 TCGA-27-1833 1 chr1 752472 759042 86 129 16 2 chr1 845566 847753 -10 37 16 3 chr1 850918 861812 24 159 74 4 chr1 946879 951106 144 209 46 5 chr1 953846 963471 80 305 18 6 chr1 966517 978024 27 236 23 7 chr1 993603 1000013 -34 82 22 8 chr1 1002011 1005956 -24 61 12 9 chr1 1051185 1054117 44 41 16 10 chr1 1056673 1059141 53 105 15 > reads.only <- read.table("H3K27ac_noInputSubtract.txt") > reads_only[1:10,1:6] CHR START END TCGA-08-0352 TCGA-08-0359 TCGA-27-1833 1 chr1 752472 759042 220 224 41 2 chr1 845566 847753 43 90 29 3 chr1 850918 861812 345 491 145 4 chr1 946879 951106 208 302 58 5 chr1 953846 963471 304 512 75 6 chr1 966517 978024 339 555 113 7 chr1 993603 1000013 164 223 63 8 chr1 1002011 1005956 95 171 49 9 chr1 1051185 1054117 103 114 31 10 chr1 1056673 1059141 110 176 29Can you run the same commands and see if you get the same results?
-R
Thanks Rory! I found out the problem... I specified
score=DBA_SCORE_READS to score="DBA_SCORE_READS"I guess the quote is the evil...