Question

Regarding general understanding of DiffBind

shubhibartaria ▴ 10

@shubhibartaria-10946

Last seen 7.8 years ago

Hello,

This question comes from someone who is a rookie at wet-lab techniques and a new student in bioinformatics. I am working in a lab at Johns Hopkins, where I have been given a few BED and BAM files to analyse to see how similar they are. I am using the DiffBind package to do so. The first correlation heatmap, which I get when I run

MB <- dba(sampleSheet="MBdataframe.csv")

MB

shows values of 0 and below (but not -1).

After I run

MB = dba.count(MB)

I get a heatmap with values of approximately 0.9 and 0.955.

1) What exactly do these values signify in the two maps? How are they different?

2) How are they calculated? Is the Pearson correlation coefficient used, and is this the p-value? If the values are high, does that mean the samples are highly correlated?

3) Is there a way to display the numeric values in the heatmap as well, so that it is easier to interpret?

I kept this very simple, using 3 BED files and their corresponding BAM files, without complicating it with conditions, replicates, etc. for now (I don't have that data anyway).

I want to understand this in layman's terms. I did go through a few of your other questions but ended up more confused. Hopefully you understand where I am coming from.

Regards

Shubhi Bartaria

diffbind heatmap • 1.4k views

ADD COMMENT • link 7.8 years ago shubhibartaria ▴ 10

Hi,

Can you pass along the output of sessionInfo() please? The values are correlations, and shouldn't be negative. Higher numbers (darker colours) are higher correlations, i.e. more similar. (Pearson correlation by default.)

The difference between the two plots is that the first uses the scores in the peak files, while the post-counting heat map uses the actual counts from the BAM files.
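As a rough illustration of that difference, here is a minimal sketch in base R with invented numbers. It is not DiffBind's internal code, and encoding peak presence as 1/0 is an assumption made only for this example. The point is that both heatmaps show pairwise correlation coefficients between samples (values between -1 and 1, not p-values), computed from two different kinds of score.

```r
# Toy data: rows are consensus peaks, columns are samples (all values invented).
# Occupancy-style scores: 1 if the peak was called in that sample, else 0.
# (The 1/0 encoding is an assumption for illustration only.)
occupancy <- cbind(
  MB1 = c(1, 1, 0, 1, 0, 1),
  MB2 = c(1, 0, 1, 1, 0, 1),
  MB3 = c(0, 1, 1, 1, 1, 0)
)

# Affinity-style scores: read counts over the same peaks (invented numbers).
affinity <- cbind(
  MB1 = c(120, 85, 10, 200, 15, 90),
  MB2 = c(110, 40, 30, 190, 12, 95),
  MB3 = c( 60, 70, 35, 210, 25, 40)
)

# Pearson correlation between every pair of samples (columns).
occ_cor <- cor(occupancy, method = "pearson")
aff_cor <- cor(affinity,  method = "pearson")

print(round(occ_cor, 2))
print(round(aff_cor, 2))
```

Each matrix is symmetric with 1 on the diagonal; the off-diagonal entries are what the heatmap colours represent.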

There's no capability to incorporate the numbers into the plot, as far as I know (Rory will know for sure).
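One possible workaround, sketched in base R: if you have the correlation matrix itself, you can draw your own heatmap and write the value into each cell. The matrix `cors` below is invented for illustration. I believe `dba.plotHeatmap()` returns the correlation matrix it plots, but that is an assumption worth checking against `?dba.plotHeatmap` for your DiffBind version.

```r
# Toy correlation matrix standing in for the one behind the heatmap
# (values invented for illustration, not taken from any real data).
samples <- c("MB1", "MB2", "MB3")
cors <- matrix(
  c(1.00, 0.91, 0.90,
    0.91, 1.00, 0.89,
    0.90, 0.89, 1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(samples, samples)
)

# Print the exact values, rounded to three decimals.
print(round(cors, 3))

# Draw the matrix as a heatmap and write each value into its cell.
n <- nrow(cors)
image(1:n, 1:n, cors, axes = FALSE, xlab = "", ylab = "",
      col = rev(heat.colors(20)))
axis(1, at = 1:n, labels = rownames(cors))
axis(2, at = 1:n, labels = colnames(cors))
text(row(cors), col(cors), labels = sprintf("%.2f", cors))
```

This does not reproduce the clustering dendrogram of the DiffBind plot; it only shows the numbers alongside the colours.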

Cheers,

- Gord

ADD REPLY • link 7.8 years ago Gord Brown ▴ 650

I hope Rory responds with a way to incorporate exact values into the heatmap.

And my BED files don't have a score column. They have just 4 columns.

ADD REPLY • link 7.8 years ago shubhibartaria ▴ 10

score 0 · Answer 1 · 2016-06-21

This is the sessionInfo() output:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] DiffBind_1.16.3 RSQLite_1.0.0
[3] DBI_0.4-1 locfit_1.5-9.1
[5] GenomicAlignments_1.6.3 Rsamtools_1.22.0
[7] Biostrings_2.38.4 XVector_0.10.0
[9] limma_3.26.9 SummarizedExperiment_1.0.2
[11] Biobase_2.30.0 GenomicRanges_1.22.4
[13] GenomeInfoDb_1.6.3 IRanges_2.4.8
[15] S4Vectors_0.8.11 BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.5 lattice_0.20-33
[3] GO.db_3.2.2 gtools_3.5.0
[5] digest_0.6.9 plyr_1.8.3
[7] futile.options_1.0.0 BatchJobs_1.6
[9] backports_1.0.2 ShortRead_1.28.0
[11] ggplot2_2.1.0 gplots_3.0.1
[13] zlibbioc_1.16.0 GenomicFeatures_1.22.13
[15] annotate_1.48.0 gdata_2.17.0
[17] Matrix_1.2-6 checkmate_1.7.4
[19] systemPipeR_1.4.8 GOstats_2.36.0
[21] splines_3.2.2 BiocParallel_1.4.3
[23] stringr_1.0.0 pheatmap_1.0.8
[25] RCurl_1.95-4.8 biomaRt_2.26.1
[27] munsell_0.4.3 sendmailR_1.2-1
[29] rtracklayer_1.30.4 base64enc_0.1-3
[31] BBmisc_1.9 fail_1.3
[33] edgeR_3.12.1 XML_3.98-1.4
[35] AnnotationForge_1.12.2 bitops_1.0-6
[37] grid_3.2.2 RBGL_1.46.0
[39] xtable_1.8-2 GSEABase_1.32.0
[41] gtable_0.2.0 magrittr_1.5
[43] scales_0.4.0 graph_1.48.0
[45] KernSmooth_2.23-15 amap_0.8-14
[47] stringi_1.1.1 hwriter_1.3.2
[49] genefilter_1.52.1 latticeExtra_0.6-28
[51] futile.logger_1.4.1 brew_1.0-6
[53] rjson_0.2.15 lambda.r_1.1.7
[55] RColorBrewer_1.1-2 tools_3.2.2
[57] Category_2.36.0 survival_2.39-4
[59] AnnotationDbi_1.32.3 colorspace_1.2-6
[61] caTools_1.17.1 knitr_1.13

I could attach the heatmaps too but don't know how.

Do both heatmaps use Pearson correlation?

score 0 · Answer 2 · 2016-06-23

shubhibartaria ▴ 10


I just ran into another puzzle. When I ran dba.count after adding one more BED and BAM file, which was a combination of the other 3 files (Mid Brain 1, 2, 3), I got a plot that was a little different from the one I plotted without the joint file. If the comparison is pairwise, why did the values in the heatmap change? With the joint file, it was 0.84 between MB2 and MB3 and 0.85 between MB3 and MB1. Without the joint file, it was 0.9 between MB3 and MB1 and approximately 0.89 between MB2 and MB3 (can we not get exact values when plotting heatmaps?).

Could you explain how this could have happened?

I also wanted to mention that my BED files don't have a score column. They have just chr #, start, end and name. Would that affect the occupancy plot, or does DiffBind calculate the scores on its own?

I hope my questions are clear; I am very new to this. If there is any way I can articulate them better, kindly let me know.

ADD COMMENT • link 7.8 years ago shubhibartaria ▴ 10

The numbers could change for a couple of reasons:

1) The normalization can (will) change when you add new data, so the numbers could shift a bit.

2) The addition of the new data will change the consensus peak set, so the calculation of the correlation will change.
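The second point can be sketched with toy numbers in base R (all values invented; this is not DiffBind's own calculation). When a new peak set adds intervals to the consensus, the original samples are counted over those extra intervals too, so the correlation between two unchanged samples can still move.

```r
# Toy read counts over the consensus peaks built from MB1-MB3 (invented numbers).
counts3 <- cbind(
  MB1 = c(120, 85, 10, 200, 15, 90),
  MB2 = c(110, 40, 30, 190, 12, 95),
  MB3 = c( 60, 70, 35, 210, 25, 40)
)
rownames(counts3) <- paste0("peak", 1:6)

# Adding another peak set can add new consensus peaks (here peak7-peak9),
# so the original samples are now also counted over those extra intervals.
extra <- cbind(
  MB1 = c( 5, 40,  8),
  MB2 = c(30,  6,  9),
  MB3 = c( 7, 35, 50)
)
rownames(extra) <- paste0("peak", 7:9)
counts4 <- rbind(counts3, extra)

# Same pair of samples, same reads, different set of peaks.
cor_before <- cor(counts3)["MB1", "MB3"]
cor_after  <- cor(counts4)["MB1", "MB3"]
cat("MB1 vs MB3 before:", round(cor_before, 3),
    "after:", round(cor_after, 3), "\n")
```

The comparison is still pairwise, but each pair is compared over whatever consensus peak set is in use at the time.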

I'm not sure what you mean by a "joint" file... you shouldn't be merging bed files and then adding that as a new peak set. What is it you're really trying to achieve here?

ADD REPLY • link 7.8 years ago Gord Brown ▴ 650

Shubhi, I've addressed some of these issues on another thread:

A: DiffBind affinity vs occupancy Heatmaps

-Rory

ADD REPLY • link 7.8 years ago Rory Stark ★ 5.1k