Regarding general understanding of DiffBind
@shubhibartaria-10946
Last seen 8.5 years ago

Hello,

This question comes from a rookie in wet-lab techniques and a new student of bioinformatics. I am working in a lab at Johns Hopkins, where I have been given a few BED and BAM files to analyse to see how similar they are. I am using the DiffBind package to do so. The first correlation heatmap, which I get when I run

>>>MB <- dba(sampleSheet="MBdataframe.csv")

>>>MB

has values at 0 and below 0 (but not -1).

When I then run

>>>MB = dba.count(MB)

I get a heatmap with values of approximately 0.9 and 0.955.

1) What exactly do these values signify in the two maps, and how do they differ?

2) How are they calculated? Is the Pearson correlation coefficient used, and is this the p-value? If the values are high, does that mean the samples are highly correlated?

3) Can I display the values numerically in the heatmap as well, so that it makes more sense?

I tried this in a very simple manner, using 3 BED files and their corresponding BAM files, without complicating it with conditions, replicates etc. for now (I don't have that data anyway).

I want to understand this in layman's terms. I did go through a few of your other questions but ended up getting confused again. Hopefully you understand where I am coming from.

Regards

Shubhi Bartaria

diffbind heatmap • 1.7k views

Hi,

Can you pass along the output of sessionInfo() please?  The values are correlations, and shouldn't be negative. Higher numbers (darker colours) are higher correlations, i.e. more similar.  (Pearson correlation by default.)

The difference between the two plots is that the first uses the scores in the peak files, while the post-counting heat map uses the actual counts from the BAM files.
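To make the idea of a sample-by-sample Pearson correlation concrete, here is a minimal base-R sketch on a made-up count matrix. The sample names and numbers are invented for illustration and are not DiffBind output; the only assumption is that each sample is represented by a vector of scores over a shared set of peaks, which is what the heatmap correlates.

```r
# Toy read counts: rows are peaks in a shared peak set, columns are samples.
# All values are invented for illustration.
counts <- cbind(
  MB1 = c(10, 50, 30, 80, 5),
  MB2 = c(12, 45, 33, 75, 8),
  MB3 = c(40, 20, 60, 10, 30)
)

# Pairwise Pearson correlation between samples (columns).
cor_mat <- cor(counts, method = "pearson")

# Each entry is a correlation coefficient between two samples,
# not a p-value. The diagonal is 1 (each sample against itself).
print(round(cor_mat, 3))
```

Values close to 1 mean the two samples have similar scores across the peaks. A Pearson coefficient can in general lie anywhere between -1 and 1; in this toy matrix MB3 was deliberately made to disagree with the other two.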

There's no capability to incorporate the numbers into the plot, as far as I know (Rory will know for sure).
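As a possible workaround outside DiffBind, if you can get the correlations as an ordinary R matrix (the help page ?dba.plotHeatmap is the place to check what the function returns in your version), you could draw your own heatmap with the numbers printed in the cells. The sketch below uses only base R graphics; the matrix values and the helper name plot_cor_with_values are hypothetical.

```r
# Hypothetical correlation matrix; the values are illustrative only.
samples <- c("MB1", "MB2", "MB3")
cor_mat <- matrix(c(1.00, 0.91, 0.90,
                    0.91, 1.00, 0.89,
                    0.90, 0.89, 1.00),
                  nrow = 3, dimnames = list(samples, samples))

# Hypothetical helper: draw the matrix as a heatmap and print each
# value inside its cell. Returns the printed labels invisibly.
plot_cor_with_values <- function(m, digits = 2) {
  n <- nrow(m)
  pal <- colorRampPalette(c("white", "darkgreen"))(20)
  # image() draws m[i, j] at position x = i, y = j.
  image(1:n, 1:n, m, axes = FALSE, xlab = "", ylab = "", col = pal)
  axis(1, at = 1:n, labels = rownames(m))
  axis(2, at = 1:n, labels = colnames(m), las = 1)
  labels <- as.vector(sprintf(paste0("%.", digits, "f"), m))
  text(as.vector(row(m)), as.vector(col(m)), labels = labels)
  invisible(labels)
}

pdf(NULL)  # null device so the sketch runs without opening a window
labels <- plot_cor_with_values(cor_mat)
invisible(dev.off())
```

Drop the pdf(NULL) and dev.off() lines to see the plot on screen. Printing the matrix itself with round(cor_mat, 3) is the simplest way to read off exact values.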

Cheers,

 - Gord


I hope Rory responds with a way to incorporate exact values into the heatmap.

Also, my BED files don't have a score column. They just have 4 columns.

@shubhibartaria-10946
Last seen 8.5 years ago

This is the sessionInfo() output:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:

[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DiffBind_1.16.3            RSQLite_1.0.0             
 [3] DBI_0.4-1                  locfit_1.5-9.1            
 [5] GenomicAlignments_1.6.3    Rsamtools_1.22.0          
 [7] Biostrings_2.38.4          XVector_0.10.0            
 [9] limma_3.26.9               SummarizedExperiment_1.0.2
[11] Biobase_2.30.0             GenomicRanges_1.22.4      
[13] GenomeInfoDb_1.6.3         IRanges_2.4.8             
[15] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5             lattice_0.20-33        
 [3] GO.db_3.2.2             gtools_3.5.0           
 [5] digest_0.6.9            plyr_1.8.3             
 [7] futile.options_1.0.0    BatchJobs_1.6          
 [9] backports_1.0.2         ShortRead_1.28.0       
[11] ggplot2_2.1.0           gplots_3.0.1           
[13] zlibbioc_1.16.0         GenomicFeatures_1.22.13
[15] annotate_1.48.0         gdata_2.17.0           
[17] Matrix_1.2-6            checkmate_1.7.4        
[19] systemPipeR_1.4.8       GOstats_2.36.0         
[21] splines_3.2.2           BiocParallel_1.4.3     
[23] stringr_1.0.0           pheatmap_1.0.8         
[25] RCurl_1.95-4.8          biomaRt_2.26.1         
[27] munsell_0.4.3           sendmailR_1.2-1        
[29] rtracklayer_1.30.4      base64enc_0.1-3        
[31] BBmisc_1.9              fail_1.3               
[33] edgeR_3.12.1            XML_3.98-1.4           
[35] AnnotationForge_1.12.2  bitops_1.0-6           
[37] grid_3.2.2              RBGL_1.46.0            
[39] xtable_1.8-2            GSEABase_1.32.0        
[41] gtable_0.2.0            magrittr_1.5           
[43] scales_0.4.0            graph_1.48.0           
[45] KernSmooth_2.23-15      amap_0.8-14            
[47] stringi_1.1.1           hwriter_1.3.2          
[49] genefilter_1.52.1       latticeExtra_0.6-28    
[51] futile.logger_1.4.1     brew_1.0-6             
[53] rjson_0.2.15            lambda.r_1.1.7         
[55] RColorBrewer_1.1-2      tools_3.2.2            
[57] Category_2.36.0         survival_2.39-4        
[59] AnnotationDbi_1.32.3    colorspace_1.2-6       
[61] caTools_1.17.1          knitr_1.13

I could attach the heatmaps too, but I don't know how.

Do both heatmaps use Pearson correlation?


And when you say score in the peak files, does that mean an explicit column in the BED files, or does DiffBind calculate it on its own?

Does the first heatmap use only the BED files, while the second uses both the BED and BAM files?


Initially (before counting) the scores are from the bed file.  If there are no scores, they will (if I recall correctly) default to 1.  After counting, the bed file(s) are used to form a consensus peak set, but then the score itself is calculated from the reads in the bam files.

@shubhibartaria-10946
Last seen 8.5 years ago

I just ran into another puzzle. When I ran dba.count after adding one more BED and BAM file, which was a combination of the other 3 files (Mid Brain 1, 2, 3), I got a plot that was a little different from the one I plotted without the joint file. If the comparison is pairwise, why did the values in the heatmap change? It was 0.84 between MB2 and MB3 and 0.85 between MB3 and MB1. Without the joint file, it was 0.9 between MB3 and MB1 and approximately 0.89 between MB2 and MB3 (can we not get exact values when plotting heatmaps?).

Could you explain to me how this could have happened?

I also wanted to mention that my BED files don't have a score column. They just have chr #, start, end and name. Would that affect the occupancy plot, or does the DiffBind package calculate the score on its own?

I hope I am being clear in my queries, because I am very new to this. If there is any way I can better articulate my query, kindly let me know.

 


The numbers could change for a couple of reasons:

1) The normalization can (will) change when you add new data, so the numbers could shift a bit.

2) The addition of the new data will change the consensus peak set, so the calculation of the correlation will change.
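The second reason can be illustrated with a toy example in base R. The numbers below are invented and this is not how DiffBind computes anything internally; the sketch only shows that a "pairwise" correlation between the same two samples changes once the set of peaks it is computed over grows.

```r
# Hypothetical counts for two samples over the original consensus peaks
# (rows = peaks). All values are invented.
original <- cbind(
  MB1 = c(10, 50, 30, 80),
  MB3 = c(14, 44, 35, 70)
)

# Hypothetical extra peaks that enter the consensus set once another
# peak set is added; MB1 and MB3 are now scored over these regions too.
extra <- cbind(
  MB1 = c(5, 40),
  MB3 = c(30, 8)
)

expanded <- rbind(original, extra)

# Same pair of samples, two different peak sets.
r_before <- cor(original[, "MB1"], original[, "MB3"])
r_after  <- cor(expanded[, "MB1"], expanded[, "MB3"])

cat("before:", round(r_before, 3), " after:", round(r_after, 3), "\n")
```

In this made-up case the two samples disagree on the newly added regions, so their correlation drops even though neither sample's own data changed.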

I'm not sure what you mean by a "joint" file... you shouldn't be merging bed files and then adding that as a new peak set.  What is it you're really trying to achieve here?


Shubhi, I've addressed some of these issues on another thread:

A: DiffBind affinity vs occupancy Heatmaps

-Rory
