Hello,
This question is from a rookie in wet lab techniques and a new student of bioinformatics. I am working in a lab at Johns Hopkins, where I have been given a few BED and BAM files to analyse and see how similar they are. I am using the DiffBind package to do so. The first correlation map (heat map), which I get when I run
>>>MB <- dba(sampleSheet="MBdataframe.csv")
>>>MB
has values at 0 and below 0 (but not as low as -1).
When I run it using
>>>MB = dba.count(MB)
I get a heatmap with values of approximately 0.9 and 0.955.
1) What exactly do these values signify in the two maps? How are they different?
2) How are they calculated? Is the Pearson correlation coefficient used, and is this the p-value? If the values are high, does that mean the samples are highly correlated?
3) Can I use something to show the numerical values in the heatmap too, so that it makes more sense?
I tried this in a very simple manner, using 3 BED files and the corresponding BAM files, without complicating it with conditions, replicates, etc. for now (I don't have that data anyway).
I want to understand this in layman's terms. I did go through a few of your other questions but again ended up confused. Hopefully you understand where I am coming from.
Regards
Shubhi Bartaria
Hi,
Can you pass along the output of sessionInfo() please?
The values are correlations, and shouldn't be negative. Higher numbers (darker colours) are higher correlations, i.e. more similar. (Pearson correlation by default.) The difference between the two plots is that the first uses the scores in the peak files, while the post-counting heat map uses the actual counts from the BAM files.
There's no capability to incorporate the numbers into the plot, as far as I know (Rory will know for sure).
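To make the idea of a correlation heat map concrete outside of DiffBind, here is a minimal base R sketch. It is not DiffBind's own code: the count matrix is made up, and the names counts and cors are only for illustration. It computes pairwise Pearson correlations between samples, prints the numbers, and draws a simple heat map with each value written in its cell.

```r
# Toy read counts: rows are peak intervals, columns are samples.
# These numbers are invented purely for illustration.
counts <- matrix(
  c(10, 12, 20,
    50, 48, 30,
     5,  7, 25,
    80, 75, 60),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), c("S1", "S2", "S3"))
)

# Pairwise Pearson correlation between samples (the columns).
cors <- cor(counts, method = "pearson")

# The numeric values behind the heat map.
print(round(cors, 3))

# Draw the matrix with base graphics; darker cells are higher correlations.
n <- ncol(cors)
image(1:n, 1:n, cors, axes = FALSE, xlab = "", ylab = "",
      col = colorRampPalette(c("white", "darkgreen"))(20))
axis(1, at = 1:n, labels = colnames(cors))
axis(2, at = 1:n, labels = rownames(cors))

# Write each correlation value into its cell.
text(row(cors), col(cors), labels = round(cors, 2))
```

Each cell is a correlation coefficient between two samples, not a p-value; the diagonal is 1 because every sample correlates perfectly with itself.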
Cheers,
- Gord
I hope Rory responds with a way to incorporate exact values into the heatmap.
And my BED files don't have a score column. They have just 4 columns.