Hi,
I have two questions about how DiffBind works (listed below). Could you help me figure these out? Thank you in advance for your help!
Best, Duygu Ucar.
- First, why do the correlation plots generated from the same data (no contrast) look so different? For example, when I run the two commands below I get two correlation plots that are substantially different; the first one is in fact what we expect to see. I tried to work out what causes this difference, but I could not figure it out from the function documentation. Could you help me with this?
- pbmc.subset <- dba(sampleSheet="ND5_diffbind_v2.csv")
- pbmc.subset <- dba.count(pbmc.subset,bParallel=TRUE)
- Second, in the case of no control samples, which normalization/summary method do you recommend? Are these scores significantly different from one another? Do you have any guidelines/insights for different types of datasets? The score options I am choosing among are listed below (see the sketch after the table for how I am setting them).
DBA_SCORE_READS | raw read count for interval using only reads from ChIP
DBA_SCORE_READS_FOLD | raw read count for interval from ChIP divided by read count for interval from control
DBA_SCORE_READS_MINUS | raw read count for interval from ChIP minus read count for interval from control
DBA_SCORE_RPKM | RPKM for interval using only reads from ChIP
DBA_SCORE_RPKM_FOLD | RPKM for interval from ChIP divided by RPKM for interval from control
DBA_SCORE_TMM_READS_FULL | TMM normalized (using edgeR), using ChIP read counts and Full Library size
DBA_SCORE_TMM_READS_EFFECTIVE | TMM normalized (using edgeR), using ChIP read counts and Effective Library size
DBA_SCORE_TMM_MINUS_FULL | TMM normalized (using edgeR), using ChIP read counts minus Control read counts and Full Library size
DBA_SCORE_TMM_MINUS_EFFECTIVE | TMM normalized (using edgeR), using ChIP read counts minus Control read counts and Effective Library size
DBA_SCORE_TMM_READS_FULL_CPM | same as DBA_SCORE_TMM_READS_FULL, but reported in counts-per-million
DBA_SCORE_TMM_READS_EFFECTIVE_CPM | same as DBA_SCORE_TMM_READS_EFFECTIVE, but reported in counts-per-million
DBA_SCORE_TMM_MINUS_FULL_CPM | same as DBA_SCORE_TMM_MINUS_FULL, but reported in counts-per-million
DBA_SCORE_TMM_MINUS_EFFECTIVE_CPM | same as DBA_SCORE_TMM_MINUS_EFFECTIVE, but reported in counts-per-million
DBA_SCORE_SUMMIT | summit height (maximum read pileup value)
DBA_SCORE_SUMMIT_ADJ | summit height (maximum read pileup value), normalized to relative library size
DBA_SCORE_SUMMIT_POS | summit position (location of maximum read pileup)
Hello Duygu-
I'm not clear on exactly what the question is; perhaps you could supply a code snippet? In the original question no differential analysis had been done (no call to dba.analyze()), so I'm not sure which significance scores you are referring to.

In the first instance (before counting), "missing" peaks for a sample get a score of -1. If the peak calls are very different, with only a few peaks called for some samples, those samples will have a lot of -1 values and could indeed correlate closely with each other. If it then turns out that the actual read counts in those regions are not that different from the other samples, after counting they can be more highly correlated with (and hence cluster closer to) different samples.
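In case it helps to see the two stages side by side, here is a quick sketch (re-using your object and file names) for drawing and retrieving both correlation matrices; dba.plotHeatmap() returns the correlation matrix it plots, so the peak-score-based and count-based clusterings can be compared directly:

```r
library(DiffBind)

# Before counting: the heatmap is based on peak caller scores, where
# intervals with no called peak for a sample are scored -1
pbmc.subset <- dba(sampleSheet="ND5_diffbind_v2.csv")
occupancy.cor <- dba.plotHeatmap(pbmc.subset)   # returns the correlation matrix

# After counting: the heatmap is based on read counts over the consensus
# peak set, so there are no -1 placeholders and the clustering can change
pbmc.subset <- dba.count(pbmc.subset, bParallel=TRUE)
counts.cor <- dba.plotHeatmap(pbmc.subset)

occupancy.cor
counts.cor
```

-R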