I've been trying to use DiffBind for ATAC-seq data and noticed that the differential accessibility results change substantially depending on whether bFullLibrarySize is TRUE or FALSE. For one contrast, setting it to TRUE gives 16,271 differentially accessible peaks, 14,577 of which have a positive fold change; setting it to FALSE gives 17,867 differentially accessible peaks, 9,510 of which have a positive fold change. So with TRUE roughly 90% of the fold changes go in one direction, whereas with FALSE they are split much closer to 50/50. Is there a rationale for deciding when bFullLibrarySize should be TRUE or FALSE? Since I suspect this is related to FRiP, I've posted the scores below.
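As a quick check of the proportions quoted above, the share of significant peaks with a positive fold change can be computed directly from the reported counts. The helper name `positive_fraction` is just an illustrative choice, not part of DiffBind.

```python
def positive_fraction(n_significant: int, n_positive: int) -> float:
    """Share of differentially accessible peaks with a positive fold change."""
    return n_positive / n_significant


# Peak counts reported in the question for the same contrast.
reported = {
    "bFullLibrarySize=TRUE": (16271, 14577),
    "bFullLibrarySize=FALSE": (17867, 9510),
}

for setting, (n_sig, n_pos) in reported.items():
    share = positive_fraction(n_sig, n_pos)
    print(f"{setting}: {n_pos}/{n_sig} = {share:.1%} positive")
```

This gives about 89.6% positive with TRUE and about 53.2% with FALSE, which matches the "roughly 90%" versus "close to 50/50" description.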
ID   Condition  Replicate  Caller  Intervals  FRiP
A-1  A          1          counts  71007      0.44
A-2  A          2          counts  71007      0.44
A-3  A          3          counts  71007      0.30
A-4  A          4          counts  71007      0.29
A-5  A          5          counts  71007      0.38
B-1  B          1          counts  71007      0.44
B-2  B          2          counts  71007      0.44
B-3  B          3          counts  71007      0.47
B-4  B          4          counts  71007      0.47
B-5  B          5          counts  71007      0.50
B-6  B          6          counts  71007      0.38
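To see why FRiP could matter here, the table's scores can be averaged per condition. This is a crude summary (a plain mean of per-sample FRiP), offered only as a sketch. If the FRiP difference were purely technical and every peak kept the same share of in-peak reads, then scaling by total reads (full library size) would leave an unchanged peak with a fold change of roughly the ratio of the two mean FRiP values. Whether that appears as positive or negative fold changes depends on the order of the contrast, which is not stated above.

```python
import math
from statistics import mean

# FRiP scores copied from the table above.
frip = {
    "A": [0.44, 0.44, 0.30, 0.29, 0.38],
    "B": [0.44, 0.44, 0.47, 0.47, 0.50, 0.38],
}

mean_frip = {cond: mean(scores) for cond, scores in frip.items()}

# Under full-library-size scaling, an unchanged peak would differ between
# conditions by roughly this ratio of in-peak read fractions (B relative to A).
ratio_b_over_a = mean_frip["B"] / mean_frip["A"]

for cond, value in mean_frip.items():
    print(f"Condition {cond}: mean FRiP = {value:.3f}")
print(f"B/A ratio = {ratio_b_over_a:.2f} (log2 = {math.log2(ratio_b_over_a):.2f})")
```

The means come out at about 0.37 for A and 0.45 for B, a ratio of about 1.22, or roughly 0.28 on the log2 scale.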
Thank you.
Hi Rory, thanks for posting the explanation above. It is very useful to have the assumptions behind bFullLibrarySize spelled out, along with what happens when it is set to TRUE and to FALSE. I'm just wondering about one thing you mentioned:
Can you please elaborate on what information led you to this conclusion?
On a related note, I always wonder how people decide which normalization is appropriate for their experiment, especially when they see a bias (a high density of points) mostly in one direction. Does that indicate a systematic technical bias, or genuine global, genome-wide biological changes?
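A toy model helps show why a one-sided skew cannot, by itself, answer this. The sketch below is entirely hypothetical: equal sequencing depth in both groups, every peak keeping the same share of in-peak reads, and group B having a higher FRiP (0.45 versus 0.37, the condition means from the table above). That count table is what a purely technical FRiP difference would produce, but it is also what a genuine, uniform gain of accessibility across B's peaks would produce. Dividing by total reads is used as a rough stand-in for bFullLibrarySize=TRUE and dividing by reads in peaks as a rough stand-in for FALSE; the sketch ignores how DiffBind actually passes library sizes to edgeR or DESeq2, and it has no count noise.

```python
import math
import random
from statistics import median


def peak_counts(weights, total_reads, frip):
    """Expected reads per peak: the in-peak reads are split by each peak's weight."""
    w_sum = sum(weights)
    return [total_reads * frip * w / w_sum for w in weights]


def median_log2fc(counts_a, counts_b, lib_a, lib_b):
    """Median over peaks of log2(B/A) after scaling each group by its library size."""
    return median(
        math.log2((b / lib_b) / (a / lib_a)) for a, b in zip(counts_a, counts_b)
    )


rng = random.Random(0)
# Hypothetical relative accessibility per peak, shared by both groups.
weights = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
total = 20_000_000  # hypothetical, equal depth in both groups

# Technical FRiP difference OR a real uniform gain in B: same table either way.
counts_a = peak_counts(weights, total, frip=0.37)
counts_b = peak_counts(weights, total, frip=0.45)

full = median_log2fc(counts_a, counts_b, total, total)
in_peaks = median_log2fc(counts_a, counts_b, sum(counts_a), sum(counts_b))
print(f"full library size: median log2FC = {full:.2f}")
print(f"reads in peaks:    median log2FC = {in_peaks:.2f}")
```

Scaling by total reads shifts every peak by about 0.28 log2 toward B, while scaling by reads in peaks centers them near zero. Because the two scenarios yield the same counts, the choice roughly comes down to an assumption: that most peaks are unchanged (favoring reads in peaks) or that a global shift is real (favoring full library size).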
Thank you!