Question: Is there a way to run Diffbind without the normalization ?
1
15 months ago by
Thomas10
Thomas10 wrote:

Hello,

I was wondering if there was an option to run the differential analysis without the normalisation step ?

Apparently there is an option like that for some plots but I can't happen to find anything for the differential analysis.

Thomas

modified 13 months ago by Rory Stark2.9k • written 15 months ago by Thomas10
Answer: Is there a way to run Diffbind without the normalization ?
0
13 months ago by
Rory Stark2.9k
CRUK, Cambridge, UK
Rory Stark2.9k wrote:

I am currently working on giving users much more control over normalization in DiffBind. In the meantime, there is a trick you can do you avoid normalization when using DESeq2 as the analysis method.

After running dba.count(), you can change the individual library sizes to all be the same. Then if you run dba.analyze() with bFullLibrarySize=TRUE (default) the normalization facts will all be equal to 1. You probably want to run dba.analyze()with bSubControl=FALSE to avoid having the control reds subtracted if do not want normalization of any kind.

You can set the library size values as follows:

> myDBA <- dba.count(myDBA, ...)
> myDBA\$class[8,] <- 1e+07
> myDBA <- dba.analyze(myDBA, method=DBA_DESEQ2, bFullLibrarySize=TRUE, bSubControl=FALSE)

Also, there is a "backdoor" in DiffBind to allow you to provide your own normalized read counts -- if that is of interest, I can show the code required to use it.

In my opinion, for visualization purposes it's accurate enough to convert the raw reads (bam files) to tdf format and load them in IGV with the option "normalize coverage data" enabled, this will rescale the raw counts to 1000000 / total counts.thus correcting for different library sizes. This assuming you are using/willing to use IGV, other browsers surely have similar options.

DiffBind, or rather the underlying edgeR and DESeq packages, don't make use of the read profiles. All they use are the raw counts of reads in given intervals (ChipSeq peaks, genes, whatever), in fact the read profile within an interval doesn't enter in the analysis at all.

If you want to show the log fold change you could prepare a bedgraph file where each line has as coordinates the region tested, or maybe the midpoint of the region tested, whichever looks better, and as value (4th column) the log fold change. Then load this file in IGV with or without the raw reads file as above.