DiffBind: Normalization for DESeq2
@jasonlouisstein-7953
Last seen 6.1 years ago
United States

Hi,

I'm trying to understand the normalization for DESeq2 analysis within DiffBind.  If I run:

Pool1 = dba.analyze(Pool1, method=DBA_DESEQ2, bFullLibrarySize=FALSE, bCorPlot=FALSE)
Pool1.DB = dba.report(Pool1, file="test", method=DBA_DESEQ2, th=1, bCounts=TRUE)

Then, the normalized counts contained within elementMetadata(Pool1.DB) are calculated by taking the raw counts for each peak divided by the normalization factor s_j, which is computed via the median of ratios method described in (http://genomebiology.com/2010/11/10/R106).  Is this correct?  When I test this, I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated with s_j but not exactly the same. (Note: this may be because I'm using a blocking factor in my model?)
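To make the check above concrete, here is a minimal base R sketch of the median of ratios calculation on a made-up count matrix. The matrix, the helper name median_of_ratios, and the variable names are assumptions for illustration only; this is not DiffBind's or DESeq2's own code, just the formula from the paper as I understand it.

```r
# Toy count matrix: rows are peaks, columns are samples (made-up numbers).
counts <- matrix(
  c( 10,  20,  30,
    100, 210, 290,
     50,  95, 160,
      8,  15,  25),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), c("s1", "s2", "s3"))
)

# Hypothetical helper: median-of-ratios size factors in base R.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Log of the geometric mean of each peak across samples.
  log_geo_means <- rowMeans(log_counts)
  # Peaks with a zero count in any sample have a non-finite mean; skip them.
  keep <- is.finite(log_geo_means)
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

s_j <- median_of_ratios(counts)

# Normalized counts: each column divided by its size factor.
norm_counts <- sweep(counts, 2, s_j, "/")

# If normalized = raw / s_j exactly, this ratio recovers s_j for every peak.
recovered <- colMeans(counts / norm_counts)
```

In this sketch recovered equals s_j up to floating point error, so any larger difference in a real analysis would suggest the reported counts went through some additional step beyond a plain division.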

If I instead run:

Pool1 = dba.analyze(Pool1, method=DBA_DESEQ2, bFullLibrarySize=TRUE, bCorPlot=FALSE)
Pool1.DB = dba.report(Pool1, file="test", method=DBA_DESEQ2, th=1, bCounts=TRUE)

Then, the normalized counts contained within elementMetadata(Pool1.DB) are calculated by taking the raw counts for each peak divided by librarysize/min(librarysize).  Is this correct?  I can test this as well, and again I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated to librarysize/min(librarysize) but not exactly the same.

So, if I set bFullLibrarySize=TRUE (the default), am I using only the library size as the normalization factor and nothing else?  As I understand it, this can be biased by very highly "expressed" peaks, which is why the DESeq2 authors proposed the median of ratios method.  Whereas if I set bFullLibrarySize=FALSE, I use the median of ratios method as my normalization factor and not the library size?
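A small base R sketch of the bias I have in mind, using made-up numbers. As an assumption for illustration, it uses the column sums of the peak counts as a stand-in for library size; that is not the same as the total number of reads in each bam file, so it only shows the general effect of one dominant peak on a total-count factor.

```r
# Sample B matches sample A except for one peak with a very large count.
counts <- cbind(
  A = c(100, 100, 100, 100, 100),
  B = c(100, 100, 100, 100, 5000)
)

# Total-count factors (stand-in for library size; an assumption here).
total <- colSums(counts)
lib_factors <- total / min(total)

# Median-of-ratios factors computed on the same matrix.
log_counts <- log(counts)
log_geo_means <- rowMeans(log_counts)
mor_factors <- apply(log_counts, 2, function(col) {
  exp(median(col - log_geo_means))
})

print(lib_factors)  # B's factor is inflated by the single large peak
print(mor_factors)  # the median ignores the outlying peak
```

Here the total-count factor for B is 10.8 while both median of ratios factors are 1, so the four unchanged peaks would look strongly depleted in B under the total-count scaling.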

That was a lot of questions, but thanks for helping me figure this out, and also thanks for making such a useful and well-supported package!

Jason

diffbind deseq2 • 2.1k views
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 24 days ago
CRUK, Cambridge, UK

Hi Jason-

Section 7.5 of the DiffBind vignette explains how DESeq2 is used.

Specifically, if bFullLibrarySize=FALSE, it calls DESeq2::estimateSizeFactors() to calculate the normalization factors. If bFullLibrarySize=TRUE, the factors are set to:

> DESeq2::sizeFactors(DESeqDataSeq) <- libsize/min(libsize)

Where libsize is a vector containing the number of reads in each bam file.

The normalized counts returned by dba.report() are the raw read counts divided by the normalization factors, which are retrieved by calling DESeq2::sizeFactors().
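As a rough illustration of the full library size case, here is a base R sketch with made-up numbers. The library sizes, the raw matrix, and the variable names are hypothetical; it only mirrors the libsize/min(libsize) arithmetic described above and does not call DiffBind or DESeq2.

```r
# Hypothetical library sizes: total reads in each bam file.
libsize <- c(s1 = 20e6, s2 = 35e6, s3 = 28e6)

# The smallest library gets a factor of exactly 1.
size_factors <- libsize / min(libsize)

# Made-up raw read counts for two peaks across the three samples.
raw <- matrix(
  c(40, 70, 56,
    10, 35, 14),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("peakA", "peakB"), names(libsize))
)

# Normalized counts: divide each sample's column by its size factor.
normalized <- sweep(raw, 2, size_factors, "/")
print(normalized)
```

In this toy case peakA scales with library size and comes out flat across samples, while peakB keeps a real difference in s2. With an actual DESeqDataSet, DESeq2::counts(dds, normalized=TRUE) performs the same division by the stored size factors.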

Hope this helps-

Rory