Question: DiffBind: Normalization for DESeq2
Asked 4.5 years ago by JasonLouisStein (United States):

Hi,

I'm trying to understand the normalization for DESeq2 analysis within DiffBind.  If I run:

Pool1 = dba.analyze(Pool1, method=DBA_DESEQ2, bFullLibrarySize=FALSE, bCorPlot=FALSE)
Pool1.DB = dba.report(Pool1, file="test", method=DBA_DESEQ2, th=1, bCounts=TRUE)

Then, the normalized counts contained within elementMetadata(Pool1.DB) are calculated by taking the raw counts for each peak divided by the normalization factor s_j, which is calculated via the median of ratios method described in http://genomebiology.com/2010/11/10/R106. Is this correct? When I test this, I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated with s_j but not exactly the same. (Could this be because I'm using a blocking factor in my model?)

If I run instead:

Pool1 = dba.analyze(Pool1, method=DBA_DESEQ2, bFullLibrarySize=TRUE, bCorPlot=FALSE)
Pool1.DB = dba.report(Pool1, file="test", method=DBA_DESEQ2, th=1, bCounts=TRUE)

Then, the normalized counts contained within elementMetadata(Pool1.DB) are calculated by taking the raw counts for each peak divided by librarysize/min(librarysize). Is this correct? I can test this as well, and again I get something close: colMeans(originalcounts/outputfromPool1.DB) is highly correlated with librarysize/min(librarysize) but not exactly the same.

So, by setting bFullLibrarySize=TRUE (the default), am I using only the library size as a normalization factor and nothing else? As I understand it, this can be biased by very highly "expressed" peaks, which is why the DESeq authors proposed the median of ratios method. Whereas if I set bFullLibrarySize=FALSE, do I use the median of ratios method as my normalization factor and not the library size?

That was a lot of questions, but thanks for helping me figure this out, and also thanks for making such a useful and well-supported package!

Jason

Tags: diffbind, deseq2
Answer: DiffBind: Normalization for DESeq2
Answered 4.5 years ago by Rory Stark (CRUK, Cambridge, UK):

Hi Jason-

Section 7.5 of the DiffBind vignette explains how DESeq2 is used.

Specifically, if bFullLibrarySize=FALSE, it calls DESeq2::estimateSizeFactors() to calculate the normalization factors. If bFullLibrarySize=TRUE, the factors are set to:

> DESeq2::sizeFactors(DESeqDataSet) <- libsize/min(libsize)

Here, libsize is a vector containing the number of reads in each BAM file.
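
To make the two options concrete, here is a minimal base-R sketch of both kinds of factor on a toy matrix. It is not DiffBind's or DESeq2's code: the count values, the libsize numbers, and the helper median_of_ratios() are made up for illustration, and the helper only mirrors the median of ratios idea from the DESeq paper (each sample's counts are compared to the per-peak geometric mean across samples, and the median ratio is taken).

```r
# Toy count matrix: 4 peaks (rows) x 3 samples (columns); values are made up.
counts <- matrix(c( 10,  20,  40,
                   100, 210, 390,
                     5,  10,  20,
                    50,  95, 205),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("peak", 1:4), paste0("s", 1:3)))

# Hypothetical total read counts per BAM file (all reads, not just reads in peaks).
libsize <- c(s1 = 1e6, s2 = 2e6, s3 = 4e6)

# Analogue of bFullLibrarySize=TRUE: scale every library by the smallest one.
sf_lib <- libsize / min(libsize)

# Analogue of bFullLibrarySize=FALSE: median of ratios to the per-peak geometric mean.
# median_of_ratios() is an illustrative helper, not a DESeq2 function.
median_of_ratios <- function(m) {
  log_geo_mean <- rowMeans(log(m))   # log of the geometric mean of each peak
  keep <- is.finite(log_geo_mean)    # skip peaks with a zero count in any sample
  apply(m, 2, function(k) exp(median((log(k) - log_geo_mean)[keep])))
}
sf_mor <- median_of_ratios(counts)

print(sf_lib)
print(sf_mor)
```

The library-size factors depend only on the total number of reads per file, while the median of ratios factors depend only on the counts in the peaks, so on real data the two sets need not agree even up to a constant.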

The normalized counts returned by dba.report() are the raw reads divided by the normalization factors, obtained by calling DESeq2::sizeFactors().
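
The division itself, and the colMeans() check from the question, can be sketched in base R as follows. Everything here is an assumption for illustration: raw is a made-up count matrix and sf is a hypothetical vector standing in for whatever DESeq2::sizeFactors() would return for a real analysis.

```r
# Made-up raw counts: 3 peaks (rows) x 2 samples (columns).
raw <- matrix(c(12,  30,
                 0,   8,
                40, 100),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("peak", 1:3), c("s1", "s2")))

# Hypothetical size factors, one per sample.
sf <- c(s1 = 1, s2 = 2.5)

# Normalized counts: divide each column by its sample's factor.
norm <- sweep(raw, 2, sf, "/")

# Recover the factors as the column means of raw/normalized.
# A zero count gives 0/0 = NaN, so those entries are dropped with na.rm = TRUE.
recovered <- colMeans(raw / norm, na.rm = TRUE)

print(norm)
print(recovered)
```

With an exact division like this, the recovered values match the factors. If the reported counts had been rounded or otherwise adjusted before output, the per-peak ratios would scatter around the factor instead, which is one possible reason for a close but inexact match; the sketch itself does not show whether DiffBind does this.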

Hope this helps-

Rory
