Question

DiffBind edgeR and DESeq2 normalization

GFM ▴ 20

@gfm-8326

Last seen 3.0 years ago

European Union

Hi,
I ran DiffBind with both edgeR and DESeq2.
I have a question regarding the normalized counts. In the DESeq2 analysis, the normalized counts for one of the samples were always integers. In the edgeR analysis, no column consisted of integers.
As I understand it, edgeR normalization chooses one column as the reference for all the others, while DESeq2 creates an additional column of averages that is used for normalization.
So I would expect one column of integers after edgeR normalization, but not after DESeq2. I actually get the opposite.
Can you help with this, please?

Also, in the DiffBind publication edgeR was the default. Now DESeq2 is the default.
Do you have a preference for one of the methods?

Thanks a lot.

diffbind normalized counts • 3.1k views

ADD COMMENT • link 8.5 years ago GFM ▴ 20

score 1 · Answer 1 · 2016-07-14

To answer the second question first, the default analysis method was changed from edgeR to DESeq2 in the current release due to its more conservative normalization. The default TMM normalization in edgeR may over-normalize when a large number of features all change in the same direction, which is more likely in a ChIP-seq than an RNA-seq experiment. When affinity changes occur in both directions, the TMM normalization works well.

Regarding the one-column-of-integers issue, I assume you are looking at the count scores reported from dba.report() with bCounts=TRUE? Those values are calculated from within DiffBind using normalization values computed by edgeR and/or DESeq2, as follows.

For edgeR values, the read counts (which may have control reads subtracted, and are set to a minimum value of 1) are divided by a normalization factor derived by multiplying the lib.size by the norm.factors. Only in certain cases will this yield a column of integers.

For DESeq2 values, the reads (adjusted for control reads and minimum value as above) are divided by the result of calling sizeFactors().
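The two divisions described above can be sketched in base R with made-up numbers. This is only an illustration of the arithmetic as described in this answer, not DiffBind's internal code: the count matrix, `lib.size`, `norm.factors`, and `size_factors` values below are all hypothetical, and in a real analysis they would come from edgeR and from DESeq2's sizeFactors().

```r
# Toy count matrix: rows are peaks, columns are samples (hypothetical values).
counts <- matrix(c(10, 20, 40,
                   15, 30, 60),
                 nrow = 3,
                 dimnames = list(paste0("peak", 1:3), c("s1", "s2")))

# edgeR-style: divide each column by lib.size * norm.factors.
lib.size     <- c(s1 = 1.0e6, s2 = 1.5e6)   # hypothetical library sizes
norm.factors <- c(s1 = 0.8,   s2 = 1.25)    # hypothetical TMM factors
effective    <- lib.size * norm.factors
edger_norm   <- sweep(counts, 2, effective, "/")

# DESeq2-style: divide each column by its size factor.
size_factors <- c(s1 = 0.8, s2 = 1.2)       # hypothetical size factors
deseq_norm   <- sweep(counts, 2, size_factors, "/")

print(edger_norm)
print(deseq_norm)
```

In both cases a column stays integer-valued only if its divisor happens to divide every count in that column evenly, for example when the divisor is exactly 1.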

Regards-

Rory

score 0 · Answer 2 · 2016-07-14

Thanks a lot for your quick reply and for the great package.

Regarding the normalized counts: I am referring to the counts that are obtained after calling dba.report() with bCounts=TRUE.
I ran DiffBind several times, and for DESeq2 one of the columns always contains integers.
When using DESeq2 normalization for RNA-seq I never get a column of integers (and I wouldn't expect to, since each count is divided by the size factor).
So how is it that I always get a column of integers for DESeq2 in DiffBind?

Thanks

ADD COMMENT • link 8.5 years ago GFM ▴ 20

OK, I've looked into this a bit more deeply. For the default case when using DESeq2, where bFullLibrarySize=TRUE, DiffBind sets the factors to be the full library sizes (the number of reads in the .bam files) normalized to the smallest library (dividing by the minimum library size). So the smallest library gets a normalization factor equal to 1, while the others are greater than 1. Dividing by 1 leaves that sample's counts unchanged, which is why one column is always integers. This simple normalization method is only used when bFullLibrarySize=TRUE and method=DBA_DESEQ2. If you set bFullLibrarySize=FALSE, using only the number of reads that overlap consensus peaks, then estimateSizeFactors() is called and the standard median-of-ratios method is used to calculate the normalization factors, none of which should be equal to 1.
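A minimal base-R sketch of the two behaviours described in this reply, using made-up numbers. The count matrix and `full_lib` read totals are hypothetical, and the median-of-ratios step is a hand-rolled approximation of the idea behind estimateSizeFactors(), not the DESeq2 call itself.

```r
# Toy count matrix: rows are peaks, columns are samples (hypothetical values).
counts <- matrix(c(12, 30, 45,  8,
                   20, 66, 80, 15,
                   18, 40, 70, 11),
                 nrow = 4,
                 dimnames = list(paste0("peak", 1:4), c("s1", "s2", "s3")))

# Case 1 (bFullLibrarySize=TRUE as described): full library sizes
# divided by the smallest library size.
full_lib    <- c(s1 = 2.0e6, s2 = 3.1e6, s3 = 2.6e6)  # hypothetical .bam read totals
lib_factors <- full_lib / min(full_lib)               # smallest library gets exactly 1
norm_full   <- sweep(counts, 2, lib_factors, "/")

# The smallest library's column is divided by 1, so it stays integer-valued.
print(all(norm_full[, "s1"] == counts[, "s1"]))

# Case 2 (bFullLibrarySize=FALSE as described): median-of-ratios size factors.
# Each sample's factor is the median ratio of its counts to the per-peak
# geometric mean across samples.
log_geo_mean <- rowMeans(log(counts))
mor_factors  <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_mean)))
norm_mor     <- sweep(counts, 2, mor_factors, "/")

print(lib_factors)
print(mor_factors)
```

With factors of the second kind, no sample is pinned to 1 by construction, so no column is expected to remain integer-valued.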

-R

ADD REPLY • link 8.5 years ago Rory Stark ★ 5.2k

score 0 · Answer 3 · 2016-07-14

Thank you very much.
So if I don't expect lots of changes between the samples, it might be better to set bFullLibrarySize=FALSE, right?

ADD COMMENT • link 8.5 years ago GFM ▴ 20

score 0 · Answer 4 · 2016-07-14

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 6 days ago

Cambridge, UK

Yes, if there isn't a big imbalance in the signal between sample groups, bFullLibrarySize=FALSE is preferred. This used to be the default, but we changed the default to the more conservative bFullLibrarySize=TRUE, because the consequences of using bFullLibrarySize=FALSE are worse if its assumptions are not met.

-R

ADD COMMENT • link 8.5 years ago Rory Stark ★ 5.2k

score 0 · Answer 5 · 2016-07-14

Thanks a lot!

ADD COMMENT • link 8.5 years ago GFM ▴ 20