Question

Systematic underestimation of log2fc values in DESeq?

bruno.saubamea • 0

@brunosaubamea-13693

Last seen 6.6 years ago

Dear all,

I suspect that the log2 fold change (log2FC) values in our DGE study using DESeq2 (version 1.14.1) are systematically underestimated (say 2 instead of 2.5, 0 instead of 0.5, -2 instead of -1.5).

I understand that my question is rather general, but are there any reasons that could lead DESeq2 to underestimate fold changes?

I can give more information if requested.

Many thanks

bruno

deseq2 • 1.9k views

ADD COMMENT • link 6.7 years ago bruno.saubamea • 0

score 1 · Answer 1 · 2017-08-13

Michael Love 41k

@mikelove

Last seen 9 hours ago

United States

There is a prior on LFC which reduces the estimate when there is low statistical information to support it. The DESeq2 paper focuses on this, so if you want details, please read the DESeq2 citation.
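As a rough illustration of what such a prior does, here is a minimal sketch using a zero-centered normal prior and a normal approximation to the likelihood. It is not DESeq2's actual implementation (which applies the prior inside the GLM fit); the function name `shrink_lfc` and the numbers are hypothetical and chosen only to show the effect.

```python
def shrink_lfc(mle_lfc, se, prior_sd):
    """Posterior mean of an LFC under a zero-centered normal prior.

    Normal-normal approximation, for illustration only: the raw estimate
    is multiplied by a weight between 0 and 1 that shrinks as the
    standard error grows relative to the prior width.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return weight * mle_lfc


# Same raw estimate (2.5), very different amounts of information.
well_supported = shrink_lfc(2.5, se=0.2, prior_sd=1.0)  # small standard error
noisy = shrink_lfc(2.5, se=1.5, prior_sd=1.0)           # large standard error

print(round(well_supported, 3), round(noisy, 3))
```

In this simplified model the estimate is only ever pulled towards zero, by the same factor for positive and negative values, so shrinkage narrows the LFC distribution rather than shifting its center.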

In version 1.16, we first report the un-shrunken LFC (so some LFC will be high simply due to noise in the data), and the shrinkage option is accomplished by a separate function lfcShrink(). So if you used the current version of DESeq2 you would get the larger LFC in the results table.

See this note:

New function lfcShrink() in DESeq2

If you are using an old version of DESeq2, you can use betaPrior=FALSE to get the un-shrunken (and potentially noisy) LFC.

ADD COMMENT • link 6.7 years ago Michael Love 41k

I'm aware of the shrinkage. If I'm correct, it is applied by default in DESeq2 1.14.1, but the unshrunken LFC can be retrieved by passing addMLE=TRUE to results(dds). So I compared the shrunken and unshrunken LFCs, but the problem remains. In fact, the shrunken LFC distribution has the same center as the unshrunken one, but it is narrower. My problem is that I suspect the whole LFC distribution to be shifted towards positive values. If this is the case, my guess is that it could come from normalized count values being underestimated in one condition (or overestimated in the other). Could this happen? Are there other (testable) possibilities?

ADD REPLY • link 6.7 years ago bruno.saubamea • 0

The center of the distribution has to be on zero. There have been a number of recent posts on the support site where I discuss this aspect. Maybe you can find these in recent DESeq2 posts.

Unless you have prior information on which genes are relatively constant (see 'controlGenes' in estimateSizeFactors) there is no other option than to perform computational normalization which essentially centers the distribution on zero.
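To make the centering concrete, here is a minimal pure-Python sketch of median-of-ratios size factor estimation, the approach described in the DESeq2 paper. The function `size_factors`, its `control_genes` argument (meant to mimic the idea behind 'controlGenes'), and the toy counts are illustrative assumptions, not DESeq2 code.

```python
import math
from statistics import median


def size_factors(counts, control_genes=None):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    control_genes: optional list of gene names to restrict the median to.
    """
    genes = control_genes if control_genes is not None else list(counts)
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene in genes:
        row = counts[gene]
        if min(row) <= 0:
            continue  # a zero count gives no finite log geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 is sequenced twice as deep; g4 is additionally up 4-fold.
counts = {"g1": [100, 200], "g2": [50, 100], "g3": [10, 20], "g4": [30, 240]}

sf = size_factors(counts)
norm = {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}
lfc = {g: math.log2(row[1] / row[0]) for g, row in norm.items()}
print(sf, lfc)

# Restricting the median to a (hypothetical) control set changes the reference.
sf_control = size_factors(counts, control_genes=["g4"])
```

Because each size factor is a median over genes, the typical gene ends up with an LFC of zero after normalization. If that "typical" gene is not actually constant between conditions, every LFC is offset by the same amount, which is why a trusted set of control genes is the only way around it.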

ADD REPLY • link 6.7 years ago Michael Love 41k

score 0 · Answer 2 · 2017-08-13

bruno.saubamea • 0

@brunosaubamea-13693

I was not aware of this normalization. I will search the forum. In my data, mean(lfc) and median(lfc) are both about -0.25. Is this consistent with the distribution being normalized?

ADD COMMENT • link 6.7 years ago bruno.saubamea • 0

Yes. It's not literally centering the LFC in a post hoc way, but that is roughly a consequence of the first step, size factor estimation. See also the DESeq2 paper or the later section of the vignette explaining the steps. Note that this is not unique to DESeq2: all gene expression tools need to compute library size factors to remove global shifts.

ADD REPLY • link 6.7 years ago Michael Love 41k

score 0 · Answer 3 · 2017-08-13

bruno.saubamea • 0

@brunosaubamea-13693

OK. The problem might come from the estimated size factors because my sample A is significantly contaminated (might be as high as 50% of total cells) by blood cells while my sample B is highly pure (A and B are 2 distinct but closely related cell types). Thus the size factors might not adequately normalize the counts for the cells of interest in A (am I clear?).

If I could identify all blood-cell-specific genes, would it be a reasonable solution to remove these genes from the count matrix before running DESeq2?

ADD COMMENT • link 6.7 years ago bruno.saubamea • 0

This could very well be the problem. The easiest way to figure this out is to make an MA plot (x = mean expression over all samples, y = log2FC between conditions). It should be fairly symmetric; otherwise the normalisation did not work.
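A quick numeric companion to the plot is to compute the MA coordinates directly and summarize their symmetry. The sketch below assumes per-condition mean normalized counts are already available; the function `ma_summary`, the pseudocount, and the toy values are hypothetical.

```python
import math
from statistics import median


def ma_summary(mean_a, mean_b, pseudocount=0.5):
    """Return MA points plus two crude symmetry summaries.

    mean_a, mean_b: per-gene mean normalized counts in each condition.
    x is the mean expression over both conditions, y is log2(B / A).
    """
    points = []
    for a, b in zip(mean_a, mean_b):
        x = (a + b) / 2
        y = math.log2((b + pseudocount) / (a + pseudocount))
        points.append((x, y))
    lfcs = [y for _, y in points]
    frac_up = sum(y > 0 for y in lfcs) / len(lfcs)
    return points, median(lfcs), frac_up


# Toy values: three unchanged genes, one down, one up.
mean_a = [100, 50, 10, 400, 80]
mean_b = [100, 25, 20, 400, 80]

points, median_lfc, frac_up = ma_summary(mean_a, mean_b)
print(median_lfc, frac_up)
```

A median LFC far from zero, or a cloud that drifts away from y = 0 at high mean expression, would point to a normalization problem rather than to shrinkage.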

ADD REPLY • link 6.7 years ago kristoffer.vittingseerup ▴ 20

score 0 · Answer 4 · 2017-08-14

bruno.saubamea • 0

@brunosaubamea-13693

Below are the MA plots with shrunken and unshrunken LFCs. I'm not sure whether they look OK...

ADD COMMENT • link 6.7 years ago bruno.saubamea • 0

sorry, this is the link to the original image
http://imgur.com/E6aseuu

ADD REPLY • link 6.7 years ago bruno.saubamea • 0

These look "ok" in that the y=0 line looks centered, but the null hypothesis of LFC=0 is trivially false here, because the conditions are so extremely different relative to the within-condition variance (see PCA plot as well). I would use lfcThreshold set to something higher to get a more meaningful set of *large* differences (see DESeq2 paper for description).
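To show what testing against a threshold changes, here is a minimal sketch of a Wald-type test of |LFC| > threshold, following the composite null hypothesis described in the DESeq2 paper. The function `threshold_pvalue` and the example numbers are hypothetical, and the exact computation inside results() may differ in detail.

```python
import math


def threshold_pvalue(lfc, se, lfc_threshold=0.0):
    """Two-sided Wald-type p-value for the null |LFC| <= lfc_threshold.

    With lfc_threshold=0 this reduces to the usual two-sided test of LFC = 0.
    """
    z = (abs(lfc) - lfc_threshold) / se
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z), standard normal
    return min(1.0, 2.0 * upper_tail)


# A small but very precisely estimated LFC, tested against 0 and against 1.
p_small_vs_zero = threshold_pvalue(0.4, se=0.05)
p_small_vs_one = threshold_pvalue(0.4, se=0.05, lfc_threshold=1.0)

# A large LFC still stands out against the threshold.
p_large_vs_one = threshold_pvalue(2.5, se=0.3, lfc_threshold=1.0)

print(p_small_vs_zero, p_small_vs_one, p_large_vs_one)
```

When within-condition variance is tiny, standard errors are small and almost every gene rejects LFC = 0; raising the threshold keeps only genes whose change is both large and well supported.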