Question: Systematic underestimation of log2FC values in DESeq2?
bruno.saubamea wrote (9 days ago):

Dear all,

I suspect that the log2 fold-change (LFC) values in our differential gene expression (DGE) study using DESeq2 (version 1.14.1) are systematically underestimated (e.g. 2 instead of 2.5, 0 instead of 0.5, -2 instead of -1.5).

I understand that my question is rather general, but are there any reasons that could lead DESeq2 to underestimate fold changes?

I can give more information if requested.

Many thanks

bruno
modified 8 days ago • written 9 days ago by bruno.saubamea
Michael Love (United States) wrote (9 days ago):

There is a prior on the LFC which shrinks the estimate toward zero when there is little statistical information to support it. The DESeq2 paper focuses on this, so for details, please read the DESeq2 citation.
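As a rough illustration of why such a prior pulls noisy estimates toward zero more strongly than well-measured ones, here is a minimal sketch assuming the simplest normal-normal case: the MLE of the LFC is treated as normally distributed around the true LFC with a known standard error `se`, and the prior is a zero-centred normal with standard deviation `prior_sd`. This is not DESeq2's implementation (DESeq2 estimates the prior width from the data and fits the shrunken coefficients inside its GLM); the function and parameter names are hypothetical.

```python
def shrink_lfc(mle_lfc: float, se: float, prior_sd: float) -> float:
    """Posterior mean (equal to the MAP here) of the LFC, assuming a
    Normal(true_lfc, se^2) sampling distribution for the MLE and a
    zero-centred Normal(0, prior_sd^2) prior on the true LFC."""
    # The weight on the data shrinks as the standard error grows.
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    return weight * mle_lfc


# Same MLE of 2.5, different amounts of information (standard error):
well_measured = shrink_lfc(2.5, se=0.5, prior_sd=1.0)  # high-count gene
noisy = shrink_lfc(2.5, se=2.0, prior_sd=1.0)          # low-count gene
print(well_measured, noisy)
```

With the same MLE, the well-measured gene keeps most of its fold change (about 2.0) while the noisy one is pulled down to about 0.5, which is the kind of information-dependent reduction described above.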

In version 1.16, we first report the un-shrunken LFC (so some LFCs will be large simply due to noise in the data), and shrinkage is performed by a separate function, lfcShrink(). So with the current version of DESeq2 you would see the larger LFCs in the results table.

See this note:

New function lfcShrink() in DESeq2

If you are using an old version of DESeq2, you can use betaPrior=FALSE to get the un-shrunken (and potentially noisy) LFC.

modified 9 days ago • written 9 days ago by Michael Love
I'm aware of the shrinkage. If I'm correct, it is applied by default in DESeq2 1.14.1, but the unshrunken LFC can be retrieved by adding addMLE=TRUE to the call to results(dds). So I compared the shrunken and unshrunken LFCs, but the problem remains. In fact, the shrunken LFC distribution has the same center as the unshrunken one, but it is narrower. My problem is that I suspect the whole LFC distribution should be shifted towards positive values. If this is the case, my guess is that it could come from normalized counts being underestimated in one condition (or overestimated in the other). Could this happen? Are there other (testable) possibilities?
written 9 days ago by bruno.saubamea

The center of the distribution has to be at zero. There have been a number of recent posts on the support site where I discuss this aspect. Maybe you can find them among the recent DESeq2 posts.

Unless you have prior information on which genes are relatively constant (see 'controlGenes' in estimateSizeFactors), there is no option other than computational normalization, which essentially centers the distribution on zero.
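To see why size factor estimation roughly centres the LFC distribution, here is a minimal sketch of median-of-ratios size factors in Python/NumPy, assuming a genes × samples count matrix. It follows the idea described in the DESeq2 paper but is not DESeq2's code; the `control_genes` argument is a hypothetical stand-in for restricting the estimate to genes believed to be constant, in the spirit of `controlGenes`.

```python
import numpy as np


def size_factors(counts, control_genes=None):
    """Median-of-ratios size factors, one per sample (column).

    counts: genes x samples array of non-negative counts.
    control_genes: optional boolean mask or index array of genes assumed
        constant across samples (hypothetical analogue of controlGenes);
        if None, all genes are used.
    """
    counts = np.asarray(counts, dtype=float)
    ref = counts if control_genes is None else counts[control_genes]
    # Keep genes with a nonzero count in every sample; otherwise the
    # geometric mean is zero and its log is -inf.
    keep = np.all(ref > 0, axis=1)
    log_ref = np.log(ref[keep])
    log_geo_mean = log_ref.mean(axis=1, keepdims=True)
    # Each sample's factor: median ratio to the per-gene geometric mean.
    return np.exp(np.median(log_ref - log_geo_mean, axis=0))


rng = np.random.default_rng(0)
base = rng.integers(20, 2000, size=1000)

# Sample B sequenced twice as deep as sample A: a purely technical shift.
depth = np.column_stack([base, 2 * base])
sf = size_factors(depth)
lfc = np.log2((depth[:, 1] / sf[1]) / (depth[:, 0] / sf[0]))

# 900 of 1000 genes truly 4x higher in B; the first 100 are constant.
is_control = np.arange(1000) < 100
shifted = np.column_stack([base, np.where(is_control, base, 4 * base)])
sf_all = size_factors(shifted)
sf_ctrl = size_factors(shifted, control_genes=is_control)
```

The first case shows a global depth difference being removed entirely, so the normalized LFCs are centred on zero. The second shows the caveat: when most genes genuinely change in the same direction, using all genes absorbs the real 4x shift into the size factors (about 0.5 and 2), whereas restricting the estimate to genes known to be constant leaves that shift in the LFCs.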

written 9 days ago by Michael Love
bruno.saubamea wrote (9 days ago):
I was not aware of this normalization. I will search the forum. In my data, mean(lfc) and median(lfc) are both about -0.25. Is this consistent with the distribution being normalized?
written 9 days ago by bruno.saubamea
Yes. It's not literally centering the LFCs in a post hoc way, but that is roughly a consequence of the first step, size factor estimation. See also the DESeq2 paper or the later section of the vignette explaining the steps. Note that this is not unique to DESeq2: all gene expression tools need to compute library size factors to remove global shifts.
written 9 days ago by Michael Love
bruno.saubamea wrote (9 days ago):

OK. The problem might come from the estimated size factors: my sample A is substantially contaminated by blood cells (possibly up to 50% of total cells), while my sample B is highly pure (A and B are two distinct but closely related cell types). Thus the size factors might not adequately normalize the counts for the cells of interest in A (am I being clear?).

If I could identify all blood-cell-specific genes, would it be a reasonable solution to remove these genes from the count matrix before running DESeq2?

written 9 days ago by bruno.saubamea

This could very well be the problem. The easiest way to check is an MA plot (x = mean expression over all samples, y = log2FC between conditions): it should be fairly symmetric around y = 0; otherwise the normalisation did not work.
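As a numeric companion to eyeballing the MA plot, one option is to compute M and A per gene from normalized counts and look at the median M within bins of A; medians far from zero in some expression range point to a normalisation problem there. This is a minimal Python/NumPy sketch under assumptions of its own (a pseudocount of 0.5, ten equal-sized bins, hypothetical function names); in R, DESeq2's `plotMA` draws the plot itself.

```python
import numpy as np


def ma_values(norm_a, norm_b, pseudocount=0.5):
    """Per-gene M (log2 fold change, B over A) and A (log2 mean expression
    over all samples) from normalized counts (genes x replicates)."""
    norm_a = np.asarray(norm_a, dtype=float)
    norm_b = np.asarray(norm_b, dtype=float)
    m = np.log2((norm_b.mean(axis=1) + pseudocount) /
                (norm_a.mean(axis=1) + pseudocount))
    a = np.log2(np.hstack([norm_a, norm_b]).mean(axis=1) + pseudocount)
    return m, a


def binned_median_m(m, a, n_bins=10):
    """Median M within equal-sized bins of genes ordered by A."""
    order = np.argsort(a)
    return [float(np.median(m[idx])) for idx in np.array_split(order, n_bins)]


rng = np.random.default_rng(1)
base = rng.integers(20, 2000, size=500).astype(float)
cond_a = np.column_stack([base, base])  # two replicates of condition A

# Well-normalized data with no true change: every bin median is ~0.
balanced = binned_median_m(*ma_values(cond_a, cond_a))
# Condition B left at twice the scale (as after a failed normalisation):
# every bin median sits just below +1 instead.
shifted = binned_median_m(*ma_values(cond_a, 2 * cond_a))
print(balanced)
print(shifted)
```

Binning by A rather than taking one global median is a deliberate choice: a normalisation problem can show up only among low- or high-expression genes, which a single summary number would hide.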

written 8 days ago by kristoffer.vittingseerup
bruno.saubamea wrote (8 days ago):

Below are the MA plots with shrunken and unshrunken LFCs. I'm not sure whether they look OK...

written 8 days ago by bruno.saubamea

Sorry, here is the link to the original image:
http://imgur.com/E6aseuu

written 8 days ago by bruno.saubamea

These look "ok" in that the y=0 line looks centered, but the null hypothesis of LFC=0 is trivially false here, because the conditions are extremely different relative to the within-condition variance (see the PCA plot as well). I would set lfcThreshold to a value above zero to get a more meaningful set of *large* differences (see the DESeq2 paper for a description).
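To illustrate why a nonzero threshold helps when nearly every gene differs: with a Wald-type statistic, testing against |LFC| ≤ t instead of LFC = 0 measures the distance of the estimate from the threshold rather than from zero. Below is a minimal Python sketch assuming a normal approximation for the LFC estimate with standard error `se`; the function names are hypothetical. DESeq2's `lfcThreshold` and `altHypothesis="greaterAbs"` options in `results()` follow a similar idea, but the exact formula here is an assumption, so consult the DESeq2 documentation for the details.

```python
import math


def upper_normal_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def wald_pvalue(lfc: float, se: float, threshold: float = 0.0) -> float:
    """Wald-type p-value for H0: |LFC| <= threshold vs H1: |LFC| > threshold.

    With threshold=0 this reduces to the usual two-sided test of LFC = 0.
    """
    z = (abs(lfc) - threshold) / se
    # Doubling the upper tail is conservative; cap the result at 1.
    return min(1.0, 2.0 * upper_normal_tail(z))


# A precisely estimated but modest change (LFC 1.2, standard error 0.5):
p_zero = wald_pvalue(1.2, se=0.5)                  # vs. LFC = 0
p_one = wald_pvalue(1.2, se=0.5, threshold=1.0)    # vs. |LFC| <= 1
# A large change with the same precision:
p_large = wald_pvalue(3.0, se=0.5, threshold=1.0)
print(p_zero, p_one, p_large)
```

The modest change is significant against zero (p about 0.016) but not against a threshold of 1 (p about 0.69), whereas the large change stays highly significant, which is the filtering toward *large* differences described above.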

written 7 days ago by Michael Love

I'll try this!

written 7 days ago by bruno.saubamea