Systematic underestimation of log2fc values in DESeq?
@brunosaubamea-13693

Dear all,

I suspect that the log2FC values in our DGE study using DESeq2 (version 1.14.1) are systematically underestimated (say 2 instead of 2.5, 0 instead of 0.5, -2 instead of -1.5).

I understand that my question is rather general, but are there any reasons that could lead DESeq2 to underestimate fold changes?

I can give more information if requested.

Many thanks

bruno

 

 

 

 

@mikelove

There is a prior on LFC which reduces the estimate when there is low statistical information to support it. The DESeq2 paper focuses on this, so if you want details, please read the DESeq2 citation.

In version 1.16, the results table first reports the un-shrunken LFC (so some LFCs will be large simply due to noise in the data), and shrinkage is applied by a separate function, lfcShrink(). So if you used the current version of DESeq2, you would get the larger LFCs in the results table.

See this note:

New function lfcShrink() in DESeq2

If you are using an old version of DESeq2, you can use betaPrior=FALSE to get the un-shrunken (and potentially noisy) LFC.
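
To illustrate how a zero-centered prior reduces an estimate that has little statistical support, here is a minimal Python sketch. It uses a simple normal-normal approximation, which is a simplification and not DESeq2's actual fitting procedure; the function name `shrink_lfc`, the prior variance, and the numbers are all made up for illustration.

```python
def shrink_lfc(mle_lfc, se, prior_var):
    """Posterior mean of an LFC under a zero-centered normal prior.

    Normal-normal approximation (an assumption of this sketch): the MLE is
    weighted by the share of total variance that comes from the prior
    rather than from estimation noise.
    """
    weight = prior_var / (prior_var + se ** 2)
    return weight * mle_lfc

# Hypothetical genes: same MLE, different amounts of statistical information.
well_supported = shrink_lfc(mle_lfc=2.5, se=0.1, prior_var=1.0)
poorly_supported = shrink_lfc(mle_lfc=2.5, se=2.0, prior_var=1.0)

print(round(well_supported, 3))    # barely shrunk
print(round(poorly_supported, 3))  # pulled strongly toward zero
```

In this approximation the shrinkage is symmetric around zero: positive and negative estimates are pulled toward zero by the same factor, so it narrows the LFC distribution rather than shifting it.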

I'm aware of the shrinkage. If I'm correct, it is applied by default in DESeq2 1.14.1, but the unshrunken LFC can be retrieved by adding addMLE=TRUE to the call to results(dds). So I compared the shrunken and unshrunken LFCs, but the problem remains. In fact, the shrunken LFC distribution has the same center as the unshrunken one, but it is narrower. My problem is that I suspect the whole LFC distribution to be shifted towards positive values. If this is the case, my guess is that it could come from normalized count values being underestimated in one condition (or overestimated in the other condition). Could this happen? Are there other (testable) possibilities?

The center of the distribution has to be on zero. There have been a number of recent posts on the support site where I discuss this aspect. Maybe you can find these in recent DESeq2 posts.

Unless you have prior information on which genes are relatively constant (see 'controlGenes' in estimateSizeFactors) there is no other option than to perform computational normalization which essentially centers the distribution on zero.
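
A small Python sketch of why computational normalization centers the LFC distribution, and what a set of control genes changes. It implements a basic median-of-ratios size factor calculation on a made-up two-sample table; the function names, the `control_idx` parameter, and the counts are hypothetical, and this is not DESeq2's own code.

```python
import math
from statistics import median

def size_factors(counts, control_idx=None):
    """Median-of-ratios size factors for a genes x samples count table.

    counts: list of genes, each a list of positive per-sample counts.
    control_idx: optional indices of genes assumed to be constant; when
    given, only those genes contribute to the median.
    """
    n_samples = len(counts[0])
    use = range(len(counts)) if control_idx is None else control_idx
    factors = []
    for j in range(n_samples):
        log_ratios = []
        for i in use:
            gene = counts[i]
            # Log of the gene's geometric mean across samples.
            log_geo_mean = sum(math.log(c) for c in gene) / n_samples
            log_ratios.append(math.log(gene[j]) - log_geo_mean)
        factors.append(math.exp(median(log_ratios)))
    return factors

def log2_fold_changes(counts, factors):
    """Normalized log2 fold change of sample 1 over sample 0, per gene."""
    return [
        math.log2((gene[1] / factors[1]) / (gene[0] / factors[0]))
        for gene in counts
    ]

# Made-up data: genes 0-1 are truly constant, genes 2-4 are truly 4x higher
# in the second sample, so most genes shift in the same direction.
counts = [[100, 100], [50, 50], [10, 40], [20, 80], [30, 120]]

lfc_all = log2_fold_changes(counts, size_factors(counts))
lfc_ctrl = log2_fold_changes(counts, size_factors(counts, control_idx=[0, 1]))

print([round(x, 2) for x in lfc_all])   # majority of genes recentered on zero
print([round(x, 2) for x in lfc_ctrl])  # control genes sit on zero instead
```

With all genes used, the majority shift is absorbed into the size factors and the constant genes appear to go down. Restricting the median to the assumed control genes keeps the global shift visible.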

@brunosaubamea-13693
I was not aware of this normalization. I will search the forum. In my data, mean(lfc) and median(lfc) are about -0.25. Is this consistent with the distribution being normalized?
Yes. It's not literally centering the LFC in a post hoc way, but that is roughly a consequence of the first step, size factor estimation. See also the DESeq2 paper or the later section of the vignette explaining the steps. Note that this is not unique to DESeq2: all gene expression tools need to compute library size factors to remove global shifts.
@brunosaubamea-13693

OK. The problem might come from the estimated size factors because my sample A is significantly contaminated (might be as high as 50% of total cells) by blood cells while my sample B is highly pure (A and B are 2 distinct but closely related cell types). Thus the size factors might not adequately normalize the counts for the cells of interest in A (am I clear?).

If I could identify all blood cell specific genes, would it be a reasonable solution to remove these genes from the count matrix before running DESeq?


This could very well be the problem. The easiest way to figure this out is to make an MA plot (x = mean expression over all samples, y = log2FC between conditions). It should be fairly symmetric; otherwise the normalisation did not work.
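
As a rough numerical companion to the plot, the MA coordinates and a crude symmetry check could be computed along these lines in Python. This is only a sketch: the function names, the pseudocount, and the example counts are assumptions for illustration, and the input is taken to be already normalized counts.

```python
import math
from statistics import mean, median

def ma_values(cond_a, cond_b, pseudocount=0.5):
    """Per-gene (A, M) pairs from normalized counts of two conditions.

    cond_a, cond_b: lists of genes, each a list of normalized counts for
    the replicates of that condition.
    A is the mean normalized count over all samples; M is the log2 fold
    change of condition B over condition A. The pseudocount (an arbitrary
    choice here) avoids taking the log of zero.
    """
    pairs = []
    for a, b in zip(cond_a, cond_b):
        A = mean(a + b)
        M = math.log2((mean(b) + pseudocount) / (mean(a) + pseudocount))
        pairs.append((A, M))
    return pairs

def symmetry_summary(pairs):
    """Median M and the fraction of genes with M > 0."""
    m_values = [m for _, m in pairs]
    frac_up = sum(m > 0 for m in m_values) / len(m_values)
    return median(m_values), frac_up

# Made-up normalized counts: 4 genes, 2 replicates per condition.
cond_a = [[10, 12], [100, 90], [50, 55], [5, 5]]
cond_b = [[20, 24], [50, 45], [50, 55], [5, 5]]

pairs = ma_values(cond_a, cond_b)
med_m, frac_up = symmetry_summary(pairs)
print(med_m, frac_up)
```

A median M far from zero, or a very lopsided fraction of genes above zero, would be a hint to look more closely at the size factors.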

@brunosaubamea-13693

Below are the MA plots with shrunken and unshrunken LFCs (or see the linked image for better resolution). I'm not sure whether they look OK...


Sorry, this is the link to the original image:
http://imgur.com/E6aseuu


These look "ok" in that the y=0 line looks centered, but the null hypothesis of LFC=0 is trivially false here, because the conditions are so extremely different relative to the within-condition variance (see PCA plot as well). I would use lfcThreshold set to something higher to get a more meaningful set of *large* differences (see DESeq2 paper for description).
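
To make the idea of testing against a nonzero threshold concrete, here is a hedged Python sketch of a Wald-type test of the null hypothesis |LFC| <= threshold, using a normal approximation. The function names and the example numbers are made up, and the exact computation inside DESeq2 may differ in detail.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def threshold_pvalue(lfc, se, lfc_threshold=0.0):
    """Two-sided Wald-type p-value for the null |LFC| <= lfc_threshold."""
    z = (abs(lfc) - lfc_threshold) / se
    return min(1.0, 2.0 * upper_tail(z))

# Hypothetical gene with a small but very precisely estimated LFC.
small_precise = {"lfc": 0.5, "se": 0.05}
# Hypothetical gene with a large LFC and a moderate standard error.
large_effect = {"lfc": 2.5, "se": 0.3}

for gene in (small_precise, large_effect):
    p_zero = threshold_pvalue(gene["lfc"], gene["se"], lfc_threshold=0.0)
    p_one = threshold_pvalue(gene["lfc"], gene["se"], lfc_threshold=1.0)
    print(gene["lfc"], p_zero, p_one)
```

Against a threshold of zero, both genes come out as highly significant. Against a threshold of 1, only the gene with the large effect keeps a small p-value.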


I'll try this!
 
