standard error value (lfcSE) returned by DeSeq2
1
0
Entering edit mode
tootiki • 0
@tootiki-12480
Last seen 4.8 years ago

Can you help me to understand the standard error value (lfcSE) returned by DeSeq2. Is that value also a log2 of the standard error?  So for example, a gene with a log2fc of -2.3 and a lfcSE of 0.12, the actual fold change will be 0.2, but what will the equivalent standard error be?

deseq2 deseq lfc lfcse standard error • 8.2k views
0
Entering edit mode

I have a question regarding the application of standard error. To get cumulative (gene-wide) error (noise) per condition I can do following steps.

I can get variance (VAR) of each gene by multiplying standard error (SE) by square root of total number of genes (n) and then squaring it.

VAR=(SE*sqrt(n))^2

So then I can divide the sum of variances by square of n - which would give me cumulative variance in the condition.

cumulative VAR=(VAR01+VAR02+VAR03+..) / n^2

I just want to know if I should go by this way or there is any other alternative?

0
Entering edit mode

Using the notation in our paper, do you want the variance of betas (LFC) or K/s (normalized counts)?

0
Entering edit mode

sorry for the confusion. I am using "standard error: condition treated vs untreated" (from results columns) for the calculation.

So, I meant individual calculation of variance associated with each LFC.

So then the idea is to take sum of variances and divide it by square of n - which would give me cumulative variance for the condition.

cumulative VAR=(VAR01+VAR02+VAR03+..) / n^2

0
Entering edit mode

I still don't follow. When you say "cumulative variance" can you write out what variable you want the variance of, using the DESeq2 notation?

0
Entering edit mode

This sounds like you want to see whether one condition is more noisy then the other. If so, the answer is: no, you cannot get anything like a variance estimate per condition from the reported standard errors because DESeq2 does not calculate condition-specific variances.

If you would write what (biological) question you are trying to answer, we could probably help you better.

0
Entering edit mode

Yes. I wanted to see whether one condition is more noisy than the other.
Biological question I was trying to answer was:
How much is the noise in my experiment (trt vs untrt)?

what I have is
1. counts (3 replicates each for trt and untrt) and
2. a DESeq2 result (log2FoldChange, lfcSE etc.) obtained from counts.

I have a mathematical solution but I am not sure if its right.
It's also currently developing on other forum: http://stats.stackexchange.com/questions/272597

1
Entering edit mode

DESeq2's variance model assumes that all conditions have the same variance. This is a standard assumption in ANOVA that simplifies the math and substantially improves power (if the assumption is justified).

Hence, your approach will give you the same value for each condition. Any differences will be merely due to sequencing depth, because this is accounted for individually for each sample.

Some remarks:

1. I would generally advice against "reversing" a tool's calculation by trying to undo the steps that you think the tool did. If you know the steps, you'd be better of doing them forwards, without the tool (or maybe by borrowing code from it), and if you don't know the inner workings of the tool well enough, trying to go backwards will rarely be correct.

2. In DESeq-1, I had estimated dispersions for each condition separately, but we removed this feature in DESeq2 because it is hard to combine with any design more complicated than a simple one-factor design.

3. The catch is that if you have several treatments, say one noisy and one less noisy, then the contrast of the noisy one versus control will have too many hits (too low p values) and the less noisy one will have too few (too high p values). In that case, we recommend  estimating dispersion for each contrast separately.

0
Entering edit mode

The answer of the question was indeed in the primary assumption of the tool itself. I should have taken that into account before going ahead.
Remark 1 in particular is a very helpful general advise in this respect.

Thanks a lot. solved!

0
Entering edit mode

Is it still true that the dispersions in DESeq2 are not estimated for each condition separately, or has it changed?

If dispersions in DESeq2 are not estimated for each condition separately, how can it be that the dispersions are dependent on the design formula?

Thanks.

0
Entering edit mode

Dispersions are estimated one per gene.

Take a look at the Methods of the 2014 DESeq2 paper for details, but you can see that the design matrix X is used to estimate the per-gene dispersion, but nevertheless it is a single parameter per gene (under our model assumptions, and those of other similar software).

3
Entering edit mode
@mikelove
Last seen 1 day ago
United States

If you wanted to build standard intervals using the SE, but on the fold change scale (not log fold change), you would add the SE to the estimate in the exponent:

 > 2^-2.3
[1] 0.2030631
!> 2^(-2.3 + c(-1,1) * 0.12)
[1] 0.1868562 0.2206757

That's just +/- 1 SE. To build a wider interval you would multiply the SE.

0
Entering edit mode

great!

thanks very much.

0
Entering edit mode

But then if you what to write your results in a table with the format 0.20 ± SE, how to you do it?  cause due to the log/exp scale the plus and minus SE will be slightly different.. in this case 0.016 and 0.018.

thank  you

0
Entering edit mode

The standard error is on the coefficient which is log scale, so you can't really do +/- on the fold change scale.