Search
Question: standard error value (lfcSE) returned by DeSeq2
0
20 months ago by
tootiki0
tootiki0 wrote:

Can you help me to understand the standard error value (lfcSE) returned by DeSeq2. Is that value also a log2 of the standard error?  So for example, a gene with a log2fc of -2.3 and a lfcSE of 0.12, the actual fold change will be 0.2, but what will the equivalent standard error be?

modified 19 months ago by rraadd_80 • written 20 months ago by tootiki0

I have a question regarding the application of standard error. To get cumulative (gene-wide) error (noise) per condition I can do following steps.

I can get variance (VAR) of each gene by multiplying standard error (SE) by square root of total number of genes (n) and then squaring it.

VAR=(SE*sqrt(n))^2

So then I can divide the sum of variances by square of n - which would give me cumulative variance in the condition.

cumulative VAR=(VAR01+VAR02+VAR03+..) / n^2

I just want to know if I should go by this way or there is any other alternative?

Using the notation in our paper, do you want the variance of betas (LFC) or K/s (normalized counts)?

sorry for the confusion. I am using "standard error: condition treated vs untreated" (from results columns) for the calculation.

So, I meant individual calculation of variance associated with each LFC.

So then the idea is to take sum of variances and divide it by square of n - which would give me cumulative variance for the condition.

cumulative VAR=(VAR01+VAR02+VAR03+..) / n^2

I still don't follow. When you say "cumulative variance" can you write out what variable you want the variance of, using the DESeq2 notation?

This sounds like you want to see whether one condition is more noisy then the other. If so, the answer is: no, you cannot get anything like a variance estimate per condition from the reported standard errors because DESeq2 does not calculate condition-specific variances.

If you would write what (biological) question you are trying to answer, we could probably help you better.

Yes. I wanted to see whether one condition is more noisy than the other.
Biological question I was trying to answer was:
How much is the noise in my experiment (trt vs untrt)?

what I have is
1. counts (3 replicates each for trt and untrt) and
2. a DESeq2 result (log2FoldChange, lfcSE etc.) obtained from counts.

I have a mathematical solution but I am not sure if its right.
It's also currently developing on other forum: http://stats.stackexchange.com/questions/272597

1

DESeq2's variance model assumes that all conditions have the same variance. This is a standard assumption in ANOVA that simplifies the math and substantially improves power (if the assumption is justified).

Hence, your approach will give you the same value for each condition. Any differences will be merely due to sequencing depth, because this is accounted for individually for each sample.

Some remarks:

1. I would generally advice against "reversing" a tool's calculation by trying to undo the steps that you think the tool did. If you know the steps, you'd be better of doing them forwards, without the tool (or maybe by borrowing code from it), and if you don't know the inner workings of the tool well enough, trying to go backwards will rarely be correct.

2. In DESeq-1, I had estimated dispersions for each condition separately, but we removed this feature in DESeq2 because it is hard to combine with any design more complicated than a simple one-factor design.

3. The catch is that if you have several treatments, say one noisy and one less noisy, then the contrast of the noisy one versus control will have too many hits (too low p values) and the less noisy one will have too few (too high p values). In that case, we recommend  estimating dispersion for each contrast separately.

The answer of the question was indeed in the primary assumption of the tool itself. I should have taken that into account before going ahead.
Remark 1 in particular is a very helpful general advise in this respect.

Thanks a lot. solved!

2
20 months ago by
Michael Love20k
United States
Michael Love20k wrote:

If you wanted to build standard intervals using the SE, but on the fold change scale (not log fold change), you would add the SE to the estimate in the exponent:

 > 2^-2.3
[1] 0.2030631
!> 2^(-2.3 + c(-1,1) * 0.12)
[1] 0.1868562 0.2206757

That's just +/- 1 SE. To build a wider interval you would multiply the SE.

great!

thanks very much.

But then if you what to write your results in a table with the format 0.20 ± SE, how to you do it?  cause due to the log/exp scale the plus and minus SE will be slightly different.. in this case 0.016 and 0.018.

thank  you