Can you help me to understand the standard error value (lfcSE) returned by DeSeq2. Is that value also a log2 of the standard error? So for example, a gene with a log2fc of -2.3 and a lfcSE of 0.12, the actual fold change will be 0.2, but what will the equivalent standard error be?
I have a question regarding the application of standard error. To get cumulative (gene-wide) error (noise) per condition I can do following steps.
I can get variance (VAR) of each gene by multiplying standard error (SE) by square root of total number of genes (n) and then squaring it.
So then I can divide the sum of variances by square of n - which would give me cumulative variance in the condition.
cumulative VAR=(VAR01+VAR02+VAR03+..) / n^2
I just want to know if I should go by this way or there is any other alternative?
I dont follow your question, the variance of what?
Using the notation in our paper, do you want the variance of betas (LFC) or K/s (normalized counts)?
sorry for the confusion. I am using "standard error: condition treated vs untreated" (from results columns) for the calculation.
So, I meant individual calculation of variance associated with each LFC.
So then the idea is to take sum of variances and divide it by square of n - which would give me cumulative variance for the condition.
cumulative VAR=(VAR01+VAR02+VAR03+..) / n^2
I still don't follow. When you say "cumulative variance" can you write out what variable you want the variance of, using the DESeq2 notation?
This sounds like you want to see whether one condition is more noisy then the other. If so, the answer is: no, you cannot get anything like a variance estimate per condition from the reported standard errors because DESeq2 does not calculate condition-specific variances.
If you would write what (biological) question you are trying to answer, we could probably help you better.
Yes. I wanted to `see whether one condition is more noisy than the other`.
Biological question I was trying to answer was:
`How much is the noise in my experiment (trt vs untrt)?`
what I have is
1. counts (3 replicates each for trt and untrt) and
2. a DESeq2 result (log2FoldChange, lfcSE etc.) obtained from counts.
I have a mathematical solution but I am not sure if its right.
It's also currently developing on other forum: http://stats.stackexchange.com/questions/272597
DESeq2's variance model assumes that all conditions have the same variance. This is a standard assumption in ANOVA that simplifies the math and substantially improves power (if the assumption is justified).
Hence, your approach will give you the same value for each condition. Any differences will be merely due to sequencing depth, because this is accounted for individually for each sample.
1. I would generally advice against "reversing" a tool's calculation by trying to undo the steps that you think the tool did. If you know the steps, you'd be better of doing them forwards, without the tool (or maybe by borrowing code from it), and if you don't know the inner workings of the tool well enough, trying to go backwards will rarely be correct.
2. In DESeq-1, I had estimated dispersions for each condition separately, but we removed this feature in DESeq2 because it is hard to combine with any design more complicated than a simple one-factor design.
3. The catch is that if you have several treatments, say one noisy and one less noisy, then the contrast of the noisy one versus control will have too many hits (too low p values) and the less noisy one will have too few (too high p values). In that case, we recommend estimating dispersion for each contrast separately.
The answer of the question was indeed in the primary assumption of the tool itself. I should have taken that into account before going ahead.
Remark 1 in particular is a very helpful general advise in this respect.
Thanks a lot. solved!
Is it still true that the dispersions in DESeq2 are not estimated for each condition separately, or has it changed?
If dispersions in DESeq2 are not estimated for each condition separately, how can it be that the dispersions are dependent on the design formula?
Dispersions are estimated one per gene.
Take a look at the Methods of the 2014 DESeq2 paper for details, but you can see that the design matrix X is used to estimate the per-gene dispersion, but nevertheless it is a single parameter per gene (under our model assumptions, and those of other similar software).