How do I calculate the error associated with aggregated L2FC values?
I'd like to combine multiple differential expression values (log2 fold changes, L2FC) into a single statistic, with an associated error, to express how genes in the mitochondrial complexes change (or don't change) compared to wildtype cell lines under certain conditions. Here's an example of a graph I'd like to reduce to a simpler line-with-error plot (possibly with overlaid values). This is a beeswarm plot of the [non-shrunk] L2FC values produced by DESeq2:
Here's what it looks like when I do a normal boxplot of those results in R:
But that's not right, because the points don't represent individual data values; they're similar to a difference of means. I want something that looks similar (or maybe a "thick line with indicated error" plot), but where the error represents the standard error associated with the mean L2FC. How should I calculate this error?
I notice that DESeq2 reports "lfcSE", and am wondering whether I can calculate a pooled variance from that. I had a look at the pooled variance formula on Wikipedia, and my rough scratchings (with a bit of cancelling) suggest that with k L2FC values testing equal-size groups (i.e. where the samples used for each L2FC are the same), it ends up being:
$$\frac{n}{k} \sum_{i=1}^{k} \left(\mathrm{SE}_{x_i}\right)^2$$
In other words: add up the squares of the standard errors, multiply by whatever n is used in the standard error calculation (which I assume to be the total number of samples in both conditions, e.g. n = 9 with 5 replicates vs 4 replicates), then divide by the number of L2FC values. But if each L2FC is essentially a difference-of-means test, then fishing an n out of the standard error doesn't make much sense... which ties me up in knots.
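The reduction described above can be written out as follows. This is a sketch that assumes k groups of equal size m (so the weights $m - 1$ cancel) and assumes each standard error is $\mathrm{SE}_{x_i} = s_i/\sqrt{n}$, which is exactly the step in question:

```latex
\begin{aligned}
s_p^2 &= \frac{\sum_{i=1}^{k} (m - 1)\, s_i^2}{\sum_{i=1}^{k} (m - 1)}
       = \frac{1}{k} \sum_{i=1}^{k} s_i^2 \\
s_i^2 &= n \left(\mathrm{SE}_{x_i}\right)^2
\quad\Longrightarrow\quad
s_p^2 = \frac{n}{k} \sum_{i=1}^{k} \left(\mathrm{SE}_{x_i}\right)^2
\end{aligned}
```

If each L2FC is instead treated as a difference of two group means with a common variance $s_i^2$, then $\mathrm{SE}_{x_i}^2 = s_i^2 (1/n_1 + 1/n_2)$, so the multiplier would be $1/(1/n_1 + 1/n_2)$ (20/9 ≈ 2.22 for 5 vs 4 replicates) rather than $n_1 + n_2$. DESeq2's lfcSE comes from a negative binomial GLM fit on the log2 scale, so either version is at best an approximation.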
Edit: here's what I've ended up with for now. I'm using the mean of the L2FC values as the central value, and this function for determining the error:
```r
# Error per comparison: root mean square of DESeq2's per-gene lfcSE values
l2fcAll.se <- tapply(sub.se.tbl.genome$lfcSE, sub.se.tbl.genome$comparison,
                     function(x) sqrt(sum(x^2) / length(x)))
```
In other words, the error is the square root of the mean of the squared standard errors (the root mean square of lfcSE).
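A minimal sketch of the calculation above on made-up numbers. It assumes a results table with DESeq2's `log2FoldChange` and `lfcSE` columns plus a `comparison` column; the table `toy`, its comparison names, and its values are invented for illustration. For contrast it also computes the standard error of a mean of k estimates under the extra assumption that those estimates are independent, `sqrt(sum(SE^2)) / k`. Genes in the same complex are unlikely to be independent, and DESeq2 does not report the covariances that would be needed to correct for that.

```r
# Hypothetical toy results: two comparisons, three genes each (values invented)
toy <- data.frame(
  comparison     = rep(c("mutA_vs_WT", "mutB_vs_WT"), each = 3),
  log2FoldChange = c(0.5, 1.0, 1.5, -0.2, 0.0, 0.2),
  lfcSE          = c(0.3, 0.4, 1.2, 0.2, 0.2, 0.2)
)

# Central value: mean L2FC per comparison
l2fcAll.mean <- tapply(toy$log2FoldChange, toy$comparison, mean)

# Error as in the question: root mean square of the per-gene standard errors
l2fcAll.se <- tapply(toy$lfcSE, toy$comparison,
                     function(x) sqrt(mean(x^2)))

# Alternative, assuming independent estimates:
# Var(mean) = sum(SE_i^2) / k^2, so SE(mean) = sqrt(sum(SE_i^2)) / k
l2fcAll.se.indep <- tapply(toy$lfcSE, toy$comparison,
                           function(x) sqrt(sum(x^2)) / length(x))

print(cbind(mean = l2fcAll.mean, rms.se = l2fcAll.se,
            indep.se = l2fcAll.se.indep))
```

The design difference matters: the root-mean-square version does not shrink as more genes are added, so it describes the typical per-gene uncertainty; the independence version shrinks roughly as $1/\sqrt{k}$ (for `mutB_vs_WT`, 0.2 versus 0.2/√3 ≈ 0.115), which is only justified if the estimates really are independent.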
I'm not confident in that (I've changed the formula a few times over the last couple of hours), and I'd like to know if anyone has other or better ideas on what to do about this.
Thanks heaps.