Aaron Mackey
We have a project with a somewhat spectacular design: two cell types in trans-well culture, each cell population from a human donor. In each experimental setup, we use one set of donors, treated with ~10 different perturbagens performed as a single replicate, plus a vehicle control performed in duplicate; we repeat this setup independently 5 times with 5 different sets of donors (same perturbagens). Then, in another batch, we repeat the entire process with a new set of perturbagens (again with vehicle controls, though same donors as before). At the end of the day (or months, as it may be), we have many hundreds of samples across ~8 batches, 5 sets of cell donors, and ~80 different perturbagens, and we'd of course like to test for the effects of each individual perturbagen, and compare/contrast effects across/between perturbagens.
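
For concreteness, the sample annotation implied by this design looks roughly like the following (a sketch only, in R; all column names, level names, and counts of batches/perturbagens are hypothetical and only illustrative):

one_batch <- function(batch_id, perts) {
  # 5 donor sets x (each perturbagen once) plus duplicate vehicle controls
  tab <- rbind(
    expand.grid(donor_set   = paste0("donorSet", 1:5),
                perturbagen = perts,
                stringsAsFactors = FALSE),
    data.frame(donor_set   = rep(paste0("donorSet", 1:5), each = 2),
               perturbagen = "vehicle",
               stringsAsFactors = FALSE)
  )
  tab$batch <- batch_id
  tab
}

# ~8 batches in reality, each with its own set of perturbagens; two shown here
targets <- rbind(one_batch("batch1", paste0("pert", 1:10)),
                 one_batch("batch2", paste0("pert", 11:20)))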
Using edgeR, voom, and DESeq, we have analyzed these data two ways:

1) using all of the samples, accounting for a variety of intermingled batch and donor effects, and modeling all 80 perturbagens simultaneously;

2) using only the small subset of samples relevant to a given perturbagen (i.e. the 5 treated samples and their 10 matching vehicle controls), and modeling only 4 donor effects.
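
Roughly, the two design matrices look like this (a sketch only, assuming a 'targets' data frame like the hypothetical one above, with vehicle as the reference level):

targets$perturbagen <- relevel(factor(targets$perturbagen), ref = "vehicle")

# approach #1: all samples, modeling batch, donor set, and all perturbagens
design_full <- model.matrix(~ batch + donor_set + perturbagen, data = targets)

# approach #2: one perturbagen plus its matched vehicle controls from one
# batch, modeling only the 4 donor-set coefficients and the treatment effect
keep <- targets$batch == "batch1" &
        targets$perturbagen %in% c("vehicle", "pert1")
targets_sub <- droplevels(targets[keep, ])
design_sub  <- model.matrix(~ donor_set + perturbagen, data = targets_sub)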
When we look for a subset of known/expected biological signals across well-understood perturbagens, we are surprised to find that approach #2 performs much better than #1 (both with respect to sensitivity at any given FDR, and with respect to statistical ranking, regardless of FDR), despite our prior expectation that #2 would lack power compared to #1 (which has many more df than #2, and thus much more accurate tag-level dispersion estimates). Our working hypothesis is that the additional batch corrections required for #1 introduce enough additional uncertainty that, while the same "correct" fold changes are observed by each approach, statistical significance is rarely achieved with #1.
A second concern is that our dataset as a whole violates the constant dispersion assumption inherent to all of the methods, and that by refitting each subset of the data in #2 we are in effect relaxing that assumption; if so, I wonder if it would be useful to consider using the whole-dataset dispersion estimates from #1 as priors when refitting the subsets, to regain some of the expected power.
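
In edgeR terms, what I have in mind is something like the following (a sketch only, reusing the hypothetical 'targets'/design objects from above and assuming a 'counts' matrix whose columns line up with the rows of 'targets'; simply fixing the subset fit at the whole-data tagwise dispersions is admittedly cruder than a true prior):

library(edgeR)

# whole-dataset fit: estimate tagwise dispersions using all samples
y_full <- DGEList(counts = counts)
y_full <- calcNormFactors(y_full)
y_full <- estimateDisp(y_full, design_full)

# subset fit: re-use the whole-dataset dispersions instead of re-estimating
# them from the handful of samples in the subset
y_sub <- DGEList(counts = counts[, keep])
y_sub <- calcNormFactors(y_sub)
fit   <- glmFit(y_sub, design_sub, dispersion = y_full$tagwise.dispersion)
lrt   <- glmLRT(fit, coef = "perturbagenpert1")
topTags(lrt)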
A third issue with both approaches is that the same vehicle controls are reused for multiple perturbagens within a batch, leading to a nested correlation structure not accounted for in either of the approaches; the net effect of this is that the estimated perturbagen-induced fold changes within each batch are correlated with each other, leading to difficulties when comparing responses within and between batches. I presume we could try to address this in the voom pipeline using duplicateCorrelation() to get a mixed model, but this doesn't help us with edgeR or DESeq.
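
For what it's worth, the voom-based version I was imagining looks something like this (a sketch only, reusing the hypothetical objects from above; whether donor set within batch is even the right blocking variable is itself part of the question):

library(limma)
library(edgeR)

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)

# block on donor set within batch, so samples sharing vehicle controls
# are allowed to be correlated
block <- interaction(targets$batch, targets$donor_set)

v   <- voom(dge, design_full)
dup <- duplicateCorrelation(v, design_full, block = block)

# second pass using the estimated consensus correlation
v   <- voom(dge, design_full, block = block,
            correlation = dup$consensus.correlation)
fit <- lmFit(v, design_full, block = block,
             correlation = dup$consensus.correlation)
fit <- eBayes(fit)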
In sum, I'm sure we're just rediscovering well-known modeling pitfalls, but we'd appreciate any comments or suggestions. We are currently testing whether modifying approach #1 to include only donor-level batch coefficients will recover some accuracy, but any further ideas would be appreciated.
Thanks,
-Aaron