Guest User
@guest-user-4897
I have recently implemented the approach used in voom to estimate the
mean and the variance of each log-cpm value at the observation level. My
dataset contains ~1000 samples and comes with a fair amount of
metadata that could be used as covariates (~400). In principle, this
allows a better linear model, on which both the fitted mean and the
fitted variance in voom depend, to be built simply by including more
factors.
So far, I have used Akaike (AIC) weights to estimate the probability
that each candidate linear model explains the data better than the
alternative models. Of course, testing all possible combinations of
covariates is computationally infeasible (in principle, 2^400 models).
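
For concreteness, here is a minimal sketch of the kind of Akaike-weight comparison I mean, for a single gene and a handful of candidate models. The data are simulated and every name in it (meta, batch, age, logcpm, fits) is made up for illustration; it is not my actual pipeline and it ignores the voom precision weights.

```r
# Illustrative only: simulated log-cpm values for one gene and two
# hypothetical covariates (batch, age).
set.seed(1)
n <- 100
meta <- data.frame(
  batch = factor(sample(c("a", "b"), n, replace = TRUE)),
  age   = rnorm(n)
)
# Simulated expression that depends on age but not on batch.
meta$logcpm <- 5 + 0.8 * meta$age + rnorm(n, sd = 0.5)

# A small set of candidate linear models for this gene.
fits <- list(
  null  = lm(logcpm ~ 1,           data = meta),
  batch = lm(logcpm ~ batch,       data = meta),
  age   = lm(logcpm ~ age,         data = meta),
  full  = lm(logcpm ~ batch + age, data = meta)
)

# Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
# where delta_i = AIC_i - min(AIC).
aic      <- sapply(fits, AIC)
delta    <- aic - min(aic)
akaike_w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

print(round(akaike_w, 3))
```

The weights sum to one over the candidate set, so they can only be read relative to the models that were actually included in the comparison.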
However, although I found that most genes are well explained by a simple
LM, a non-negligible fraction of them depend on additional factors.
The point is that what makes the expression profile of a gene
interesting is precisely when the covariates play an important role in
determining its mean and variance. Therefore I am reluctant to use the
simple LM, because this would drop all the covariates. On the
other hand, I am reluctant to use the more complicated LM, because it
is clearly more complex than necessary for a large number of genes.
What is the best way to proceed?
Thanks!
-- output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.4.2 limma_3.18.9
loaded via a namespace (and not attached):
[1] tools_3.0.2
--
Sent via the guest posting facility at bioconductor.org.