Question

Limma prametrization

1

Entering edit mode

Giovanni • 0

@73e8d7f2

Last seen 3.9 years ago

Italy

Hello, I need to compare the performance of a model based on a fully Bayesian approach with the model implemented in the limma package. My data looks like this (protein-wise):

sub.1 healthy tissue.1
sub.1 healthy tissue.2
sub.1 healthy tissue.3
sub.1 diseased tissue.1
sub.1 diseased tissue.2
sub.1 diseased tissue.3
sub.2 healthy tissue.1
sub.2 healthy tissue.2
sub.2 healthy tissue.3
sub.2 diseased tissue.1
sub.2 diseased tissue.2
sub.2 diseased tissue.3

My model likelihood its expressed in this parametrization (where j cycles proteins, l cycles the tissue, s cycles the health status and i cycles the subjects):

y_jlsi = mu_jl + delta_i + theta_jls + epsilon_jlsi

where:

mu_jl is a mixed effect composed of a fix part caused by the protein and a random effect based on the tissue
delta_i is a random effect due to the subject
theta_jls is the effect of interest
epsilon_jlsi is the random residual

To adapt this model to limma package i reparameterized the model in this way:

y_jlsi = xi_j + gamma_li + theta_jls + epsilon_jlsi

now:

xi_j it's the fixed effect of mu_jl
gamma_li it's the sum of the random part of mu_jl and delta_i

From a statistical point of view, these models are equivalent in theta_jls estimation, but now I should be able to estimate the 2nd parametrization with the R code posted below.

I am also willing to search for literature to make sure alone that my assumptions are correct. However, I have difficulty in looking for theoretical papers of the statistical model.

The literature I have found and read is the following:

[1] Ritchie, Matthew E., et al. "limma powers differential expression analyses for RNA-sequencing and microarray studies." Nucleic acids research 43.7 (2015): e47-e47. [2] Smyth, Gordon K., et al. "Linear models for microarray and RNA-Seq data user’s guide." R, doi 10 (2015)

> Array  = matrix(data[,"value"], ncol=60, nrow=30, byrow= TRUE)
> 
> fix    = as.factor(paste(data$state[1:60],data$location[1:60], sep="."))
> fix
 [1] healthy.loc.1  healthy.loc.2  healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 
 [8] healthy.loc.2  healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1  healthy.loc.2 
[15] healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1  healthy.loc.2  healthy.loc.3 
[22] diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1  healthy.loc.2  healthy.loc.3  diseased.loc.1
[29] diseased.loc.2 diseased.loc.3 healthy.loc.1  healthy.loc.2  healthy.loc.3  diseased.loc.1 diseased.loc.2
[36] diseased.loc.3 healthy.loc.1  healthy.loc.2  healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3
[43] healthy.loc.1  healthy.loc.2  healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 
[50] healthy.loc.2  healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1  healthy.loc.2 
[57] healthy.loc.3  diseased.loc.1 diseased.loc.2 diseased.loc.3
Levels: diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2 healthy.loc.3
> block = paste(data$id[1:60],data$location[1:60], sep=".")
> block
 [1] "sub.1.loc.1"  "sub.1.loc.2"  "sub.1.loc.3"  "sub.1.loc.1"  "sub.1.loc.2"  "sub.1.loc.3"  "sub.2.loc.1" 
 [8] "sub.2.loc.2"  "sub.2.loc.3"  "sub.2.loc.1"  "sub.2.loc.2"  "sub.2.loc.3"  "sub.3.loc.1"  "sub.3.loc.2" 
[15] "sub.3.loc.3"  "sub.3.loc.1"  "sub.3.loc.2"  "sub.3.loc.3"  "sub.4.loc.1"  "sub.4.loc.2"  "sub.4.loc.3" 
[22] "sub.4.loc.1"  "sub.4.loc.2"  "sub.4.loc.3"  "sub.5.loc.1"  "sub.5.loc.2"  "sub.5.loc.3"  "sub.5.loc.1" 
[29] "sub.5.loc.2"  "sub.5.loc.3"  "sub.6.loc.1"  "sub.6.loc.2"  "sub.6.loc.3"  "sub.6.loc.1"  "sub.6.loc.2" 
[36] "sub.6.loc.3"  "sub.7.loc.1"  "sub.7.loc.2"  "sub.7.loc.3"  "sub.7.loc.1"  "sub.7.loc.2"  "sub.7.loc.3" 
[43] "sub.8.loc.1"  "sub.8.loc.2"  "sub.8.loc.3"  "sub.8.loc.1"  "sub.8.loc.2"  "sub.8.loc.3"  "sub.9.loc.1" 
[50] "sub.9.loc.2"  "sub.9.loc.3"  "sub.9.loc.1"  "sub.9.loc.2"  "sub.9.loc.3"  "sub.10.loc.1" "sub.10.loc.2"
[57] "sub.10.loc.3" "sub.10.loc.1" "sub.10.loc.2" "sub.10.loc.3"
> 
> design = model.matrix(~0+fix)
> 
> 
> rand   = duplicateCorrelation(object = Array,
+                               design = design,
+                               block  = block)
> 
> fit <- lmFit(Array, design, correlation = rand$consensus.correlation)
> 
> cont.dif <- makeContrasts(loc.1 = (fixdiseased.loc.1-fixhealthy.loc.1),
+                           loc.2 = (fixdiseased.loc.2-fixhealthy.loc.2),
+                           loc.3 = (fixdiseased.loc.3-fixhealthy.loc.3),
+                           levels = design)
> 
> fit2 <- contrasts.fit(fit, cont.dif)
> fit3 <- eBayes(fit2)

limm MathematicalBiology limmaGUI StatisticalMethod • 1.6k views

ADD COMMENT • link 4.0 years ago Giovanni • 0

score 4 · Accepted Answer · 2022-01-17

4

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 19 hours ago

WEHI, Melbourne, Australia

mu_jl is a mixed effect composed of a fix part caused by the protein and a random effect based on the tissue

Well that's your choice but my approach would be different. In my opinion it isn't appropriate or necessary to treat tissue as a random effect if you are profiling the same systematic set of tissue types from all patients.

From a statistical point of view, these models are equivalent in theta_jls estimation

The two models are obviously different, not merely reparametrizations of one another. The first model has more random effects while the second model has more fixed effects.

The model you have fitted in limma is not correct. It is not correct to include the same factor (location in this case) in both the design matrix and in the blocking variable, i.e., trying to treat the same factor as both a fixed and a random effect. You should simply specify a patient random effect by block=data$id.

I am also willing to search for literature to make sure alone that my assumptions are correct. However, I have difficulty in looking for theoretical papers of the statistical model.

The mathematical methods implemented in limma are described in published research papers. The appropriate papers are cited in Ritchie et al (2015) and in the limma User's Guide. In addition, every function in the limma package has a help page (for example ?eBayes or ?duplicateCorrelation) which cites the theoretical paper or papers relevant to that function.

ADD COMMENT • link 4.0 years ago Gordon Smyth 53k

0

Entering edit mode

Thank you very much for your answer,

I proceed by section:

Well that's your choice but my approach would be different. In my opinion, it isn't appropriate or necessary to treat tissue as a random effect if you are profiling the same systematic set of tissue types from all patients.

Thanks for the advice! we reflect on the structure of the both models!

The two models are obviously different, not merely reparametrizations of one another. The first model has more random effects while the second model has more fixed effects.

This is partially true (if I'm correct). My idea is that (under normality assumption) of the effects, we can decompose the mixed effect mu_jl distributed as N (xi_j, tau ^ 2) in its fixed part, and in its random part now written as zeta_l, distributed as N (0, tau ^ 2). now (mainly to adapt it to the functions of the limma package) I have combined the effect in delta_i ( N(0,lamnda^2)) with the effect in zeta_l creating that gamma_li (0, lamda^2+ tau^2) and the model posted above. Clearly, the variances will no longer be identifiable but these two models should be comparable on the estimates of the parameters of interest theta_jls and in particular each difference between theta_jl1 and theta_jl2 or the increase in the concentration of the protein in the inflamed tissue.
If my R code is wrong (the performance on simulations seems to work well) how do I represent one of these statistical models with limma? My main problem with all is how to express 2 different random effects in the model.

?eBayes

here I found a new paper, I am continuing my study!

thanks again for your help

ADD REPLY • link 4.0 years ago Giovanni • 0

1

Entering edit mode

limma only fits one random effect term and there isn't any way to make it do otherwise. Estimating a random effect for an combined blocking variable is not equivalent to estimating the sum of the individual variance components, especially when one of the variance components isn't identifiable anyway.