Limma-voom with inverse probability weights (similar to sample weights)
Entering edit mode
jay • 0
Last seen 4 weeks ago

I am performing analyses with observational data where treatment assignment is not at random. To avoid selection bias a common approach is to use inverse probability weights (IPW) in order to balance the groups. Weights w are calculated using w=(1 - d) + d*(e/(1 - e)), where e is the propensity score (PS) of receiving the treatment derived from the fitted values of a parametric glm model on a set of covariates X, and d equals 1 for the treated and 0 for the control.

In a usual regression we can either use A. the weights, B. the propensity score as a covariate without weights, or C. just the covariates, which usually gives very similar results. Expressed in code it looks like this:

lm(y ~ d, weights=w)  ## A. using IPW weights
lm(y ~ d + e)  ## B. using PS
lm(y ~ d + X)  ## C. using covariates

I am now aiming at applying this to mRNA analysis using a limma/voom approach. I was suggested to multiply voom weights with the IPW weights w which seems reasonable and is basically what this post also suggests, i.e.:

keep <- edgeR::filterByExpr(count, design, min.count=5, min.prop=0.05)
v <- voom(count[keep,,drop=FALSE], design, weights=w)
fit <- lmFit(v, design, weights = v$weights*w) |> limma::eBayes(robust=TRUE)

This seems to work well. However, if I redo the analysis using method B., with e as a covariate resp. as the design, or C. using the covariates X as the design I surprisingly get very different results using the limma/voom approach.

I expected the results to be similar as they are using ordinary regression analyses, but that is not the case. More specific I get a much higher number of differential expressed genes using w in method A. compared to the other two methods B. and C. Therefore, I am currently not sure that the whole approach is correct and without further ado is applicable to limma/voom. I am aware that we test a large number of hypotheses, which might interfere with the methods differently.

I would be glad if anyone with appropriate statistical knowledge could have a look at this. My questions are, 1. whether the shown pipeline edgeR/voom/lmFit/eBayes is appropriate for my type of analysis, and 2. why the results of methods A., B. and C. might be so different using limma/voom compared to results of ordinary regressions.

The goal is to gain more confidence in the results, which is achieved in ordinary regressions by comparing the results of methods A-C.


According to comments, limma/voom night not be the right approach for sample weights. So 3. which approach is better for sample weights?

limma limma-voom StatisticalMethod • 142 views
Entering edit mode
Last seen 5 minutes ago
WEHI, Melbourne, Australia

One point is that you are not multiplying the weights correctly. w is an n-vector of sample weights and v$weights is a G x n matrix of precision weights so, if you want to multiply them, you would need to use

t( t(v$weights) * w)

or equivalently

v$weights * matrix(w, nrow(v), ncol(v), byrow=TRUE)

The previous post you link to was different because the previous poster had a matrix of precision weights whereas you have a vector of sample weights.

However I have doubts about the IPW weighting approach. I am not familiar with IPW, but IPW weights are designed to correct biases in the linear model rather than to model the variance. Introducing IPW weights to limma seems to me likely to produce biased variance estimators and therefore play havoc with limma's empirical Bayes variance moderation and with assessment of differential expression.

I don't see how methods A, B and C could be equivalent, even for ordinary univariate regression. However, if B or C are valid methods, then my guess would be that they might work better with limma than the weighting approach. That's just a guess -- I don't have any experience with IPW.

Entering edit mode

Thanks, t(t(v$weights)*w) makes sense! Yes exactly, I use sample weights aka importance weights; so thanks for supporting my doubts. I didn't say they are the same, but A and B are usually very similar, and both should be superior to C, which is actually the point of the method. So would you suggest a pipeline different to limma's empirical Bayes variance?

Entering edit mode

The potential problem isn't with empirical Bayes with the fact that the variance estimate from your method A linear model might not be meaningful. If that is a problem, then dropping the empirical Bayes won't solve it. I don't know anything about IPW so can't advise you how to implement it. I can only answer questions on limma syntax.


Login before adding your answer.

Traffic: 437 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6