I am performing analyses with observational data where treatment assignment is not at random. To avoid selection bias, a common approach is to use inverse probability weights (IPW) to balance the groups. The weights *w* are calculated as `w = (1 - d) + d*(e/(1 - e))`, where *e* is the propensity score (PS) of receiving the treatment, derived from the fitted values of a parametric `glm` model on a set of covariates *X*, and *d* equals *1* for the treated and *0* for the controls.
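To make the setup concrete, here is a minimal sketch of that weight construction on simulated data (all variable names here are illustrative, not from my actual analysis):

```r
## Simulated example of the IPW construction described above
set.seed(1)
n  <- 200
X  <- rnorm(n)                          # a single covariate
d  <- rbinom(n, 1, plogis(0.8 * X))     # treatment assignment depends on X
ps <- glm(d ~ X, family = binomial)     # parametric propensity model
e  <- fitted(ps)                        # propensity score e
w  <- (1 - d) + d * e / (1 - e)         # weights as defined above
summary(w[d == 1])                      # treated are reweighted; controls get weight 1
```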

In an ordinary regression we can use either **A.** the weights, **B.** the propensity score as a covariate without weights, or **C.** just the covariates; all three usually give very similar results. Expressed in code it looks like this:

```r
lm(y ~ d, weights = w)  ## A. using IPW weights
lm(y ~ d + e)           ## B. using the PS as a covariate
lm(y ~ d + X)           ## C. using the covariates
```
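A quick simulated check (illustrative only, with a constant treatment effect and a correctly specified propensity model) shows how close the three estimates typically are:

```r
## Compare the treatment-effect estimate from methods A, B and C
set.seed(2)
n <- 500
X <- rnorm(n)
d <- rbinom(n, 1, plogis(X))            # confounded treatment assignment
y <- 1 + 2 * d + X + rnorm(n)           # true treatment effect is 2
e <- fitted(glm(d ~ X, family = binomial))
w <- (1 - d) + d * e / (1 - e)
cA <- unname(coef(lm(y ~ d, weights = w))["d"])  # A. IPW
cB <- unname(coef(lm(y ~ d + e))["d"])           # B. PS as covariate
cC <- unname(coef(lm(y ~ d + X))["d"])           # C. covariate adjustment
round(c(A = cA, B = cB, C = cC), 2)              # should all be near 2
```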

I am now aiming to apply this to mRNA analysis using a `limma/voom` approach. It was suggested to me to multiply the `voom` weights with the IPW weights `w`, which seems reasonable and is basically what this post also suggests, i.e.:

```r
keep <- edgeR::filterByExpr(count, design, min.count = 5, min.prop = 0.05)
v <- voom(count[keep, , drop = FALSE], design)
## v$weights is a genes x samples matrix, so the per-sample weights w must be
## applied column-wise; a plain v$weights*w would recycle w down the rows.
## w is applied once, here in lmFit, rather than also inside voom.
fit <- lmFit(v, design, weights = t(t(v$weights) * w)) |> limma::eBayes(robust = TRUE)
```
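The column-wise multiplication matters because of how R recycles vectors into matrices; a small base-R demonstration (the matrix `W` is a stand-in for `v$weights`):

```r
## Why per-sample weights must be applied column-wise:
## R recycles a vector DOWN the columns of a matrix.
W <- matrix(1, nrow = 3, ncol = 2)  # stand-in for v$weights (genes x samples)
w <- c(10, 20)                      # one weight per sample (column)
W * w                               # WRONG: recycles w down the rows
t(t(W) * w)                        # correct: scales each column by its weight
sweep(W, 2, w, `*`)                 # equivalent and arguably clearer
```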

This seems to work well. However, if I redo the analysis using method **B.**, with *e* as a covariate in the design, or method **C.**, with the covariates *X* as the design, I surprisingly get very different results with the `limma/voom` approach.

I expected the results to be similar, as they are in ordinary regression analyses, but that is not the case. More specifically, I get a much higher number of differentially expressed genes using `w` in method **A.** compared to the other two methods **B.** and **C.** Therefore, I am currently not sure that the whole approach is correct and directly applicable to `limma/voom`. I am aware that we test a large number of hypotheses, which might interact with the methods differently.

I would be glad if anyone with appropriate statistical knowledge could have a look at this. My questions are: **1.** whether the shown pipeline `edgeR/voom/lmFit/eBayes` is appropriate for my type of analysis, and **2.** why the results of methods **A.**, **B.** and **C.** might differ so much under `limma/voom` compared to the results of ordinary regressions.

The goal is to gain more confidence in the results, which is achieved in ordinary regressions by comparing the results of methods **A-C**.

### Edit:

According to the comments, `limma/voom` might not be the right approach for sample weights. So **3.** which approach is better suited to sample weights?

**Comment (OP):** Thanks, `t(t(v$weights)*w)` makes sense! Yes, exactly, I use sample weights, a.k.a. importance weights; so thanks for supporting my doubts. I didn't say they are the same, but A and B are usually very similar, and both should be superior to C, which is actually the point of the method. So would you suggest a pipeline different to limma's empirical Bayes variance?

**Reply:** The potential problem isn't with empirical Bayes but with the fact that the variance estimate from your method A linear model might not be meaningful. If that is a problem, then dropping the empirical Bayes won't solve it. I don't know anything about IPW so can't advise you how to implement it. I can only answer questions on limma syntax.
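The reply's concern can be illustrated in base R: `lm()` interprets `weights` as precision (inverse-variance) weights, so with importance weights the coefficient is still the weighted least-squares estimate, but the model-based standard error relies on a variance assumption that importance weights do not satisfy. A hand-rolled sandwich-type SE (sketch, simulated data) generally disagrees with it:

```r
## Model-based vs. robust SE under arbitrary importance weights
set.seed(3)
n <- 100
d <- rbinom(n, 1, 0.5)
y <- 2 * d + rnorm(n)
w <- runif(n, 0.5, 2)                         # arbitrary importance weights
fit      <- lm(y ~ d, weights = w)
se_model <- summary(fit)$coefficients["d", "Std. Error"]
## Sandwich variance: (X'WX)^-1 X' diag(w^2 r^2) X (X'WX)^-1
X        <- cbind(1, d)
bread    <- solve(t(X) %*% (w * X))
meat     <- t(X) %*% (w^2 * resid(fit)^2 * X)
se_robust <- sqrt(diag(bread %*% meat %*% bread))[2]
c(model = se_model, robust = se_robust)       # the two SEs generally differ
```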