I am trying to perform some linear regressions with respect to some covariates in limma so that I can predict expression values after controling for 2 covariates, `Afactor`

and `Efactor`

. Basically, this is my sample setup:

```
Afactor Efactor
sample1 a1 e2
sample2 a0 e1
sample3 a0 e2
sample4 a1 e0
sample5 a1 e0
sample6 a1 e1
...
```

Here, `Afactor`

is a factor with 2 unique values `a1`

and `a2`

, and `Efactor`

is a factor with 3 unique values `e1`

, `e2`

, `e3`

. To fit it, I'm using limma as follows:

```
design=model.matrix('~0 + Afactor + Efactor', my_metadata)
fit= lmFit(my_data, design)
```

This allows me to find the fitting coefficients with `fit$coefficients`

, that resemble the following:

```
>head(fit$coefficients)
Afactora1 Afactora2 Efactore1 Efactore2
gene1 2.3 3.7 -1.3 0.9
gene2 1.1 2.1 -2.5 2.1
gene3 1.3 3.1 0.2 1.3
```

My questions:

- Note that there is no coefficient for
`Efactore3`

, but why is that so? - Does this have to do with the fact that possible factors for Efactor are
`e1`

,`e2`

and`e3`

? - Shouldn't it be possible to extract the coefficient for
`e3`

so that I could use each coefficient to estimate gene expression from the fitted model by imputing the values of`Afactor`

and`Efactor`

for each sample?

Hi James, many thanks for your comments and help!

You are correct indeed. I had edited my post and changed the levels in

`Afactor`

to`a0`

and`a1`

, and updated the post to say (mistakenly)`a1`

and`a2`

. I should've had`Afactora0`

and`Afactora1`

instead.That had been my interpretation of the coefficients actually: that this would be a fitting coefficient, such that when replacing some given factor levels in it (e.g. sample1

`a1`

`e2`

) I would be able to predict the gene expression value as given by the model, and that the coefficient value would represent the mean of all values in that group.The last bit of your sentence was actually why I thought of avoiding the intercept altogether, in order to facilitate the interpretation of the coefficients. Your suggestion of combining the factor levels is quite interesting (though it may pose a challenge if I include many covariates!). Am I correct in assuming the remaining steps in retrieving the adjusted gene expression is by replacing the sample values as I mentioned above?

I had googled a bit for this question and wasn't able to find much, though I'd think this is a common question (though I did find this). I do find many controlling for covariates when contrasting effects

It's not super clear to me exactly what you are after. Maybe you just want covariate-adjusted values? In which case

`removeBatchEffect`

will do that for you. But do note (as pointed out in the help page for that function) that it's not intended as something you do with your data prior to fitting your 'real' model, which would artificially increase your degrees of freedom.Many thanks James. My aim is exactly to adjust the values for these covariates. In this case I do not want to remove any batch effect (beyond the effect of the covariates), so would you encode all samples with the same batch id to enter in

`batch1=some_batch`

? In practice, is this the same as employing the`lmFit`

approach above and using the residuals of the model (`residuals.MArrayLM`

) for further analysis?I will accept your answer above, but feel free to edit it with any information if you feel it may be useful to others.

Once again many thanks!

Please read the help page for

`removeBatchEffect`

a bit more closely. There are more arguments than the batch arguments!And yes, it's just fitting the model using the covariates and returning the residuals.