Hello, Bioconductor, I'm working lately with data from small-RNA seq in an organism without small RNA annotation. I would like to perform DE analysis on Differentially Expresed Regions and I would like to ask about the creation of the model matrix. Following the example of the vignette :

## Build models

```
mod <- model.matrix(~ pheno$group + pheno$gender)
mod0 <- model.matrix(~ pheno$gender)
```

I only have `pheno$group`

, does that mean I just follow the "main" instructions from the limma vignette?

```
getF <- function(fit, fit0, theData) {
rss1 <- rowSums((fitted(fit) - theData)^2)
df1 <- ncol(fit$coef)
rss0 <- rowSums((fitted(fit0) - theData)^2)
df0 <- ncol(fit0$coef)
fstat <- ((rss0 - rss1) / (df1 - df0)) / (rss1 / (ncol(theData) - df1))
f_pval <- pf(fstat, df1 - df0, ncol(theData) - df1, lower.tail = FALSE)
fout <- cbind(fstat, df1 - 1, ncol(theData) - df1, f_pval)
colnames(fout)[2:3] <- c("df1", "df0")
fout <- data.frame(fout)
return(fout)
}
```

If I want to perform the single base-level F-statistics, instead of using the:

```
models <- makeModels(sampleDepths,
testvars = pheno$group,
adjustvars = pheno[, c("gender")]
)
```

Do I put the `testvars`

only?

Thank you in advance!

Konstantinos

Dear Leonardo,

in the section

of the derfinder vignette it is explained how to perform an analysis in the case that the experimental design has a covariate ("6.1.2 Find DERs with limma"`pheno[, c("gender")]`

)I wanted to know if in a simpler design (such as in the example with only the

`pheno$group`

) the limma approach would be as the one in the 9.2 or 9.3 of the "limma users guide":## 9.3 Several Groups

For example, a design I would like to test is:

Does that mean I have to follow this:

Or should I find a way to apply the

`getF()`

proposed in the derfinder vignette?Yes, that was what I was searching for:

I wasn't completely sure if using the

`sampleDepths`

and the`pheno$group`

would be sufficient to get results without the input of adjustvars.Hi,

Thanks for the clarifications. The single base-level F-statistics approach is quite slow, and well, I would recommend that you use the limma-based approach instead. With

`limma`

you can then use whatever model is best for your question and your question really becomes a modeling question instead of a`derfinder`

one.I see that

`targets$group`

has three options:`Ctrl`

,`C1`

and`C2`

. If you use the`model.matrix(~0 + targets$group)`

approach, you are then modeling without an intercept. If you use`model.matrix(~ targets$group)`

one of your groups will become the reference. You can visually inspect this with`ExploreModelMatrix`

with:versus

That visual inspection will help you understand what are the coefficients that you are obtaining and how to interpret them.

If you are interested in the difference between

`C1`

and`Ctrl`

, as well as the difference of`C2`

vs`Ctrl`

, then you might want to use`model.matrix(~targets$group)`

where you set up`targets$group`

to be a`factor`

with`levels(targets$group)`

to be`c("Ctrl", "C1", "C2")`

, then get the t-statistics for the 2nd coefficient (C1 vs Ctrl) and the 3rd coefficient (C2 vs Ctrl) by running`limma:topTable(coef = 2)`

and`limma::topTable(coef = 3)`

on the same`eBayes(lmFit())`

object.Or you can use the

`model.matrix(0 ~ targets$group)`

model that has no intercept and use the`constrasts.fit()`

function like they explain in the`limma`

docs.You might be able to get more specific modeling support by asking the

`limma`

authors.Best, Leo

Thank you much for the elucidation, it is clear now and I will proceed as you suggested!

Kind regards, Konstantinos