Hi,

I am using limma-voom for an RNA-Seq differential expression analysis. There are 260 samples from leukaemia patients, and all but about 40 of them fall into clusters. Let's say I have clusters A to H, each containing 5 to 40 samples, plus 40 samples that don't fall into any cluster (I'll call them "others" for convenience' sake). I want to see how the expression of each cluster compares to the rest of the cohort, e.g. cluster A vs (clusters B to H + others), and repeat this for each of A to H. Which way should I use to make the design matrix (1, 2 or 3 below)?

And as I need to run sva to adjust for batch effects, should I use 1, 2 or 3 as the "mod1" (the full model) for `svaseq()`?
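For `svaseq()`, the full model (`mod`) holds the variables of interest plus any known adjustment variables, and the null model (`mod0`) must be nested inside it. The following is a minimal base-R sketch that builds both from a single cluster label. It assumes the toy layout used below (8 clusters of 5 samples plus 5 unclustered ones), and `sample_cluster` and `counts` are hypothetical names:

```r
# Hypothetical toy labels: clusters A-H with 5 samples each, then 5 "others"
sample_cluster <- factor(rep(c(LETTERS[1:8], "others"), times = rep(5, 9)))

# Full model: intercept plus one column per non-baseline group
mod <- model.matrix(~ sample_cluster)

# Null model: intercept only, which is nested in mod
mod0 <- cbind(Intercept = rep(1, nrow(mod)))

# A usable full model should have full column rank
qr(mod)$rank == ncol(mod)   # TRUE

# With a filtered genes x samples count matrix `counts` (hypothetical), the call
# would be along the lines of: sva::svaseq(counts, mod = mod, mod0 = mod0)
```

Giving "others" its own level (rather than leaving it out) keeps every sample in the design, so the rows of `mod` line up with the columns of the count matrix.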

EDITED: since I made some mistakes that made the question sound like nonsense, I have changed the questions.

For each cluster I make a clusterX and a not_clusterX vector. For example, with 45 samples where the 1st to 5th are A:
```
clusterA = rep(c(TRUE,FALSE),times = c(5,40))
not_clusterA = !rep(c(TRUE,FALSE),times = c(5,40))
```

... 6th to 10th are B ...
```
clusterB = rep(c(FALSE,TRUE,FALSE),times = c(5,5,35))
not_clusterB = !rep(c(FALSE,TRUE,FALSE),times = c(5,5,35))
```

... and so on and so forth
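Instead of typing each vector by hand, you can derive all of them from a single label vector. That guarantees each `not_clusterX` is the exact negation of its `clusterX`. The sketch below assumes the same toy layout (samples 1-5 are A, 6-10 are B, ..., 36-40 are H, 41-45 are unclustered). `sample_cluster` and `indicators` are hypothetical names:

```r
# Hypothetical toy labels: one entry per sample
sample_cluster <- rep(c(LETTERS[1:8], "others"), times = rep(5, 9))

# One logical indicator per cluster, plus its negation
indicators <- list()
for (cl in LETTERS[1:8]) {
  indicators[[paste0("cluster", cl)]]     <- sample_cluster == cl
  indicators[[paste0("not_cluster", cl)]] <- sample_cluster != cl
}

sum(indicators$clusterB)     # 5
which(indicators$clusterB)   # 6 7 8 9 10
```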

1. Only mark which samples belong to which cluster and ignore the "others", something like this (just an example):

`model.matrix(~clusterA + clusterB + clusterC + clusterD + clusterE + clusterF + clusterG + clusterH)`
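In option 1, the "others" samples have only the intercept switched on. The intercept therefore estimates the "others" mean, and each `clusterXTRUE` coefficient estimates cluster X minus "others", not cluster X minus the rest of the cohort. Here is a small base-R check under the same toy layout (all names hypothetical):

```r
sample_cluster <- rep(c(LETTERS[1:8], "others"), times = rep(5, 9))

# One logical column per cluster, named clusterA ... clusterH
ind <- as.data.frame(sapply(LETTERS[1:8], function(cl) sample_cluster == cl))
names(ind) <- paste0("cluster", LETTERS[1:8])

design1 <- model.matrix(~ clusterA + clusterB + clusterC + clusterD +
                          clusterE + clusterF + clusterG + clusterH, data = ind)
colnames(design1)   # "(Intercept)" "clusterATRUE" ... "clusterHTRUE"

# Rows 41-45 ("others") have every cluster column at 0,
# so for those samples only the intercept contributes
design1[41:45, -1]
```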

2. Make an extra column for each cluster as its negative. For convenience' sake, again, clusterX and not_clusterX are the vectors above, marking which samples are (or are not) in the corresponding cluster:

```
model.matrix(~clusterA + clusterB + clusterC + clusterD + clusterE + clusterF + clusterG + clusterH +
             not_clusterA + not_clusterB + not_clusterC + not_clusterD + not_clusterE + not_clusterF + not_clusterG + not_clusterH)
```
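With an intercept present, each `not_clusterX` column equals 1 minus `clusterX`, so option 2 is not of full rank. `model.matrix()` will still build it, but the extra eight columns carry no new information and their coefficients cannot be estimated (limma's `nonEstimable()` reports such columns). Below is a base-R sketch of the rank check, using the same toy layout and hypothetical names:

```r
sample_cluster <- rep(c(LETTERS[1:8], "others"), times = rep(5, 9))

ind <- as.data.frame(sapply(LETTERS[1:8], function(cl) sample_cluster == cl))
names(ind) <- paste0("cluster", LETTERS[1:8])
# not_clusterX is exactly the negation of clusterX
for (cl in LETTERS[1:8]) {
  ind[[paste0("not_cluster", cl)]] <- !ind[[paste0("cluster", cl)]]
}

# Build the 16-term formula programmatically instead of typing it out
design2 <- model.matrix(reformulate(names(ind)), data = ind)
ncol(design2)      # 17
qr(design2)$rank   # 9: the not_cluster columns add nothing
```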

3. Make only one vector, grouping everything in one column:

`cluster = rep(c("A","B","C","D","E","F","G","H",NA), times = rep(5,9))`

`model.matrix(~cluster)`
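There is one caveat with option 3 as written. `model.matrix()` applies the default `na.action` (normally `na.omit`), so samples labelled `NA` are silently dropped. The design then no longer has one row per column of the expression matrix. The sketch below shows the effect and one fix: giving the unclustered samples their own level. The "others" label is an assumption:

```r
cluster <- rep(c("A","B","C","D","E","F","G","H",NA), times = rep(5,9))

# The 5 NA samples are dropped by the default na.action
nrow(model.matrix(~cluster))   # 40, not 45

# Give the unclustered samples an explicit level instead of NA
cluster_fixed <- cluster
cluster_fixed[is.na(cluster_fixed)] <- "others"
cluster_fixed <- factor(cluster_fixed)

# Cell-means coding: one column per group, no intercept
design3 <- model.matrix(~ 0 + cluster_fixed)
dim(design3)                   # 45 9
```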

I actually tried 1 and 3, and neither seems to work with a contrast matrix, since that would need a -1 to contrast against the 1. I am not sure whether 2 makes sense, or whether it would affect the way voom or sva estimate the model.
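One way to get the "-1 against 1" structure is a cell-means design (one column per group, including "others") combined with contrasts that put +1 on cluster X and spread -1 over all the other groups. The sketch below assumes the same toy labels. It weights the other groups by sample size, so the second term is the mean of all remaining samples. With ordinary least squares that match is exact, while voom's precision weights make it approximate. Equal weights would instead compare X to the average of the other group means. All names are hypothetical:

```r
sample_cluster <- factor(rep(c(LETTERS[1:8], "others"), times = rep(5, 9)))

# Cell-means design: one column per group, no intercept
design <- model.matrix(~ 0 + sample_cluster)
colnames(design) <- levels(sample_cluster)

n <- as.numeric(table(sample_cluster))   # group sizes, in levels() order

# Column X: +1 for cluster X, minus the size-weighted mean of every other group
contr <- sapply(LETTERS[1:8], function(cl) {
  w <- n * (levels(sample_cluster) != cl)
  ifelse(levels(sample_cluster) == cl, 1, -w / sum(w))
})
rownames(contr) <- levels(sample_cluster)

round(contr[, "A"], 3)   # A = 1, every other group = -0.125
colSums(contr)           # each column sums to (numerically) 0
```

With a voom object `v` (hypothetical name), the usual limma steps would be `fit <- lmFit(v, design)`, `fit2 <- eBayes(contrasts.fit(fit, contr))` and then `topTable(fit2, coef = "A")`, and so on for each cluster.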

Thanks a lot!

What do the vectors `clusterA`, `clusterB`, etc. contain?

Sorry, I have reformulated the questions. It should make more sense now.
