I am using limma-voom for an RNA-Seq differential expression analysis. There are 260 samples from leukaemia patients, and they fall into different clusters except for some 40 samples. Let's say I have clusters A to H, each containing 5 to 40 samples, plus 40 samples that don't fall into any cluster (I'll call them "others" for convenience's sake). I want to see how the expression of each cluster compares to the rest of the cohort, e.g. cluster A vs (clusters B to H + others), and run this for each of A to H. Which of the approaches below (1, 2 or 3) should I use to make the design matrix?
And as I need to run sva to adjust for batch effects, should I use 1, 2 or 3 as the "mod1" for svaseq()?
EDITED: I made some mistakes that made the question sound like nonsense, so I have changed the question.
For each cluster I make a clusterX and a not_clusterX vector, which look like this:
Let's say the 1st to 5th samples are A and there are 45 samples...
clusterA = rep(c(TRUE,FALSE),times = c(5,40))
not_clusterA = !clusterA
... the 6th to 10th are B ...
clusterB = rep(c(FALSE,TRUE,FALSE),times = c(5,5,35))
not_clusterB = !clusterB
... and so on and so forth
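The same indicator vectors can also be derived from one label vector instead of typing each rep() call by hand. A minimal sketch in base R, assuming a hypothetical label vector called `labels` with 5 samples per cluster and 5 "others" (45 samples, as in the toy example above); `indicators` is likewise a made-up name:

```r
# Hypothetical label vector: 5 samples each for clusters A to H, then 5 "others"
labels <- rep(c(LETTERS[1:8], "others"), times = rep(5, 9))

# One logical indicator per cluster, derived from the labels
clusterA <- labels == "A"
not_clusterA <- !clusterA
clusterB <- labels == "B"
not_clusterB <- !clusterB

# All cluster indicators at once: a 45 x 8 logical matrix, one column per cluster
indicators <- sapply(LETTERS[1:8], function(k) labels == k)
```

Deriving everything from one vector avoids copy-paste slips when the cluster positions change.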
1. Only mark which samples belong to which cluster and ignore the "others", which would be something like this (just an example):
model.matrix(~clusterA + clusterB + clusterC + clusterD + clusterE + clusterF + clusterG + clusterH)
2. Make an extra column for each cluster as its negative. For convenience's sake, again, clusterX are factors that label which samples are in the corresponding clusters:
model.matrix(~clusterA + clusterB + clusterC + clusterD + clusterE + clusterF + clusterG + clusterH
+ not_clusterA + not_clusterB + not_clusterC + not_clusterD + not_clusterE + not_clusterF + not_clusterG + not_clusterH)
3. Make one vector only, grouping everything in one column:
cluster = rep(c("A","B","C","D","E","F","G","H",NA), times = rep(5,9))
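One way to compare the three candidates is to build each design on toy data and check its dimensions and rank. A minimal sketch in base R, assuming the 45-sample toy layout (5 samples per cluster, 5 unclustered); the names `ind`, `not_ind` and `design1` to `design3` are made up for illustration. Each not_clusterX column equals 1 minus the matching clusterX column, so the second design is expected to be rank-deficient, and model.matrix() drops rows containing NA by default, so the third design loses the unclustered samples.

```r
# Toy layout: 5 samples per cluster A to H, then 5 unclustered samples
labels <- rep(c(LETTERS[1:8], "others"), times = rep(5, 9))

# Logical indicator columns clusterA ... clusterH and their negations
ind <- sapply(LETTERS[1:8], function(k) labels == k)
colnames(ind) <- paste0("cluster", LETTERS[1:8])
not_ind <- !ind
colnames(not_ind) <- paste0("not_", colnames(ind))

# Option 1: cluster indicators only
design1 <- model.matrix(~ ., data = as.data.frame(ind))

# Option 2: cluster indicators plus their negations
design2 <- model.matrix(~ ., data = as.data.frame(cbind(ind, not_ind)))

# Option 3: a single vector with NA for the unclustered samples
cluster <- rep(c(LETTERS[1:8], NA), times = rep(5, 9))
design3 <- model.matrix(~ cluster)

# Compare number of columns with the numerical rank of each design
c(columns = ncol(design1), rank = qr(design1)$rank)
c(columns = ncol(design2), rank = qr(design2)$rank)

# Rows surviving in option 3 (samples with NA are removed)
nrow(design3)
```

In the first design the intercept corresponds to the unclustered samples, so each clusterX coefficient compares that cluster against "others" only, not against the rest of the cohort.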
I actually tried 1 and 3, and it seems they don't work with a contrast matrix, since that needs a -1 to contrast against the 1. But I am not sure whether 2 makes sense, or whether this will affect the way voom or sva estimates the model.
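For reference, the "-1 against 1" idea can be written down explicitly if the unclustered samples are kept as their own level instead of NA. This is only a sketch under assumptions, not a recommendation: it uses a no-intercept design on the 45-sample toy layout, the names `group`, `design` and `contrasts_mat` are made up, and "rest" is defined here as the equally weighted average of the other group means (weighting by sample size would be a different choice). In limma, a matrix with one row per design column like this is the kind of object contrasts.fit() accepts as its contrasts argument.

```r
# Toy layout: 5 samples per cluster A to H, then 5 unclustered samples ("others")
labels <- rep(c(LETTERS[1:8], "others"), times = rep(5, 9))
group <- factor(labels, levels = c(LETTERS[1:8], "others"))

# No-intercept design: one column per group, each coefficient is a group mean
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One contrast per cluster: +1 on that cluster, -1/(k-1) on every other group
k <- ncol(design)
contrasts_mat <- sapply(LETTERS[1:8], function(g) {
  v <- rep(-1 / (k - 1), k)
  names(v) <- colnames(design)
  v[g] <- 1
  v
})

# Rows are design columns, columns are the "X vs rest" comparisons
contrasts_mat[, "A"]
```

Each column sums to zero, which is what makes it a contrast between one cluster and the average of the remaining groups.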
Thanks a lot!