Design matrix for cohort with samples outside "block"
kentfung • 0
@kentfung-17051

Hi,

I am using limma-voom for an RNA-Seq differential expression analysis. There are 260 samples from leukaemia patients, and all but about 40 of them fall into distinct clusters. Say I have clusters A to H, each containing 5 to 40 samples, plus 40 samples that don't fall into any cluster (I'll call them "others" for convenience). I want to see how the expression in each cluster compares to the rest of the cohort, e.g. cluster A vs (clusters B to H + others), and repeat this for each of A to H. Which way should I build the design matrix (1, 2 or 3 below)?

And since I need to run sva to adjust for batch effects, should I use 1, 2 or 3 as the "mod1" for svaseq()?

EDITED: I made some mistakes that made the question read like nonsense, so I have rewritten it.

For each cluster I make a clusterX and a not_clusterX vector. Say there are 45 samples and the 1st to 5th are in A:

    clusterA = rep(c(TRUE,FALSE),times = c(5,40))
    not_clusterA = !rep(c(TRUE,FALSE),times = c(5,40))

... the 6th to 10th are in B:

    clusterB = rep(c(FALSE,TRUE,FALSE),times = c(5,5,35))
    not_clusterB = !rep(c(FALSE,TRUE,FALSE),times = c(5,5,35))

... and so on and so forth
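
The indicator vectors above could also be derived from a single label vector rather than typed out one by one, which avoids copy-paste slips in the complements. A minimal base R sketch, assuming the 45-sample toy layout (5 samples in each of clusters A to H, then 5 unclustered samples); the names `labels` and `indicators` are made up for illustration:

```r
# Toy layout (assumption): 5 samples in each of clusters A..H, then 5 "others"
labels <- rep(c(LETTERS[1:8], "others"), each = 5)

# One logical membership vector per cluster, built from the labels
indicators <- lapply(LETTERS[1:8], function(k) labels == k)
names(indicators) <- paste0("cluster", LETTERS[1:8])

clusterA     <- indicators$clusterA
not_clusterA <- !clusterA   # the complement is just the negation
clusterB     <- indicators$clusterB
not_clusterB <- !clusterB

sum(clusterA)     # number of samples in cluster A
which(clusterB)   # positions of the cluster B samples
```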

1. Only mark which samples belong to which cluster and ignore the "others", which would be something like this (just an example):

    model.matrix(~clusterA + clusterB + clusterC + clusterD + clusterE + clusterF + clusterG + clusterH)

2. Add an extra "negative" column for each cluster. Again for convenience, clusterX are factors that label which samples are in the corresponding clusters:

    model.matrix(~clusterA + clusterB + clusterC + clusterD + clusterE + clusterF + clusterG + clusterH + not_clusterA + not_clusterB + not_clusterC + not_clusterD + not_clusterE + not_clusterF + not_clusterG + not_clusterH)

3. Make one vector only, grouping everything in one column:

    cluster = rep(c("A","B","C","D","E","F","G","H",NA), times = rep(5,9))
    model.matrix(~cluster)
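
Two properties of these options can be checked in base R before any model is fitted. In option 2 each not_clusterX column equals 1 minus the matching clusterX column, so the design may not be of full column rank. In option 3, model.matrix() drops rows containing NA under the default na.action. The sketch below uses the 45-sample toy layout (an assumption, not the real cohort), and the explicit `others` level at the end is an illustrative alternative rather than an established answer:

```r
# Toy layout (assumption): 5 samples per cluster A..H, then 5 unclustered (NA)
cluster_na <- rep(c(LETTERS[1:8], NA), times = rep(5, 9))

# Option 2: indicator columns plus their negations
ind <- sapply(LETTERS[1:8], function(k) !is.na(cluster_na) & cluster_na == k)
colnames(ind) <- paste0("cluster", LETTERS[1:8])
neg <- !ind
colnames(neg) <- paste0("not_", colnames(ind))
design2 <- model.matrix(~ ., data = data.frame(ind, neg))
ncol(design2)       # number of columns in the design
qr(design2)$rank    # number of linearly independent columns

# Option 3 as written: rows with NA are dropped by the default na.action
design3_na <- model.matrix(~cluster_na)
nrow(design3_na)

# Alternative (assumption): an explicit "others" level, one column per group
cluster <- factor(ifelse(is.na(cluster_na), "others", cluster_na),
                  levels = c(LETTERS[1:8], "others"))
design3 <- model.matrix(~0 + cluster)
colnames(design3) <- levels(cluster)
dim(design3)
```

On this toy layout the rank of the option 2 design is lower than its column count, and the option 3 design with NA has fewer rows than there are samples. limma's lmFit() reports coefficients it cannot estimate when a design is rank deficient.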

I actually tried 1 and 3, and they don't seem to work with a contrast matrix, since the contrast needs a -1 to set against the 1. But I am not sure whether 2 makes sense, or whether it would affect the way voom or sva estimates the model.
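
To make the contrast question concrete: with a no-intercept design that has one column per group, a "cluster vs rest" comparison can be written as +1 for the cluster and negative weights summing to -1 across the other groups. A base R sketch, assuming the unclustered samples get their own `others` group and that every other group is weighted equally (so "rest" means the average of the other group means, not the pooled samples); `one_vs_rest` is a hypothetical helper name:

```r
# Assumed group labels, in the same order as the design matrix columns
groups <- c(LETTERS[1:8], "others")

# Contrast weights for one target cluster: +1 for the target,
# -1/(k-1) for each of the remaining k-1 groups
one_vs_rest <- function(target, groups) {
  w <- rep(-1 / (length(groups) - 1), length(groups))
  names(w) <- groups
  w[target] <- 1
  w
}

# One column per cluster A..H; "others" is only ever part of the rest
contrast_matrix <- sapply(LETTERS[1:8], one_vs_rest, groups = groups)
colnames(contrast_matrix) <- paste0(LETTERS[1:8], "_vs_rest")

colSums(contrast_matrix)   # weights of each contrast sum to (numerically) zero
```

A matrix of this shape is what limma's contrasts.fit() accepts as its `contrasts` argument after lmFit(). Whether equal weighting is the right definition of "rest" for this cohort is a separate judgement call.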

Thanks a lot!

limma voom sva • 237 views

What do the vectors clusterA, clusterB etc contain?


Sorry, I have reformulated the question. It should make more sense now.
