Hi everyone,
I have a bit of a problem with DESeq2 which I have already seen described and answered before, but I think I’m missing some key information to correctly tackle it. To summarise the experiment, I have a set of samples from patients that either respond well to a certain treatment (R) or don’t respond well (NR), and samples were obtained from each patient at up to three different (t0, t1 and t2). Each sample obtained was subdivided into cells positive for a certain cell surface marker (pos) and negative for that marker (neg). Right now, I am only interested in comparing gene expression between two discrete groups: between responders and non-responders, only at t0 and of positive cells, at t0 of negative cells, at t1 of positive cells, and at t1 of negative cells. For this, I created a column which nests all these features into a single definition, with these groups being R_pos_t0, NR_pos_t0, R_neg_t0, NR_neg_t0, R_pos_t1 and NR_pos_t1, R_neg_t1 and NR_neg_t1 (as well as R_pos_t2 and R_pos_t2 which won’t be compared to non-responders since there are no t2 non-responder samples) so it should be quite straight forward. However, I now have quite a few more variables I wish to add in order to control for these variables that might affect response. These variables are: percentage positivity for another marker (pdl1), sex, age, smoking habits (smoke), ecog, stage of disease, histology of disease, previous treatment (chemo), recist, and type of current treatment (treatment).
I understand that the correct model design would be the following:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition + condition:pdl1 + condition:sex + condition:age + condition:smoke + condition:ecog + condition:stage + condition:histology + condition:chemo + condition:recist + condition:treatment)
However, I get to a problem which I have already seen described, which is that the model cannot be used because the resulting model is not full rank since it produces columns with zero values due to all the interactions. I have read from responses here that it is possible to create a custom model matrix and feed it to DESeq2 after eliminating zero columns, which I assume would be done like this:
mm = model.matrix(~ condition + condition:pdl1 + condition:sex + condition:age + condition:smoke + condition:ecog + condition:stage + condition:histology + condition:chemo + condition:recist + condition:treatment, colData(dds))
mm <- mm[, colSums(mm == 0) != nrow(mm)]
dds = DESeq(dds, full=mm, betaPrior=FALSE)
This would allow to first create the custom matrix with all interactions, then eliminate all zero columns from said matrix, and then feed the matrix to DESeq2. However, what I am missing is how do I get to this point. How can I create a custom model matrix that requires colData information from a dds object that in turn hasn’t been generated due to the non zero values in the columns? This is the step that I am missing, creating the model matrix if DESeqDaraSetFromMatrix has not been able to generate a dds object. It’s probably something very silly that I am missing but it is something that I cannot find or that I’m not looking for properly.
Thank you very much in advance for any responses!
Kind regards Jesús
Hi Michael! Thank you very much for your response. I tried what you said, specifically setting design as ~ 1, like so:
This way, it does let me create the model matrix and then feed it to DESeq2. However, I receive the error message saying that the model is not full rank: “Error in checkFullRank(full) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed”. This is strange as none of the variables I included are linear combinations of other. What else could be going on?
I'd recommend consulting a statistician or someone familiar with linear models to help you get the design specified. There's nothing particular here about DESeq2, you just need to find out how to express your analysis as a design formula or matrix. I unfortunately don't have sufficient time to provide this kind of statistical consultation on the support site.
Hi Michael. I understand, thank you for your help