Hello, I have a question about model.matrix() that came up during an analysis. I have two objects, counts and coldata. The counts matrix has 15 samples as columns (7 "Ctrl" and 8 "Treat"), and its rows are genes with their expression levels. The coldata data frame has the same 15 samples as rows and two columns: "disease" and "new_variance." The "disease" column holds the 7 "Ctrl" and 8 "Treat" labels, and the "new_variance" column holds 15 random numeric values (e.g., 1.434, 1.5, 0.989, ...).
I'm having trouble understanding the meaning of the following three cases:
    library(DESeq2)

    # 1. disease
    mm1 = model.matrix(~disease, coldata)  # formula needs the leading "~"
    ddsMat <- DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
    ddsMat2 = DESeq(ddsMat, full = mm1, betaPrior = FALSE)

    # 2. new_variance
    mm2 = model.matrix(~new_variance, coldata)
    ddsMat3 <- DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
    ddsMat4 = DESeq(ddsMat3, full = mm2, betaPrior = FALSE)

    # 3. interaction
    mm3 = model.matrix(~new_variance*disease, coldata)
    ddsMat5 <- DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
    ddsMat6 = DESeq(ddsMat5, full = mm3, betaPrior = FALSE)
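To make the three designs concrete, here is a minimal base-R sketch that prints the column names of each design matrix. The coldata in it is made up for illustration (random values, not my real data), and it assumes "disease" is a factor with "Ctrl" as the reference level. It does not run DESeq2; it only shows which coefficients each design would ask the model to estimate.

```r
# Hypothetical coldata: 7 Ctrl + 8 Treat samples with made-up covariate values
set.seed(1)
coldata <- data.frame(
  disease = factor(rep(c("Ctrl", "Treat"), times = c(7, 8)),
                   levels = c("Ctrl", "Treat")),  # "Ctrl" is the reference level
  new_variance = round(runif(15, min = 0.5, max = 2), 3)
)

# Case 1: intercept + indicator column for Treat samples
mm1 <- model.matrix(~ disease, coldata)
# Case 2: intercept + one numeric column (a slope for new_variance)
mm2 <- model.matrix(~ new_variance, coldata)
# Case 3: both main effects + their product (interaction) column
mm3 <- model.matrix(~ new_variance * disease, coldata)

print(colnames(mm1))
print(colnames(mm2))
print(colnames(mm3))
```

As far as I can tell, each column corresponds to one coefficient in the fitted model: in case 2 there is no Ctrl/Treat column at all, only a slope for new_variance, and in case 3 the extra "new_variance:diseaseTreat" column is the product of the other two.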
Case 1 (disease) is easy for me to understand: it is the typical RNA-seq design, where the fold change can be read as Treat/Ctrl because the "disease" column in coldata clearly separates "Ctrl" from "Treat."
However, in case 2 (new_variance), can the model still distinguish "Ctrl" from "Treat"? I'm also not clear on how to interpret the interaction in case 3. My expectation is that the meaning of the analysis changes somewhat, since a design based on the "new_variance" column is not the typical two-group comparison. So, when a numeric metric like this is used in an RNA-seq analysis, what does the model mean, and how should I interpret the statistical results?
I've looked at "Analyzing RNA-seq data with DESeq2," but I didn't quite understand it. In cases 2 and 3, what is the meaning of the fold change reported in the results? If there is a standard answer that I've missed, a reference link would be great. I would be really grateful for any help in understanding what I'm missing. Thank you.