The meaning of the analysis results when using a new metric in analysis
SH • 0
Last seen 4 days ago
South Korea

Hello, I have a question about model.matrix() that came up during an analysis. I have counts and coldata variables. The counts matrix has 15 samples as columns, 7 of them "Ctrl" and 8 "Treat." Its rows are genes with their expression levels. The coldata variable has the 15 samples as rows and two columns, "disease" and "new_variance." The "disease" column holds 7 "Ctrl" and 8 "Treat" labels, and the "new_variance" column holds 15 random values (e.g., 1.434, 1.5, 0.989, ...).

I'm having trouble understanding the meaning of the following three cases:

# 1. disease
mm1 = model.matrix(~disease, coldata)
ddsMat <- DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
ddsMat2 = DESeq(ddsMat, full = mm1, betaPrior = FALSE)

# 2. new_variance
mm2 = model.matrix(~new_variance, coldata)
ddsMat3 <- DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
ddsMat4 = DESeq(ddsMat3, full = mm2, betaPrior = FALSE)

# 3. interaction
mm3 = model.matrix(~new_variance*disease, coldata)
ddsMat5 <- DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
ddsMat6 = DESeq(ddsMat5, full = mm3, betaPrior = FALSE)
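To make the three cases concrete, the columns of each design matrix can be inspected directly. This is only a minimal sketch: the coldata below is made up (the new_variance values are hypothetical, not my real data), and it uses base R only, so DESeq2 is not needed to run it.

```r
# Hypothetical coldata: 7 Ctrl + 8 Treat samples; new_variance values are invented.
coldata <- data.frame(
  disease = factor(rep(c("Ctrl", "Treat"), times = c(7, 8))),
  new_variance = c(0.4, 0.6, 0.5, 0.3, 0.7, 0.5, 0.5,
                   2.1, 2.9, 2.5, 3.0, 1.9, 2.6, 2.4, 2.6)
)

mm1 <- model.matrix(~ disease, coldata)
mm2 <- model.matrix(~ new_variance, coldata)
mm3 <- model.matrix(~ new_variance * disease, coldata)

# Each column corresponds to one coefficient fitted per gene.
colnames(mm1)  # "(Intercept)" "diseaseTreat"
colnames(mm2)  # "(Intercept)" "new_variance"
colnames(mm3)  # "(Intercept)" "new_variance" "diseaseTreat" "new_variance:diseaseTreat"
```

As far as I understand the DESeq2 vignette, the coefficient for a continuous column such as new_variance is reported as a log2 fold change per unit increase of that variable, while diseaseTreat is the usual Treat vs Ctrl log2 fold change (in mm3, evaluated at new_variance = 0), and the interaction column lets the new_variance slope differ between the two groups.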


In the above examples, I can easily understand that 1. disease is a typical case study in RNA-seq analysis, where we can estimate the fold change as Treat/Ctrl because the "disease" column in coldata clearly distinguishes "Ctrl" and "Treat."

However, for 2. new_variance, can we distinguish "Ctrl" and "Treat"? Furthermore, I'm not clear on how to interpret the interaction in 3. What I'm expecting is that the meaning of RNA-seq analysis might change a bit. I thought that if I use the "new_variance" column, it's not a typical case study. So, when applying a new metric to RNA-seq analysis, what does it mean, and how should I interpret the statistical analysis results?

I've looked at "Analyzing RNA-seq data with DESeq2," but I didn't quite understand it. If you could help me understand what I'm missing, I would be really grateful. In the cases of 2 and 3, what is the meaning of the Fold Change that appears in the statistical analysis results? If there's a correct answer, and if I've missed something, providing a reference link would be great. Thank you.

DESeq2 • 175 views
@james-w-macdonald-5106
Last seen 2 days ago
United States

Why are you modeling random variables? If what you say about new_variance is true, then you are modeling gene expression as a function of random data, which makes no sense.

If you are confused by regression models, this isn't the place to ask. You could try biostars.org, or better yet read about it for yourself.


Thanks for the reply. Honestly, it's not entirely a random variable: there is some distinction between control and treatment, but it's not a perfect separation like 0 and 1. For example, assume that the mean of new_variance over the 7 control samples is 0.5, with a standard deviation of 0.2, while the mean over the 8 treatment samples is 2.5, with a standard deviation of around 0.6. So I'd like to have a rough idea of what calculations are performed internally when I run mm2 or mm3. If there is "almost" a separation between control and treatment in the new_variance data, does this have any significance?
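To illustrate the "almost separated" situation, here is a small sketch that simulates new_variance from the means and standard deviations above. The normal distribution and the seed are assumptions made purely for illustration; my real values are not generated this way.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

disease <- factor(rep(c("Ctrl", "Treat"), times = c(7, 8)))

# Assumed: normally distributed values with the stated group means and SDs.
new_variance <- c(rnorm(7, mean = 0.5, sd = 0.2),
                  rnorm(8, mean = 2.5, sd = 0.6))

# Correlation between the covariate and the 0/1 Treat indicator.
r <- cor(new_variance, as.numeric(disease == "Treat"))
r
```

If this correlation is close to 1, new_variance and disease carry largely the same information, so I would guess that the new_variance coefficient in mm2 mostly reflects the Ctrl/Treat difference, and that the separate terms in mm3 become hard to estimate apart.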


It's not entirely a random variable? There's no gray area here. You said it had 15 random values.

Anyway, as I already mentioned, this site isn't meant to be a place for people to get a primer on linear regression; it's meant to help people with technical questions about the software. I already gave you a reference link to Julian Faraway's linear regression book. You might also Google things like 'ANOVA vs regression' and 'interaction term linear regression' if Faraway's book is TL;DR for you.