DESeq2: Continuous and discrete variables in the design
@raquelgarza95-22451

Hello,

I'm trying to get my head around some DESeq2 results and I would really appreciate some help.

I have two conditions (control and disease) and one continuous variable (age). Each condition consists of 100 individuals of different ages. I want to see which changes occur in the control group as age increases, and likewise in the disease group (I know this would ideally be done with the same individuals at different time points, but I'm working with post-mortem tissue, so this is the best I can do).

I have set my DESeq2 design as ~condition+age+condition:age

From resultsNames(dds) I get:

[1]  "Intercept" "condition_disease_vs_ctrl" "age" "conditiondisease.age"


I suppose that "conditiondisease.age" is the name I am interested in for the disease group. But is "age" the one for the control group (since it is the reference level)? Or is it the age effect regardless of condition? (If so, how can I get "conditionctrl.age"?)

I also have a question about how to interpret the log2FC here. According to the vignette, this is the change per unit of the continuous variable (age). If age is given as integers, is the lowest value (the youngest in my setup) used as a reference point? Or the highest (the oldest)?

One more: do I have to sort age before giving it to DESeq2? I'm guessing not, but it doesn't hurt to ask.

Thank you!!

deseq2 R rna-seq design
@mikelove

I would recommend ~condition + condition:age, which is a bit easier to interpret. You will get an age term for control and an age term for disease, which you can pull out with results(dds, name="..."). You can contrast the two with results(dds, contrast=list("...", "...")).
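
To see which coefficients each design produces, one can inspect the model matrix in base R before running DESeq2. The sketch below uses a made-up six-sample table (the ages are illustrative assumptions; the level names ctrl and disease follow the resultsNames output in the question). DESeq2 builds its coefficients from the same kind of model matrix, although resultsNames(dds) may format the names differently (for example with "." in place of ":").

```r
# Toy sample table (illustrative values, not from the thread)
coldata <- data.frame(
  condition = factor(rep(c("ctrl", "disease"), each = 3),
                     levels = c("ctrl", "disease")),
  age = c(35, 60, 85, 40, 65, 90)
)

# Design from the question: an age slope for the reference level (ctrl)
# plus a term for how the slope differs in disease
mm_interaction <- model.matrix(~ condition + age + condition:age, coldata)
print(colnames(mm_interaction))

# Suggested design: a separate age slope for each condition
mm_nested <- model.matrix(~ condition + condition:age, coldata)
print(colnames(mm_nested))
```

In the first design the "age" column is the slope in the reference level (ctrl) and the interaction column is the difference between the disease and ctrl slopes. In the second design each condition gets its own slope column, which is why it is easier to read off.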

The LFC is the log2 fold change in expression per one-unit increase in the variable. There is no specific reference point: whatever value you treat as 0 (the youngest, the oldest, or the sample average) is absorbed into the intercept.
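
A small base R sketch can make this concrete. It uses an ordinary linear model on made-up log2 expression values as a stand-in for DESeq2's negative binomial GLM (an assumption made for simplicity; the same argument applies to the linear predictor of the GLM). Shifting the age axis changes the intercept but leaves the per-year slope unchanged.

```r
# Toy data: log2 expression rising roughly 0.02 per year (made-up numbers)
age <- c(30, 45, 60, 75, 90)
log2_expr <- 5 + 0.02 * age + c(0.10, -0.10, 0.05, -0.05, 0.00)

# The same ages with two other choices of where "0" sits
age_from_min <- age - min(age)   # youngest sample is 0
age_centered <- age - mean(age)  # sample average is 0

fit_raw      <- lm(log2_expr ~ age)
fit_from_min <- lm(log2_expr ~ age_from_min)
fit_centered <- lm(log2_expr ~ age_centered)

# Collect the slope (per-year change) and intercept from each fit
slopes <- c(raw      = coef(fit_raw)[["age"]],
            from_min = coef(fit_from_min)[["age_from_min"]],
            centered = coef(fit_centered)[["age_centered"]])
intercepts <- c(raw      = coef(fit_raw)[["(Intercept)"]],
                from_min = coef(fit_from_min)[["(Intercept)"]],
                centered = coef(fit_centered)[["(Intercept)"]])

print(slopes)      # identical across the three fits
print(intercepts)  # differ, because the shift is absorbed here
```

Because the slope is per unit, the change expected over k years is k times the per-year LFC on the log2 scale.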


Thank you, Michael! I got exactly what I needed from the first part. However, I'm struggling to understand the LFC explanation. What did you mean by the intercept? I didn't map the ages to values starting from 0 (maybe I should?). I entered the ages as they were (30-90). Sorry, it's hard for me to grasp a fold change with no reference.


I'd suggest discussing this with a statistician to get a longer answer about a reference point for the continuous variable. The practical answer is that there is no reference point, but it would be good to talk it through with someone to understand why that is the case and how continuous variables work in linear models.


Hi again! Thank you, I talked about it with someone and I think I now understand what you meant: no matter how the continuous variable is set up (whatever value is set to be 0), the LFC in a linear model is the change per unit step of the continuous variable. This is crystal clear now :-).

But my question was more about how the 0 of the continuous variable is set, or how DESeq2 decides how to order this variable (that's what I was trying to say with "reference point", but it was the wrong term, I'm sorry). Is the 0 set to the minimum value or to something else? Could it depend on how I sort it (for example, ordering colData by decreasing age)?

I guess it is the minimum value, but I don't want to risk a wrong interpretation, since this matters a lot with factors.

Sorry for the confusion, and thank you for all the help!


There is no reference point for the continuous variable.
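
On the sorting question raised above, a quick base R check suggests that row order plays no role either. This is a sketch on simulated values using lm as a stand-in for DESeq2's GLM (an assumption for illustration): the fitted slope is the same whether the samples are in their original order, shuffled, or sorted by decreasing age.

```r
set.seed(1)

# Simulated samples (made-up ages and log2 expression values)
samples <- data.frame(age = c(30, 45, 60, 75, 90, 38, 52, 67))
samples$log2_expr <- 4 + 0.03 * samples$age + rnorm(nrow(samples), sd = 0.1)

# The same samples in two other row orders
shuffled    <- samples[sample(nrow(samples)), ]
by_age_desc <- samples[order(samples$age, decreasing = TRUE), ]

fit_original <- lm(log2_expr ~ age, data = samples)
fit_shuffled <- lm(log2_expr ~ age, data = shuffled)
fit_desc     <- lm(log2_expr ~ age, data = by_age_desc)

# All three comparisons should report TRUE
print(all.equal(coef(fit_original), coef(fit_shuffled)))
print(all.equal(coef(fit_original), coef(fit_desc)))
```

What does matter in DESeq2 is that the columns of the count matrix stay in the same order as the rows of colData, so any reordering has to be applied to both together.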


Hi Michael, a somewhat related question: why is it that with the full design, e.g. ~ Diagnosis + Sex + Diagnosis:Sex, I only get a term for M vs. F and a sex term for disease, but if I remove the Sex term, as in ~ Diagnosis + Diagnosis:Sex, I get a sex term for both control and disease? If you can point me to where I can find the answer myself, that would be nice. Brian


OK, never mind. When I pull results(dds, name = "DiagnosisPathologic.SexM") for ~ Diagnosis + Sex + Diagnosis:Sex, it's the same as results(dds, contrast = list("DiagnosisPathologic.SexM", "DiagnosisControl.SexM")) for ~ Diagnosis + Diagnosis:Sex.

I also tried the reverse order, i.e. ~ Sex + Diagnosis + Sex:Diagnosis, and get the same results as above with results(dds, name = "SexM.DiagnosisPathologic"). Does the order not matter? I also get the same results from each design with results(dds, name="Diagnosis_Pathologic_vs_Control").

Also, is it possible to control for two variables at the same time?
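
The equivalence described above can be reproduced in base R. The sketch below fits the three designs with lm on simulated data (a stand-in for DESeq2, with made-up effect sizes) and compares the interaction coefficient of the full design with the difference of the two sex terms in the nested design.

```r
set.seed(42)

# Balanced 2 x 2 layout with four replicates per cell (simulated)
d <- expand.grid(Diagnosis = c("Control", "Pathologic"),
                 Sex = c("F", "M"),
                 replicate = 1:4)
d$Diagnosis <- factor(d$Diagnosis, levels = c("Control", "Pathologic"))
d$Sex <- factor(d$Sex, levels = c("F", "M"))
d$y <- rnorm(nrow(d), sd = 0.2) +
  0.5 * (d$Sex == "M") +
  1.0 * (d$Diagnosis == "Pathologic" & d$Sex == "M")

fit_full     <- lm(y ~ Diagnosis + Sex + Diagnosis:Sex, data = d)
fit_nested   <- lm(y ~ Diagnosis + Diagnosis:Sex, data = d)
fit_reversed <- lm(y ~ Sex + Diagnosis + Sex:Diagnosis, data = d)

# Interaction term in the full design
int_full <- coef(fit_full)[["DiagnosisPathologic:SexM"]]

# Sex effect in Pathologic minus sex effect in Control, from the nested design
int_nested <- coef(fit_nested)[["DiagnosisPathologic:SexM"]] -
  coef(fit_nested)[["DiagnosisControl:SexM"]]

# Same interaction with the variables listed in the other order
int_reversed <- coef(fit_reversed)[["SexM:DiagnosisPathologic"]]

print(c(full = int_full, nested = int_nested, reversed = int_reversed))
```

All three numbers agree: the designs are reparameterizations of the same model, so the choice mainly affects which comparisons are available directly by name.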


See other thread re: order of variables in design.

The statistical design is really up to you; any design matrix can be used as long as its columns are linearly independent.
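
Linear independence can be checked in base R before fitting. In this sketch the sample table and the helper is_full_rank are hypothetical, introduced only for illustration: an additive design with two adjustment variables (batch and sex) passes, while a design in which site is completely confounded with condition does not.

```r
# Hypothetical sample table (illustrative, not from the thread)
coldata <- data.frame(
  condition = factor(rep(c("ctrl", "disease"), each = 4)),
  sex       = factor(rep(c("F", "M"), times = 4)),
  batch     = factor(rep(c("b1", "b2", "b1", "b2"), each = 2)),
  site      = factor(rep(c("s1", "s2"), each = 4))  # identical split to condition
)

# Hypothetical helper: TRUE when the model matrix has independent columns
is_full_rank <- function(design, data) {
  mm <- model.matrix(design, data)
  qr(mm)$rank == ncol(mm)
}

# Controlling for two variables at once: additive terms before the condition
print(is_full_rank(~ batch + sex + condition, coldata))

# site and condition carry the same information, so the columns are dependent
print(is_full_rank(~ site + condition, coldata))
```

DESeq2 reports an error when the model matrix is not full rank, so a check like this only helps to find the offending variables earlier.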