Question: DESeq2 for multiple groups
0
gravatar for ijvechetti
4 months ago by
ijvechetti0
ijvechetti0 wrote:

Hello,

I'm having difficulties to set up the best design for my experiment. I have 9 different groups (different metals) and a time-course (1,3,6 and 9 months). One of those metals will be my control, and I'm trying to see what miRNAs are changing among metals in each time-point and also if there is a difference among the time point. All samples are independent.

I have so far:

dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData,design= ~ Condition + Time)
dds<-DESeq(dds, test = "LRT", full = ~ Condition + Time, reduced = ~Time)

head(colData)
Names   Condition   Time
Ta-3-1  Ta3 Day3
Ta-3-2  Ta3 Day3
Ta-3-3  Ta3 Day3
W-3-1   W3  Day3
W-3-2   W3  Day3
W-3-3   W3  Day3
Ni-3-1  Ni3 Day3
Ni-3-2  Ni3 Day3
Ni-3-3  Ni3 Day3
Co-3-1  Co3 Day3
Co-3-2  Co3 Day3
Co-3-3  Co3 Day3
Fe-3-1  Fe3 Day3
Fe-3-2  Fe3 Day3
Fe-3-3  Fe3 Day3
Cu-3-1  Cu3 Day3
.
.
.
Ta-1-1  Ta1 Day1
Ta-1-2  Ta1 Day1
Ta-1-3  Ta1 Day1
Ta-1-4  Ta1 Day1
Ta-1-5  Ta1 Day1

dds<-DESeq(dds, test = "LRT", full = ~ Condition + Time, reduced = ~Time)
resultsNames(dds)
 [1] "Intercept"          "Condition_Al_vs_Ta" "Condition_Co_vs_Ta" "Condition_Cu_vs_Ta" "Condition_DU_vs_Ta"
 [6] "Condition_Fe_vs_Ta" "Condition_Ni_vs_Ta" "Condition_Pb_vs_Ta" "Condition_W_vs_Ta"  "Time_Day12_vs_Day1"
[11] "Time_Day3_vs_Day1"  "Time_Day6_vs_Day1" 

I understand that the LRT is doing the test in all the conditions. However, I want to be able to compare the metals (e.g. Al vs Ta). But I want to do that for 1, 3, 6 and 12 months. Since is not on my Names() I don't know how to proceed.

Thanks in advance

deseq2 • 384 views
ADD COMMENTlink modified 4 months ago by Steve Lianoglou12k • written 4 months ago by ijvechetti0
Answer: DESeq2 for multiple groups
1
gravatar for Michael Love
4 months ago by
Michael Love26k
United States
Michael Love26k wrote:

Take a look at the workflow, which has a similar time series example that will probably help you answer your question.

https://bioconductor.org/packages/release/workflows/html/rnaseqGene.html

ADD COMMENTlink written 4 months ago by Michael Love26k

Michael, thanks for your prompt answer. After taking a look at the workflows, I saw that my data goes in "Model matrix not full rank" (missing some groups). That way I though in fitting all the groups, including different time, together and do a contrast for all the comparisons I want to make. Is that the appropriate way to do it or should I fit only one specific time point? Sorry if this basic, I just want to make sure I understood the workflow.

ADD REPLYlink written 4 months ago by ijvechetti0

Can you say more about which groups of samples are missing? This will affect what kinds of analyses you can perform.

ADD REPLYlink written 4 months ago by Michael Love26k

So I have 9 metals (one is the control) and 4-time points (1,3, 6 and 12 months). Some samples I have 3 replicates and some I have 8. In the 12 months, I have on group missing, because the animals died. Is this helpful?

ADD REPLYlink written 4 months ago by ijvechetti0
1

So then you need to form the model matrices using model.matrix and remove the columns that have 0’s. See the vignette section on the full rank error. Then provide the matrices to full and reduced arguments of DESeq().

ADD REPLYlink written 4 months ago by Michael Love26k

Michael, sorry to bother you again, but I'm stuck now in how to add the model.matrix in dds and run the LRT. Following the vignette I did:

countData= read.table("file.txt", header=T,row.names=1) colData= read.table ("file.txt", header=T) m1<-model.matrix(~ Condition + Time + Condition:Time, colData) all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] unname(m1)

Now I don't know to get this new model matrix in my DESeq to run LRT. Could you tell me how to do that?

Additionally, to run the LRT, how I would relevel my control since I have different time points? All my controls from different time points would have to have the same name to run?

Sorry to bother you, but I really appreciate your time, the software and the vignette that is super helpful.

ADD REPLYlink written 4 months ago by ijvechetti0
1

You provide the matrices to full and reduced as stated in the vignette in the section giving that code you are using.

It looks like:

dds <- DESeq(dds, full=mm, reduced=mm0, test="LRT")

You can put whatever design matrices make sense in mm or mm0 above. You may want to discuss with a statistician about what these LRT are representing.

ADD REPLYlink written 4 months ago by Michael Love26k

Thank you so much. All your answer have helped me to understand better all the commands. I still have some doubts and I was wondering if you could point out what could that be:

countData= read.table("miRNAcountsno-pre.ivan.txt", header=T,row.names=1) colData= read.table ("FactorsNo-pre.txt", header=T) library(DESeq2) dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData,design= ~ Condition)

model.matrix

m1<-model.matrix(~ Condition + Time + Condition:Time, colData) colnames(m1) [1] "(Intercept)" "ConditionAl12" "ConditionAl3"
[4] "ConditionAl6" "ConditionCo1" "ConditionCo12"
[7] "ConditionCo3" "ConditionCo6" "ConditionCu1"
[10] "ConditionCu12" "ConditionCu3" "ConditionCu6"
[13] "ConditionDU1" "ConditionDU12" "ConditionDU3"
[16] "ConditionDU6" "ConditionFe1" "ConditionFe12"
[19] "ConditionFe3" "ConditionFe6" "ConditionNi1"
[22] "ConditionNi3" "ConditionNi6" "ConditionPb1"
[25] "ConditionPb12" "ConditionPb3" "ConditionPb6"
[28] "ConditionTa1" "ConditionTa12" "ConditionTa3"
[31] "ConditionTa6" "ConditionW1" "ConditionW12"
[34] "ConditionW3" "ConditionW6" "TimeDay12"
[37] "TimeDay3" "TimeDay6" "ConditionAl12:TimeDay12" [40] "ConditionCo12:TimeDay12" "ConditionCu12:TimeDay12" "ConditionDU12:TimeDay12" [43] "ConditionFe12:TimeDay12" "ConditionPb12:TimeDay12" "ConditionTa12:TimeDay12" [46] "ConditionW12:TimeDay12" "ConditionAl3:TimeDay3" "ConditionCo3:TimeDay3"
[49] "ConditionCu3:TimeDay3" "ConditionDU3:TimeDay3" "ConditionFe3:TimeDay3"
[52] "ConditionNi3:TimeDay3" "ConditionPb3:TimeDay3" "ConditionTa3:TimeDay3"
[55] "ConditionW3:TimeDay3" "ConditionAl6:TimeDay6" "ConditionCo6:TimeDay6"
[58] "ConditionCu6:TimeDay6" "ConditionDU6:TimeDay6" "ConditionFe6:TimeDay6"
[61] "ConditionNi6:TimeDay6" "ConditionPb6:TimeDay6" "ConditionTa6:TimeDay6"
[64] "ConditionW6:TimeDay6"

1- Why I'm not seeing Day1?

all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] unname(m1)

m0<-model.matrix(~ Condition:Time, colData) colnames(m0) all.zero <- apply(m0, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m0 <- m0[,-idx] unname(m0)

dds <- DESeq(dds, full=m1, reduced=m0, test="LRT")

2- If I do like that I still have the "not full rank". What I'm missing?

3- After the LRT test, I want to do the contrast for each different metal and also with different time. Should I do all the contrast one by one? Or should I design all the metals for a specific time and do the contrasts that way?

Thanks once again

ADD REPLYlink modified 4 months ago • written 4 months ago by ijvechetti0

I'd recommend at this point that you discuss your particular experiment and your analysis plan with a local statistician. They can help explain, e.g. why Day1 is not in the coefficient list, and how to generate a full and reduced model matrix for LRT.

The support site is mostly for software usage questions, and I just don't have any time in my schedule to go beyond software questions.

ADD REPLYlink written 4 months ago by Michael Love26k

Appreciate your time and understand that. I just edited my comment with a third question. Can you tell me about that question?

ADD REPLYlink written 4 months ago by ijvechetti0
1

In the workflow we show how to look at individual time points with a Wald test in addition to a LRT of any differences across time. You can try that example.

ADD REPLYlink written 4 months ago by Michael Love26k

Thanks Michael. I think following the workflow, as you suggested, helped me to make my design and do my comparisons. But I still have some doubts in what I'm comparing. Could you give me a hand?

I did:

Design= ~Condition + Time + Condition:Time (full model) vs ~Condition + Time (reduced model). This would give me the changes in the genes that are changing over time, right?

My control is a group named Ta and the time would be 1 month.

resultsNames(dds) [1] "Intercept" "ConditionAl" "ConditionCo" "ConditionCu"
[5] "ConditionDU" "ConditionFe" "ConditionNi" "ConditionPb"
[9] "ConditionW" "Time12month" "Time3month" "Time6month"
[13] "ConditionAl.Time12month" "ConditionCo.Time12month" "ConditionCu.Time12month" "ConditionDU.Time12month" [17] "ConditionFe.Time12month" "ConditionPb.Time12month" "ConditionW.Time12month" "ConditionAl.Time3month" [21] "ConditionCo.Time3month" "ConditionCu.Time3month" "ConditionDU.Time3month" "ConditionFe.Time3month" [25] "ConditionNi.Time3month" "ConditionPb.Time3month" "ConditionW.Time3month" "ConditionAl.Time6month" [29] "ConditionCo.Time6month" "ConditionCu.Time6month" "ConditionDU.Time6month" "ConditionFe.Time6month" [33] "ConditionNi.Time6month" "ConditionPb.Time6month" "ConditionW.Time6month"

For example, when I do ContiditonAl, test=wald, what I’m comparing? The group AL vs my control in month 1? Additionally, if I do ConditionAl.Time12month, it would be Al vs Ta at 12 months?

Thanks in advance

ADD REPLYlink written 3 months ago by ijvechetti0

Questions about what the coefficients mean in a statistical model are unfortunately beyond what I have free time to provide on the support site. We have a bit of material in the vignette, but beyond that you should really discuss with a statistical or bioinformatic collaborator.

ADD REPLYlink written 3 months ago by Michael Love26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 151 users visited in the last hour