DESeq2 for multiple groups
1
1
Entering edit mode
ijvechetti ▴ 10
@ijvechetti-20701
Last seen 16 months ago
United States

Hello,

I'm having difficulties to set up the best design for my experiment. I have 9 different groups (different metals) and a time-course (1,3,6 and 9 months). One of those metals will be my control, and I'm trying to see what miRNAs are changing among metals in each time-point and also if there is a difference among the time point. All samples are independent.

I have so far:

dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData,design= ~ Condition + Time)
dds<-DESeq(dds, test = "LRT", full = ~ Condition + Time, reduced = ~Time)

head(colData)
Names   Condition   Time
Ta-3-1  Ta3 Day3
Ta-3-2  Ta3 Day3
Ta-3-3  Ta3 Day3
W-3-1   W3  Day3
W-3-2   W3  Day3
W-3-3   W3  Day3
Ni-3-1  Ni3 Day3
Ni-3-2  Ni3 Day3
Ni-3-3  Ni3 Day3
Co-3-1  Co3 Day3
Co-3-2  Co3 Day3
Co-3-3  Co3 Day3
Fe-3-1  Fe3 Day3
Fe-3-2  Fe3 Day3
Fe-3-3  Fe3 Day3
Cu-3-1  Cu3 Day3
.
.
.
Ta-1-1  Ta1 Day1
Ta-1-2  Ta1 Day1
Ta-1-3  Ta1 Day1
Ta-1-4  Ta1 Day1
Ta-1-5  Ta1 Day1

dds<-DESeq(dds, test = "LRT", full = ~ Condition + Time, reduced = ~Time)
resultsNames(dds)
 [1] "Intercept"          "Condition_Al_vs_Ta" "Condition_Co_vs_Ta" "Condition_Cu_vs_Ta" "Condition_DU_vs_Ta"
 [6] "Condition_Fe_vs_Ta" "Condition_Ni_vs_Ta" "Condition_Pb_vs_Ta" "Condition_W_vs_Ta"  "Time_Day12_vs_Day1"
[11] "Time_Day3_vs_Day1"  "Time_Day6_vs_Day1" 

I understand that the LRT is doing the test in all the conditions. However, I want to be able to compare the metals (e.g. Al vs Ta). But I want to do that for 1, 3, 6 and 12 months. Since is not on my Names() I don't know how to proceed.

Thanks in advance

deseq2 • 7.5k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Take a look at the workflow, which has a similar time series example that will probably help you answer your question.

https://bioconductor.org/packages/release/workflows/html/rnaseqGene.html

ADD COMMENT
0
Entering edit mode

Michael, thanks for your prompt answer. After taking a look at the workflows, I saw that my data goes in "Model matrix not full rank" (missing some groups). That way I though in fitting all the groups, including different time, together and do a contrast for all the comparisons I want to make. Is that the appropriate way to do it or should I fit only one specific time point? Sorry if this basic, I just want to make sure I understood the workflow.

ADD REPLY
0
Entering edit mode

Can you say more about which groups of samples are missing? This will affect what kinds of analyses you can perform.

ADD REPLY
0
Entering edit mode

So I have 9 metals (one is the control) and 4-time points (1,3, 6 and 12 months). Some samples I have 3 replicates and some I have 8. In the 12 months, I have on group missing, because the animals died. Is this helpful?

ADD REPLY
1
Entering edit mode

So then you need to form the model matrices using model.matrix and remove the columns that have 0’s. See the vignette section on the full rank error. Then provide the matrices to full and reduced arguments of DESeq().

ADD REPLY
0
Entering edit mode

Michael, sorry to bother you again, but I'm stuck now in how to add the model.matrix in dds and run the LRT. Following the vignette I did:

countData= read.table("file.txt", header=T,row.names=1) colData= read.table ("file.txt", header=T) m1<-model.matrix(~ Condition + Time + Condition:Time, colData) all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] unname(m1)

Now I don't know to get this new model matrix in my DESeq to run LRT. Could you tell me how to do that?

Additionally, to run the LRT, how I would relevel my control since I have different time points? All my controls from different time points would have to have the same name to run?

Sorry to bother you, but I really appreciate your time, the software and the vignette that is super helpful.

ADD REPLY
1
Entering edit mode

You provide the matrices to full and reduced as stated in the vignette in the section giving that code you are using.

It looks like:

dds <- DESeq(dds, full=mm, reduced=mm0, test="LRT")

You can put whatever design matrices make sense in mm or mm0 above. You may want to discuss with a statistician about what these LRT are representing.

ADD REPLY
0
Entering edit mode

Thank you so much. All your answer have helped me to understand better all the commands. I still have some doubts and I was wondering if you could point out what could that be:

countData= read.table("miRNAcountsno-pre.ivan.txt", header=T,row.names=1) colData= read.table ("FactorsNo-pre.txt", header=T) library(DESeq2) dds<-DESeqDataSetFromMatrix(countData= countData,colData= colData,design= ~ Condition)

model.matrix

m1<-model.matrix(~ Condition + Time + Condition:Time, colData) colnames(m1) [1] "(Intercept)" "ConditionAl12" "ConditionAl3"
[4] "ConditionAl6" "ConditionCo1" "ConditionCo12"
[7] "ConditionCo3" "ConditionCo6" "ConditionCu1"
[10] "ConditionCu12" "ConditionCu3" "ConditionCu6"
[13] "ConditionDU1" "ConditionDU12" "ConditionDU3"
[16] "ConditionDU6" "ConditionFe1" "ConditionFe12"
[19] "ConditionFe3" "ConditionFe6" "ConditionNi1"
[22] "ConditionNi3" "ConditionNi6" "ConditionPb1"
[25] "ConditionPb12" "ConditionPb3" "ConditionPb6"
[28] "ConditionTa1" "ConditionTa12" "ConditionTa3"
[31] "ConditionTa6" "ConditionW1" "ConditionW12"
[34] "ConditionW3" "ConditionW6" "TimeDay12"
[37] "TimeDay3" "TimeDay6" "ConditionAl12:TimeDay12" [40] "ConditionCo12:TimeDay12" "ConditionCu12:TimeDay12" "ConditionDU12:TimeDay12" [43] "ConditionFe12:TimeDay12" "ConditionPb12:TimeDay12" "ConditionTa12:TimeDay12" [46] "ConditionW12:TimeDay12" "ConditionAl3:TimeDay3" "ConditionCo3:TimeDay3"
[49] "ConditionCu3:TimeDay3" "ConditionDU3:TimeDay3" "ConditionFe3:TimeDay3"
[52] "ConditionNi3:TimeDay3" "ConditionPb3:TimeDay3" "ConditionTa3:TimeDay3"
[55] "ConditionW3:TimeDay3" "ConditionAl6:TimeDay6" "ConditionCo6:TimeDay6"
[58] "ConditionCu6:TimeDay6" "ConditionDU6:TimeDay6" "ConditionFe6:TimeDay6"
[61] "ConditionNi6:TimeDay6" "ConditionPb6:TimeDay6" "ConditionTa6:TimeDay6"
[64] "ConditionW6:TimeDay6"

1- Why I'm not seeing Day1?

all.zero <- apply(m1, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m1 <- m1[,-idx] unname(m1)

m0<-model.matrix(~ Condition:Time, colData) colnames(m0) all.zero <- apply(m0, 2, function(x) all(x==0)) all.zero idx <- which(all.zero) m0 <- m0[,-idx] unname(m0)

dds <- DESeq(dds, full=m1, reduced=m0, test="LRT")

2- If I do like that I still have the "not full rank". What I'm missing?

3- After the LRT test, I want to do the contrast for each different metal and also with different time. Should I do all the contrast one by one? Or should I design all the metals for a specific time and do the contrasts that way?

Thanks once again

ADD REPLY
0
Entering edit mode

I'd recommend at this point that you discuss your particular experiment and your analysis plan with a local statistician. They can help explain, e.g. why Day1 is not in the coefficient list, and how to generate a full and reduced model matrix for LRT.

The support site is mostly for software usage questions, and I just don't have any time in my schedule to go beyond software questions.

ADD REPLY
0
Entering edit mode

Appreciate your time and understand that. I just edited my comment with a third question. Can you tell me about that question?

ADD REPLY
1
Entering edit mode

In the workflow we show how to look at individual time points with a Wald test in addition to a LRT of any differences across time. You can try that example.

ADD REPLY
0
Entering edit mode

Thanks Michael. I think following the workflow, as you suggested, helped me to make my design and do my comparisons. But I still have some doubts in what I'm comparing. Could you give me a hand?

I did:

Design= ~Condition + Time + Condition:Time (full model) vs ~Condition + Time (reduced model). This would give me the changes in the genes that are changing over time, right?

My control is a group named Ta and the time would be 1 month.

resultsNames(dds) [1] "Intercept" "ConditionAl" "ConditionCo" "ConditionCu"
[5] "ConditionDU" "ConditionFe" "ConditionNi" "ConditionPb"
[9] "ConditionW" "Time12month" "Time3month" "Time6month"
[13] "ConditionAl.Time12month" "ConditionCo.Time12month" "ConditionCu.Time12month" "ConditionDU.Time12month" [17] "ConditionFe.Time12month" "ConditionPb.Time12month" "ConditionW.Time12month" "ConditionAl.Time3month" [21] "ConditionCo.Time3month" "ConditionCu.Time3month" "ConditionDU.Time3month" "ConditionFe.Time3month" [25] "ConditionNi.Time3month" "ConditionPb.Time3month" "ConditionW.Time3month" "ConditionAl.Time6month" [29] "ConditionCo.Time6month" "ConditionCu.Time6month" "ConditionDU.Time6month" "ConditionFe.Time6month" [33] "ConditionNi.Time6month" "ConditionPb.Time6month" "ConditionW.Time6month"

For example, when I do ContiditonAl, test=wald, what I’m comparing? The group AL vs my control in month 1? Additionally, if I do ConditionAl.Time12month, it would be Al vs Ta at 12 months?

Thanks in advance

ADD REPLY
0
Entering edit mode

Questions about what the coefficients mean in a statistical model are unfortunately beyond what I have free time to provide on the support site. We have a bit of material in the vignette, but beyond that you should really discuss with a statistical or bioinformatic collaborator.

ADD REPLY

Login before adding your answer.

Traffic: 816 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6