Question

DESeq2 - Different results between groups depending on which is first

Elin Videvall ▴ 20

@elin-videvall-5958

Last seen 9.6 years ago

Dear Bioconductor list.

I have 3 time points (conditions) and 4 replicates. This is what my colDataTable looks like:

        condition            ind
    B10 1_Uninfected         Ind_1
    B11 1_Uninfected         Ind_1
    B12 1_Uninfected         Ind_1
    B20 1_Uninfected         Ind_2
    B30 1_Uninfected         Ind_3
    B40 1_Uninfected         Ind_4
    B21 2_Peak_infection     Ind_2
    B31 2_Peak_infection     Ind_3
    B41 2_Peak_infection     Ind_4
    B22 3_After_infection    Ind_2
    B32 3_After_infection    Ind_3
    B42 3_After_infection    Ind_4

(I had to add numbers in front of the treatments because otherwise deseq would only test the first two groups against After_infection, but I was also interested in the results from Peak vs Uninfected.)

    dds <- DESeqDataSetFromMatrix(countData = counts, colData = colDataTable, design = ~ ind + condition)
    dds <- DESeq(dds)
    resultsNames(dds)
    res <- results(dds, "condition_2_Peak_infection_vs_1_Uninfected")

This all works perfectly fine. The problem is when I changed the number in the condition-names, so I could change which group to test against, the result output from the test with the exact same groups is different. I.e. if I change the name of the first two conditions to "2_Uninfected" and "1_Peak_infection", and call up the results with:

    res <- results(dds, "condition_2_Uninfected_vs_1_Peak_infection")

then my output is different from the first test between the exact same groups. The baseMean remains exactly the same, but the other columns: log2foldchange, p-value, and FDR is not. Why? I would expect only the log2foldchange to switch symbols, and nothing else.

Can anyone explain if I'm doing something wrong or why deseq output different results? I'm using version DESeq2_1.0.9.

Thank you,
Elin.

DESeq DESeq • 1.4k views

ADD COMMENT • link updated 10.9 years ago by Michael Love 41k • written 10.9 years ago by Elin Videvall ▴ 20

score 0 · Answer 1 · 2013-05-27

Michael Love 41k

@mikelove

Last seen 4 hours ago

United States

hi Elin,

On Mon, May 27, 2013 at 8:50 PM, Elin Videvall <elin.videvall@biol.lu.se> wrote:

> (I had to add numbers in front of the treatments because otherwise deseq
> would only test the first two groups against After_infection, but I was
> also interested in the results from Peak vs Uninfected.)

As in the vignette, you can instead use the levels argument of the factor() function to determine the order of the factor levels:

    condition <- factor(condition, levels=c("Uninfected","Peak_infection","After_infection"))

And later to change the base level, you can either refactor with a new order of levels, or you can use the relevel function to change the reference level:

    condition <- factor(condition, levels=c("Peak_infection","Uninfected","After_infection"))

- or -

    condition <- relevel(condition, "Peak_infection")

> then my output is different from the first test between the exact same
> groups. The baseMean remains exactly the same, but the other columns:
> log2foldchange, p-value, and FDR is not. Why? I would expect only the
> log2foldchange to switch symbols, and nothing else.

The output is different because we have added priors on the non-intercept coefficients resulting in shrinkage toward log2 fold changes of 0 for genes with low counts and/or high dispersion (as in Figure 1 in the vignette). So the reported value is not simply log2 of the ratio of means. If you want to avoid this shrinkage of coefficients (though we have seen it to make the log2 fold changes more reproducible), you can set the 'betaPrior' argument to DESeq() to FALSE.

Mike
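The two points in this answer can be illustrated with a small base R sketch. It is only a toy under stated assumptions: the values in `y` are made up, and `fit_penalized` is a hypothetical helper that applies a simple ridge penalty to the non-intercept coefficients of a linear model. It is not DESeq2's actual prior or fitting procedure; it only shows why a penalty on non-intercept coefficients makes the estimate depend on which level is the reference, while an unpenalized fit just flips the sign.

```r
# Toy data (made up): one "gene" on a log2 scale, three conditions, four samples each.
condition <- factor(
  rep(c("Uninfected", "Peak_infection", "After_infection"), each = 4),
  levels = c("Uninfected", "Peak_infection", "After_infection")
)
y <- c(0, 0.2, -0.2, 0,   1, 1.1, 0.9, 1,   3, 3.2, 2.8, 3)

# Hypothetical helper: least squares with a ridge penalty on every
# coefficient except the intercept. lambda = 0 means no shrinkage.
fit_penalized <- function(cond, y, lambda) {
  X <- model.matrix(~ cond)
  penalty <- diag(c(0, rep(lambda, ncol(X) - 1)))
  beta <- solve(crossprod(X) + penalty, crossprod(X, y))
  setNames(as.vector(beta), colnames(X))
}

# Same samples, two choices of reference level.
cond_ref_uninf <- condition
cond_ref_peak  <- relevel(condition, "Peak_infection")
print(levels(cond_ref_uninf))
print(levels(cond_ref_peak))

# Without a penalty, swapping the reference only flips the sign.
unshrunk_peak_vs_uninf <- fit_penalized(cond_ref_uninf, y, 0)["condPeak_infection"]
unshrunk_uninf_vs_peak <- fit_penalized(cond_ref_peak,  y, 0)["condUninfected"]
print(unname(c(unshrunk_peak_vs_uninf, unshrunk_uninf_vs_peak)))

# With a penalty, the same comparison is shrunk by a different amount
# depending on the reference, so the two estimates are no longer mirror images.
shrunk_peak_vs_uninf <- fit_penalized(cond_ref_uninf, y, 2)["condPeak_infection"]
shrunk_uninf_vs_peak <- fit_penalized(cond_ref_peak,  y, 2)["condUninfected"]
print(unname(c(shrunk_peak_vs_uninf, shrunk_uninf_vs_peak)))
```

The asymmetry arises because the penalty acts on differences from the reference level: with Uninfected as reference it pulls the other two groups toward Uninfected, and with Peak_infection as reference it pulls them toward Peak_infection. In DESeq2 itself, the switch described above is the 'betaPrior' argument of DESeq().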

ADD COMMENT • link 10.9 years ago Michael Love 41k

Thank you so much, Mike. Very helpful!

I have one last question. If you have one uninfected individual sampled during three time points, would you make use of all three transcriptomes in the uninfected group, or would you discard two of them, considering they are from the same individual? What do you think is the best practice?

Sincerely,
Elin

ADD REPLY • link 10.9 years ago Elin Videvall ▴ 20

hi Elin,

On Mon, May 27, 2013 at 10:31 PM, Elin Videvall <elin.videvall@biol.lu.se> wrote:

> If you have one uninfected individual sampled during three time points,
> would you make use of all three transcriptomes in the uninfected group, or
> would you discard two of them, considering they are from the same
> individual? What do you think is the best practice?

This depends if these 3 samples are close to each other (helping to lower dispersion estimates) or far apart (which would drive up dispersion estimates). It might be useful to look at heatmaps and PCA plots of the data as in sections 8.2 and 8.3 of the vignette.

Mike
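As a rough illustration of the kind of check suggested here, the following base R sketch draws a PCA plot and a sample-to-sample distance heatmap. It is a stand-in, not the vignette's code: the counts are simulated with `rnbinom`, the sample names are borrowed from the question purely as labels, and a plain `log2(count + 1)` transform is assumed in place of the transformations the vignette uses.

```r
set.seed(1)

# Simulated counts (not real data): 200 genes by 6 uninfected samples.
n_genes <- 200
samples <- c("B10", "B11", "B12", "B20", "B30", "B40")
counts <- matrix(
  rnbinom(n_genes * length(samples), mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), samples)
)

# Simple log transform so highly expressed genes do not dominate.
log_counts <- log2(counts + 1)

# PCA with samples as rows; replicates that are close to each other
# should land near each other on the first components.
pca <- prcomp(t(log_counts))
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
plot(pca$x[, 1], pca$x[, 2],
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)

# Euclidean distances between samples, shown as a clustered heatmap.
sample_dist <- dist(t(log_counts))
heatmap(as.matrix(sample_dist), symm = TRUE)
```

With real data, the question is whether the three samples from the same individual (here B10, B11, B12) cluster with the other uninfected samples or sit apart from them.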

ADD REPLY • link 10.9 years ago Michael Love 41k