Question

Follow-up: Multiple comparisons using contrast



Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi Mike,

I am writing to follow up the multiple comparison using contrasts. To remind you again, here's my experimental design:

Genotypes: 4 different genotypes
Timepoint: 3 different timepoints (6h, 12h, and 24h)
Temperature: Low and high temperatures
3 biological replicates for each condition.

In the previous post, you suggested the following design:

    ~ genotype + time + temp + genotype:temp + time:temp

And then call

    resultsNames(dds)

in order to see all the interactions which are available for generating tests. For example:

    results(dds, name="genotypeA:tempHi")

...will provide you with the results of a test of whether the high temperature vs the low temperature has a specific effect for genotype A, over all time points.

My questions are:

1. From your suggestions, to make sure we are on the same page, did you recommend me using all gene counts from all genotypes at all timepoints and temperatures for differential expression step (dds) before calling results (dds) for comparisons of interest? Or, should I just pull out gene counts of genotypes/timepoints/temperature I am interested in for the dds step before calling results (dds) for comparisons of interest?

What I have done is
i) using gene counts from all samples for differential expression step (dds) before calling results(dds); see output#3
ii) using gene counts from a subset of my samples for differential expression step (dds) before calling results(dds); see output#2

When I tried performing differential expression using gene counts from all samples (at all timepoints and temperatures), I received these warning messages from R (please see output #1 in the box below).

On a different note, when I tried using only the gene counts of a subset of samples I wanted to compare, DESeq2 (version 1.2.6) automatically determined the types of comparisons I could make in the resultsNames(dds). For example,

    >resultsNames(dds2)
    "Intercept" "temp_temp1_vs_temp2" "time_24h_vs_12h" "time_6h_vs_12h" "temp2.time24h" "temp2.time6h"

How are all these comparisons pre-determined? When I called results(dds), does it compare effect of tempHi versus tempLow on genotype A, over all time points? Please see the output #3 for reference.

Finally, the results of differentially expressed genes for i) and ii) are different. So, I'd like to make sure which step I should be doing and if there is anything wrong with my R-command lines.

Many thanks,
Yoong

-- output of sessionInfo():

Output#1

    > dds1= DESeq(dds,betaPrior=FALSE)
    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing
    There were 12 warnings (use warnings() to see them)
    > warnings()
    Warning messages:
    1: glm.fit: algorithm did not converge
    2: glm.fit: algorithm did not converge
    3: glm.fit: algorithm did not converge
    4: glm.fit: algorithm did not converge
    5: glm.fit: algorithm did not converge
    6: glm.fit: algorithm did not converge
    7: glm.fit: algorithm did not converge
    8: glm.fit: algorithm did not converge
    9: glm.fit: algorithm did not converge
    10: glm.fit: algorithm did not converge
    11: glm.fit: algorithm did not converge
    12: In parametricDispersionFit(mcols(objectNZ)$baseMean[useForFit], ... :
      dispersion fit did not converge

Output#2:

    >dds1 = DESeqDataSetFromMatrix(countData = GenotypeA, colData = colData, design = ~temp+time+time:temp)
    >dds2= DESeq(dds1,betaPrior=FALSE)
    >resultsNames(dds2)
    "Intercept" "temp_TempHi_vs_TempLow" "time_24h_vs_12h" "time_6h_vs_12h" "TempLow.time24h" "TempHi.time6h"
    >results(dds2,name="temp_TempHi_vs_TempLow")

Output#3:

    >dds3 = DESeqDataSetFromMatrix(countData = allData, colData = colData, design = ~genotype+time+temp+genotype:temp+ time:temp)
    >dds4= DESeq(dds3,betaPrior=FALSE)
    WARNINGS
    > resultsNames(dds4)
    [1] "Intercept" "genotypeC_vs_ GenotypeB " "genotypeA_vs_ GenotypeB"
    [4] "genotype_GenotypeD_vs_GenotypeB" "time_24h_vs_12h" "time_6h_vs_12h"
    [7] "temp_TempHi_vs_TempLow" "genotypeC.TempHi" "genotypeA.TempHi"
    [10] "genotypeD.TempHi" "time24h.TempHi" "time6h.TempHi"
    >results(dds4,name="temp_TempHi_vs_TempLow")

--
Sent via the guest posting facility at bioconductor.org.

• 1.3k views

updated 10.3 years ago by Michael Love 41k • written 10.3 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2014-01-23

Hi Yoong,

Answers inline below.

On Jan 23, 2014 7:07 PM, "Yoong [guest]" <guest@bioconductor.org> wrote:

> Hi Mike,
> I am writing to follow up the multiple comparison using contrasts.
> To remind you again, here's my experimental design:
> Genotypes: 4 different genotypes
> Timepoint: 3 different timepoints (6h, 12h, and 24h)
> Temperature: Low and high temperatures
> 3 biological replicates for each condition.
>
> In the previous post, you suggested the following design:
> ~ genotype + time + temp + genotype:temp + time:temp
>
> And then call
>
> resultsNames(dds)
>
> In order to see all the interactions which are available for generating tests. For example:
>
> results(dds, name="genotypeA:tempHi")
>
> ...will provide you with the results of a test of whether the high temperature vs the low temperature has a specific effect for genotype A, over all time points.
>
> My questions are:
>
> 1. From your suggestions, to make sure we are on the same page, did you recommend me using all gene counts from all genotypes at all timepoints and temperatures for differential expression step (dds) before calling results (dds) for comparisons of interest?

Yes, I was recommending you run with all samples in the dataset object.

> Or, should I just pull out gene counts of genotypes/timepoints/temperature I am interested in for the dds step before calling results (dds) for comparisons of interest?
>
> What I have done is
> i) using gene counts from all samples for differential expression step (dds) before calling results(dds); see output#3
> ii) using gene counts from a subset of my samples for differential expression step (dds) before calling results(dds); see output#2
>
> When I tried performing differential expression using gene counts from all samples (at all timepoints and temperatures), I received these warning messages from R (please see output #1 in the box below).

This warning means that the parametric trend for dispersion is not appropriate for your data. I would run DESeq() with the argument fitType="mean".

> On a different note, when I tried using only the gene counts of a subset of samples I wanted to compare, DESeq2 (version 1.2.6) automatically determined the types of comparisons I could make in the resultsNames(dds). For example,
>
> >resultsNames(dds2)
> "Intercept" "temp_temp1_vs_temp2" "time_24h_vs_12h" "time_6h_vs_12h" "temp2.time24h" "temp2.time6h"
>
> How are all these comparisons pre-determined?

These are determined by the R function model.matrix() using the levels of the factors in the colData of the subsetted dataset object. Temp 1 and Time 12h are the base levels of these factors. You should specify which levels you want as the base levels before running DESeq().

> When I called results(dds), does it compare effect of tempHi versus tempLow on genotype A, over all time points? Please see the output #3 for reference.

No, it is the effect for " GenotypeB" (note the leading space), because this sorts alphabetically before "genotypeA". We make it very clear in the vignette how important it is to set the base level of factors. If you set "A" as the base level, it would be as you said.

> Finally, the results of differentially expressed genes for i) and ii) are different. So, I'd like to make sure which step I should be doing and if there is anything wrong with my R-command lines.
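To make the base-level point concrete, here is a minimal sketch in base R only (no DESeq2 needed). It builds a hypothetical sample table with the same layout as this experiment; the object name colData follows the thread, but the values are made up for illustration. It then strips the stray leading space from " GenotypeB", sets the reference levels explicitly, and prints the columns that model.matrix() produces for the suggested design. Using 6h as the time base level is an assumption, not something stated in the thread. DESeq2 renames these columns (for example temp_TempHi_vs_TempLow rather than tempTempHi), so the names will not match resultsNames() exactly; the point is only how the first level of each factor becomes the base level.

```r
# Hypothetical sample table: 4 genotypes x 3 timepoints x 2 temperatures
# x 3 replicates = 72 samples. " GenotypeB" carries a stray leading
# space on purpose, mimicking the level name seen in Output#3.
colData <- expand.grid(
  replicate = 1:3,
  temp      = c("TempLow", "TempHi"),
  time      = c("6h", "12h", "24h"),
  genotype  = c("genotypeA", " GenotypeB", "genotypeC", "genotypeD"),
  stringsAsFactors = FALSE
)

# Remove leading/trailing whitespace so level names are clean.
colData$genotype <- trimws(colData$genotype)

# Give the level order explicitly; the first level is the base level.
colData$genotype <- factor(
  colData$genotype,
  levels = c("genotypeA", "GenotypeB", "genotypeC", "genotypeD")
)
# Assumption for illustration: 6h is the time base level.
colData$time <- factor(colData$time, levels = c("6h", "12h", "24h"))

# relevel() is an alternative that only moves one level to the front.
colData$temp <- relevel(factor(colData$temp), ref = "TempLow")

# Same design formula as suggested in the thread.
design <- ~ genotype + time + temp + genotype:temp + time:temp
mm <- model.matrix(design, data = colData)

print(levels(colData$genotype))
print(colnames(mm))
```

If the factors are set this way in colData before DESeqDataSetFromMatrix() and DESeq() are run, the temperature coefficient should refer to genotype A rather than to " GenotypeB".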