Defining the model matrix for a multi-factorial design with interactions

Defining the model matrix for a multi-factorial design with interactions - edgeR

0

Entering edit mode

Marco ▴ 20

@marco-6367

Last seen 9.5 years ago

European Union

Hi Ryan and all the list, here I am again. I have another question regarding my factorial design. As Ryan suggested me I estimated the Model 2, i.e. I defined each treatment combination as a group. This allow me to make a big number of comparisons by defining a contrast for every comparison I can be interested in. For example, I can find differentially expressed genes between GENOTYPE F with LOAD High at TIME H and GENOTYPE F with LOAD High at TIME PH; or I can compare GENOTYPE F with LOAD Medium at TIME H and GENOTYPE F with LOAD Medium at TIME PH, and so on. I found a total of 45 contrasts, so I built the Contrasts Matrix: contrasts<-makeContrasts( F.High.HvsPH = F.High.H - F.High.PH, F.Medium.HvsPH = F.Medium.H - F.Medium.PH, F.Low.HvsPH = F.Low.H - F.Low.PH, F.HighvsMedium.H = F.High.H - F.Medium.H, F.HighvsMedium.PH = F.High.PH - F.Medium.PH, F.HighvsLow.H = F.High.H - F.Low.H, F.HighvsLow.PH = F.High.PH - F.Low.PH, F.MediumvsLow.H = F.Medium.H - F.Low.H, F.MediumvsLow.PH = F.Medium.PH - F.Low.PH, G.High.HvsPH = G.High.H - G.High.PH, G.Medium.HvsPH = G.Medium.H - G.Medium.PH, G.Low.HvsPH = G.Low.H - G.Low.PH, G.HighvsMedium.H = G.High.H - G.Medium.H, G.HighvsMedium.PH = G.High.PH - G.Medium.PH, G.HighvsLow.H = G.High.H - G.Low.H, G.HighvsLow.PH = G.High.PH - G.Low.PH, G.MediumvsLow.H = G.Medium.H - G.Low.H, G.MediumvsLow.PH = G.Medium.PH - G.Low.PH, P.High.HvsPH = P.High.H - P.High.PH, P.Medium.HvsPH = P.Medium.H - P.Medium.PH, P.Low.HvsPH = P.Low.H - P.Low.PH, P.HighvsMedium.H = P.High.H - P.Medium.H, P.HighvsMedium.PH = P.High.PH - P.Medium.PH, P.HighvsLow.H = P.High.H - P.Low.H, P.HighvsLow.PH = P.High.PH - P.Low.PH, P.MediumvsLow.H = P.Medium.H - P.Low.H, P.MediumvsLow.PH = P.Medium.PH - P.Low.PH, FvsG.High.H = F.High.H - G.High.H, FvsG.High.PH = F.High.PH - G.High.PH, FvsG.Medium.H = F.Medium.H - G.Medium.H, FvsG.Medium.PH = F.Medium.PH - G.Medium.PH, FvsG.Low.H = F.Low.H - G.Low.H, FvsG.Low.PH = F.Low.PH - G.Low.PH, FvsP.High.H = F.High.H - P.High.H, FvsP.High.PH = F.High.PH - P.High.PH, FvsP.Medium.H = F.Medium.H - P.Medium.H, FvsP.Medium.PH = F.Medium.PH - P.Medium.PH, FvsP.Low.H = F.Low.H - P.Low.H, FvsP.Low.PH = F.Low.PH - P.Low.PH, GvsP.High.H = G.High.H - P.High.H, GvsP.High.PH = G.High.PH - P.High.PH, GvsP.Medium.H = G.Medium.H - P.Medium.H, GvsP.Medium.PH = G.Medium.PH - P.Medium.PH, GvsP.Low.H = G.Low.H - P.Low.H, GvsP.Low.PH = G.Low.PH - P.Low.PH, levels=mydesign) My problem is that the contrasts I made allow me to compare only one factor considering the others fixed, i.e. I can compare two levels of one factor fixing the other two factors at the same level. So I thik in this way I can't test for interactions. I would like to be able to find also for which genes the interaction between two factors is significative. For example, I would like to find which genes change significantly their expression, for GENOTYPE G, in a different way from High and Medium CHARGE between TIMEs H and PH, i.e for which genes of GENOTYPE F the interaction between CHARGE and TIME is significative. Can I do so by defining a contrast like this? (F.High.H - F.High.PH) - (F.Medium.H - F.Medium.PH) And what if I am interested in obtaining a list of DEG for a comparison of two levels of a factor indipendently from the levels of the other two factors? Can I define a contrast like this? FvsG = F.High.H + F.High.PH + F.Medium.H + F.Medium.PH + F.Low.H + F.Low.PH - G.High.H - G.High.PH - G.Medium.H - G.Medium.PH - G.Low.H - G.Low.PH and then do a LRT to find the differentially expressed genes? Or does it make more sense to obtain a list of DEG for every contrast I made before and then obtaining the FvsG list of DEG doing the union of all the lists of differentially expressed genes like this: # Let DE.FvsG.High.H, DE.FvsG.High.PH, DE.FvsG.Medium.H, DE.FvsG.Medium.PH , # DE.FvsG.Low.H, DE.FvsG.Low.PH be the lists of differentially expressed genes # for the related contrasts of the Contrast Matrix. DE.FvsG<-unique(c(DE.FvsG.High.H, DE.FvsG.High.PH, DE.FvsG.Medium.H, DE.FvsG.Medium.PH, DE.FvsG.Low.H, DE.FvsG.Low.PH)) I'm so sorry for my gaps in models with interactions and more than two factors of interst but it's my first time with a so complex design so I'm trying to understand. Thanks in advance for any replies! Marco > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C [5] LC_TIME=Italian_Italy.1252 attached base packages: [1] splines parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] EDASeq_1.8.0 ShortRead_1.20.0 Rsamtools_1.14.2 [4] lattice_0.20-24 Biostrings_2.30.1 GenomicRanges_1.14.4 [7] XVector_0.2.0 IRanges_1.20.6 Biobase_2.22.0 [10] BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.10 [13] aroma.light_1.32.0 matrixStats_0.8.14 yeastRNASeq_0.0.10 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 [4] DBI_0.2-7 DESeq_1.14.0 genefilter_1.44.0 [7] geneplotter_1.40.0 grid_3.0.2 hwriter_1.3 [10] latticeExtra_0.6-26 R.methodsS3_1.6.1 R.oo_1.17.0 [13] RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2 [16] survival_2.37-7 XML_3.98-1.1 xtable_1.7-1 [19] zlibbioc_1.8.0 2014-01-30 Marco Pegoraro <marco.pegoraro88@gmail.com>: > Dear Ryan, > > thank you very much for your reply. I agree with you about the easier > interpretation of the parametrization of Model 2. > About the differences between Models 1 and 2, you are right, the > dispersion parameters estimates are nearly identical but the p-values are > very different between the tests. Probably, as you told me, the reason is I > make some mistakes in the definition of the contrast. > Thank you again. Have a good day, > Marco > > > 2014-01-29 Ryan <rct@thompsonclan.org> > > Hi Marco, >> >> Note that your models 2 & 3 are exactly the same model. The expression in >> your formula for model 3: >> >> Genotype:Load:Time >> >> is more or less equivalent to: >> >> factor(paste(design$Genotype, design$Load, design$Time, sep=":")) >> >> except that the factor levels might possibly be in a different order. As >> for your question about doing the same test with Models 1 & 2, I believe >> that both models are equivalent parametrizations of the same design, so you >> should get (nearly) identical dispersion estimates, and performing >> equivalent tests with the two models should give you the same p-values. If >> you are not getting the same p-values, then you have not properly set up >> the contrast for Model 1. This isn't surprising, because a 3-way >> interaction model is really confusing to work with, and I can't even tell >> you what the correct contrast would be. I would recommend that you stick >> with Model 2 (or 3), where each coefficient has a direct biological >> interpretation as the estimated expression level of a group. Designing >> contrasts for this parametrization is much easier, in my opinion. I think >> you would agree, since you got this contrast correct in your example code. >> >> -Ryan >> >> >> On 1/29/14, 7:11 AM, Marco [guest] wrote: >> >>> Hi all! >>> >>> I have a question about the way to define the model matrix for a >>> multi-factorial design in which I want to model all the interactions. In >>> particular, I have a 3x3x2 factorial design (3 factors) and three >>> biological replicates, so I have 3x3x2x3 = 54 sample. >>> The three factors are: >>> - GENOTYPE with the three levels F, G and P >>> - LOAD with levels HIGH, MEDIUM and LOW >>> - TIME with levels H and PH >>> >>> So this is the design matrix: >>> >>> design >>>> >>> Time Load Genotype >>> FHH_1 H High F >>> FHPH_1 PH High F >>> FHH_2 H High F >>> FHPH_2 PH High F >>> FHH_3 H High F >>> FHPH_3 PH High F >>> FMH_1 H Medium F >>> FMPH_1 PH Medium F >>> FMH_2 H Medium F >>> FMPH_2 PH Medium F >>> FMH_3 H Medium F >>> FMPH_3 PH Medium F >>> FLH_1 H Low F >>> FLPH_1 PH Low F >>> FLH_2 H Low F >>> FLPH_2 PH Low F >>> FLH_3 H Low F >>> FLPH_3 PH Low F >>> GHH_1 H High G >>> GHPH_1 PH High G >>> GHH_2 H High G >>> GHPH_2 PH High G >>> GHH_3 H High G >>> GHPH_3 PH High G >>> GMH_1 H Medium G >>> GMPH_1 PH Medium G >>> GMH_2 H Medium G >>> GMPH_2 PH Medium G >>> GMH_3 H Medium G >>> GMPH_3 PH Medium G >>> GLH_1 H Low G >>> GLPH_1 PH Low G >>> GLH_2 H Low G >>> GLPH_2 PH Low G >>> GLH_3 H Low G >>> GLPH_3 PH Low G >>> PHH_1 H High P >>> PHPH_1 PH High P >>> PHH_2 H High P >>> PHPH_2 PH High P >>> PHH_3 H High P >>> PHPH_3 PH High P >>> PMH_1 H Medium P >>> PMPH_1 PH Medium P >>> PMH_2 H Medium P >>> PMPH_2 PH Medium P >>> PMH_3 H Medium P >>> PMPH_3 PH Medium P >>> PLH_1 H Low P >>> PLPH_1 PH Low P >>> PLH_2 H Low P >>> PLPH_2 PH Low P >>> PLH_3 H Low P >>> PLPH_3 PH Low P >>> >>> >>> I am interested in finding all possible interactions between the three >>> factors, so I want to model also the three way interactions >>> GENOTYPE:LOAD:TIME. I have already read the edgeR vignette so I was >>> wondering if the best way to define the model matrix is by defining it with >>> a formula or by defining each treatment combination as a group. >>> I'm trying to understand which are the differences between the two >>> alternatives, so I tried to estimate both the models: >>> >>> Model 1) >>> mydesign1<-model.matrix(~Genotype * Load * Time, data=design) >>> D1 <- estimateGLMCommonDisp(D, mydesign1) # D is the DGEList of the >>> normalized counts >>> D1 <- estimateGLMTrendedDisp(D1, mydesign1) >>> fit1 <- glmFit(D1, mydesign1) >>> >>> The estimated coefficients are: >>> > colnames(fit1) >>> [1] "(Intercept)" "GenotypeG" >>> [3] "GenotypeP" "LoadLow" >>> [5] "LoadMedium" "TimePH" >>> [7] "GenotypeG:LoadLow" "GenotypeP:LoadLow" >>> [9] "GenotypeG:LoadMedium" "GenotypeP:LoadMedium" >>> [11] "GenotypeG:TimePH" "GenotypeP:TimePH" >>> [13] "LoadLow:TimePH" "LoadMedium:TimePH" >>> [15] "GenotypeG:LoadLow:TimePH" "GenotypeP:LoadLow:TimePH" >>> [17] "GenotypeG:LoadMedium:TimePH" "GenotypeP:LoadMedium:TimePH" >>> Model 2) >>> Group <- factor(paste(design$Genotype, design$Load, design$Time, >>> sep=".")) >>> # I have 18 unique levels in Group (54 samples / 3 biological >>> replicates) >>> >>> mydesign2<-model.matrix(~0+Group) >>> colnames(mydesign2) <- levels(Group) >>> >>> D2 <- estimateGLMCommonDisp(D, mydesign2) >>> D2 <- estimateGLMTrendedDisp(D2, mydesign2) >>> >>> fit2<-glmFit(D2, mydesign2) >>> > colnames(fit2) >>> [1] "H.High.F" "H.High.G" "H.High.P" "H.Low.F" >>> "H.Low.G" >>> [6] "H.Low.P" "H.Medium.F" "H.Medium.G" "H.Medium.P" >>> "PH.High.F" >>> [11] "PH.High.G" "PH.High.P" "PH.Low.F" "PH.Low.G" >>> "PH.Low.P" >>> [16] "PH.Medium.F" "PH.Medium.G" "PH.Medium.P" >>> Now, to understand the differences between the two models, my >>> question was this: >>> if I define the contrast: >>> >>> contrast1<-c(rep(0,14), 1, 0, -1, 0) # "GenotypeG:LoadLow:TimePH" vs >>> "GenotypeG:LoadMedium:TimePH" >>> >>> on the first model and I make a LR test, will I obtain the same result >>> as making a LR test on the contrast: >>> >>> contrast2<-makeContrastsG.LowvsMedium.PH = PH.Low.G - PH.Medium.G, >>> levels=mydesign2) >>> >>> on the second model? >>> This is the code I used to make the tests: >>> >>> lrt1.G.LowvsMedium.PH<-glmLRT(fit1, contrast=contrast1) >>> summary(decideTestsDGElrt1.G.LowvsMedium.PH)) >>> and >>> >>> lrt2.G.LowvsMedium.PH<-glmLRT(fit2, contrast=contrast2[," >>> G.LowvsMedium.PH"]) >>> summary(decideTestsDGElrt2.G.LowvsMedium.PH)) >>> >>> The answer to the question is no, because I've tried to do it on my data >>> and the results are different between the two tests. But why? Am I doing >>> two completely different tests? The two parameters of the two models aren't >>> saying me the same thing? >>> >>> In a moment of madness :) I tried to estimate another model, with only >>> the three-level interactions between the factors. >>> >>> Model 3) >>> mydesign3<-model.matrix(~0+Genotype:Load:Time, data=design) >>> >>> D3 <- estimateGLMCommonDisp(D, mydesign3) >>> D3 <- estimateGLMTrendedDisp(D3, mydesign3) >>> >>> fit3<-glmFit(D3, mydesign3) >>> > colnames(fit3) >>> [1] "GenotypeF:LoadHigh:TimeH" "GenotypeG:LoadHigh:TimeH" >>> [3] "GenotypeP:LoadHigh:TimeH" "GenotypeF:LoadLow:TimeH" >>> [5] "GenotypeG:LoadLow:TimeH" "GenotypeP:LoadLow:TimeH" >>> [7] "GenotypeF:LoadMedium:TimeH" "GenotypeG:LoadMedium:TimeH" >>> [9] "GenotypeP:LoadMedium:TimeH" "GenotypeF:LoadHigh:TimePH" >>> [11] "GenotypeG:LoadHigh:TimePH" "GenotypeP:LoadHigh:TimePH" >>> [13] "GenotypeF:LoadLow:TimePH" "GenotypeG:LoadLow:TimePH" >>> [15] "GenotypeP:LoadLow:TimePH" "GenotypeF:LoadMedium:TimePH" >>> [17] "GenotypeG:LoadMedium:TimePH" "GenotypeP:LoadMedium:TimePH" >>> Well, this model gave me the same parameter estimates as the model 2 >>> (each treatment combination as a group) and, obviously, the same results in >>> terms of DEG. But is it statistically correct to estimate a model with >>> three-level interactions without any principal effect? >>> >>> Thanks in advance for any replies! >>> >>> Marco >>> >>> -- output of sessionInfo(): >>> >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>> >>> locale: >>> [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 >>> [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C >>> [5] LC_TIME=Italian_Italy.1252 >>> >>> attached base packages: >>> [1] splines parallel stats graphics grDevices utils datasets >>> [8] methods base >>> >>> other attached packages: >>> [1] EDASeq_1.8.0 ShortRead_1.20.0 Rsamtools_1.14.2 >>> [4] lattice_0.20-24 Biostrings_2.30.1 GenomicRanges_1.14.4 >>> [7] XVector_0.2.0 IRanges_1.20.6 Biobase_2.22.0 >>> [10] BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.9 >>> [13] aroma.light_1.32.0 matrixStats_0.8.14 yeastRNASeq_0.0.10 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 >>> [4] DBI_0.2-7 DESeq_1.14.0 genefilter_1.44.0 >>> [7] geneplotter_1.40.0 grid_3.0.2 hwriter_1.3 >>> [10] latticeExtra_0.6-26 R.methodsS3_1.6.1 R.oo_1.17.0 >>> [13] RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2 >>> [16] survival_2.37-6 tools_3.0.2 XML_3.98-1.1 >>> [19] xtable_1.7-1 zlibbioc_1.8.0 >>> >>> -- >>> Sent via the guest posting facility at bioconductor.org. >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane. >>> science.biology.informatics.conductor >>> >> >> > [[alternative HTML version deleted]]

edgeR edgeR • 1.2k views

ADD COMMENT • link updated 10.2 years ago by Ryan C. Thompson ★ 7.9k • written 10.2 years ago by Marco ▴ 20

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

On 2/6/14, 2:42 AM, Marco Pegoraro wrote: > (F.High.H - F.High.PH <http: f.high.ph="">) - (F.Medium.H - > F.Medium.PH <http: f.medium.ph="">) Yes, that is the correct way to form an interaction contrast in a this group-means parametrization. > > And what if I am interested in obtaining a list of DEG for a > comparison of two levels of a factor indipendently from the levels of > the other two factors? Can I define a contrast like this? > > FvsG = F.High.H + F.High.PH <http: f.high.ph=""> + F.Medium.H + > F.Medium.PH <http: f.medium.ph=""> + F.Low.H + F.Low.PH <http: f.low.ph=""> > - G.High.H - G.High.PH <http: g.high.ph=""> - G.Medium.H - > G.Medium.PH <http: g.medium.ph=""> - G.Low.H - G.Low.PH <http: g.low.ph=""> > > and then do a LRT to find the differentially expressed genes? > Or does it make more sense to obtain a list of DEG for every contrast > I made before and then obtaining the FvsG list of DEG doing the union > of all the lists of differentially expressed genes like this: > # Let DE.FvsG.High.H, DE.FvsG.High.PH <http: de.fvsg.high.ph="">, > DE.FvsG.Medium.H, DE.FvsG.Medium.PH <http: de.fvsg.medium.ph="">, > # DE.FvsG.Low.H, DE.FvsG.Low.PH <http: de.fvsg.low.ph=""> be the lists > of differentially expressed genes > # for the related contrasts of the Contrast Matrix. > DE.FvsG<-unique(c(DE.FvsG.High.H, DE.FvsG.High.PH > <http: de.fvsg.high.ph="">, DE.FvsG.Medium.H, > DE.FvsG.Medium.PH <http: de.fvsg.medium.ph="">, DE.FvsG.Low.H, > DE.FvsG.Low.PH <http: de.fvsg.low.ph="">)) Have a look at Gordon's reply from not to long ago to a question that I asked. It covers the cases you suggest and many more. For your case, start with the section that begins "The four groups might arise from two original factors." (And feel free to read my original question for context.) https://stat.ethz.ch/pipermail/bioconductor/2014-January/057313.html -Ryan > > > 2014-01-30 Marco Pegoraro <marco.pegoraro88@gmail.com> <mailto:marco.pegoraro88@gmail.com>>: > > Dear Ryan, > > thank you very much for your reply. I agree with you about the > easier interpretation of the parametrization of Model 2. > About the differences between Models 1 and 2, you are right, the > dispersion parameters estimates are nearly identical but the > p-values are very different between the tests. Probably, as you > told me, the reason is I make some mistakes in the definition of > the contrast. > Thank you again. Have a good day, > Marco > > > 2014-01-29 Ryan <rct@thompsonclan.org <mailto:rct@thompsonclan.org="">> > > Hi Marco, > > Note that your models 2 & 3 are exactly the same model. The > expression in your formula for model 3: > > Genotype:Load:Time > > is more or less equivalent to: > > factor(paste(design$Genotype, design$Load, design$Time, sep=":")) > > except that the factor levels might possibly be in a different > order. As for your question about doing the same test with > Models 1 & 2, I believe that both models are equivalent > parametrizations of the same design, so you should get > (nearly) identical dispersion estimates, and performing > equivalent tests with the two models should give you the same > p-values. If you are not getting the same p-values, then you > have not properly set up the contrast for Model 1. This isn't > surprising, because a 3-way interaction model is really > confusing to work with, and I can't even tell you what the > correct contrast would be. I would recommend that you stick > with Model 2 (or 3), where each coefficient has a direct > biological interpretation as the estimated expression level of > a group. Designing contrasts for this parametrization is much > easier, in my opinion. I think you would agree, since you got > this contrast correct in your example code. > > -Ryan > > > On 1/29/14, 7:11 AM, Marco [guest] wrote: > > Hi all! > > I have a question about the way to define the model matrix > for a multi-factorial design in which I want to model all > the interactions. In particular, I have a 3x3x2 factorial > design (3 factors) and three biological replicates, so I > have 3x3x2x3 = 54 sample. > The three factors are: > - GENOTYPE with the three levels F, G and P > - LOAD with levels HIGH, MEDIUM and LOW > - TIME with levels H and PH > > So this is the design matrix: > > design > > Time Load Genotype > FHH_1 H High F > FHPH_1 PH High F > FHH_2 H High F > FHPH_2 PH High F > FHH_3 H High F > FHPH_3 PH High F > FMH_1 H Medium F > FMPH_1 PH Medium F > FMH_2 H Medium F > FMPH_2 PH Medium F > FMH_3 H Medium F > FMPH_3 PH Medium F > FLH_1 H Low F > FLPH_1 PH Low F > FLH_2 H Low F > FLPH_2 PH Low F > FLH_3 H Low F > FLPH_3 PH Low F > GHH_1 H High G > GHPH_1 PH High G > GHH_2 H High G > GHPH_2 PH High G > GHH_3 H High G > GHPH_3 PH High G > GMH_1 H Medium G > GMPH_1 PH Medium G > GMH_2 H Medium G > GMPH_2 PH Medium G > GMH_3 H Medium G > GMPH_3 PH Medium G > GLH_1 H Low G > GLPH_1 PH Low G > GLH_2 H Low G > GLPH_2 PH Low G > GLH_3 H Low G > GLPH_3 PH Low G > PHH_1 H High P > PHPH_1 PH High P > PHH_2 H High P > PHPH_2 PH High P > PHH_3 H High P > PHPH_3 PH High P > PMH_1 H Medium P > PMPH_1 PH Medium P > PMH_2 H Medium P > PMPH_2 PH Medium P > PMH_3 H Medium P > PMPH_3 PH Medium P > PLH_1 H Low P > PLPH_1 PH Low P > PLH_2 H Low P > PLPH_2 PH Low P > PLH_3 H Low P > PLPH_3 PH Low P > > > I am interested in finding all possible interactions > between the three factors, so I want to model also the > three way interactions GENOTYPE:LOAD:TIME. I have already > read the edgeR vignette so I was wondering if the best way > to define the model matrix is by defining it with a > formula or by defining each treatment combination as a group. > I'm trying to understand which are the differences between > the two alternatives, so I tried to estimate both the models: > > Model 1) > mydesign1<-model.matrix(~Genotype * Load * Time, > data=design) > D1 <- estimateGLMCommonDisp(D, mydesign1) # D is the > DGEList of the normalized counts > D1 <- estimateGLMTrendedDisp(D1, mydesign1) > fit1 <- glmFit(D1, mydesign1) > > The estimated coefficients are: > > colnames(fit1) > [1] "(Intercept)" "GenotypeG" > [3] "GenotypeP" "LoadLow" > [5] "LoadMedium" "TimePH" > [7] "GenotypeG:LoadLow" "GenotypeP:LoadLow" > [9] "GenotypeG:LoadMedium" "GenotypeP:LoadMedium" > [11] "GenotypeG:TimePH" "GenotypeP:TimePH" > [13] "LoadLow:TimePH" "LoadMedium:TimePH" > [15] "GenotypeG:LoadLow:TimePH" > "GenotypeP:LoadLow:TimePH" > [17] "GenotypeG:LoadMedium:TimePH" > "GenotypeP:LoadMedium:TimePH" > Model 2) > Group <- factor(paste(design$Genotype, design$Load, > design$Time, sep=".")) > # I have 18 unique levels in Group (54 samples / 3 > biological replicates) > > mydesign2<-model.matrix(~0+Group) > colnames(mydesign2) <- levels(Group) > > D2 <- estimateGLMCommonDisp(D, mydesign2) > D2 <- estimateGLMTrendedDisp(D2, mydesign2) > > fit2<-glmFit(D2, mydesign2) > > colnames(fit2) > [1] "H.High.F" "H.High.G" "H.High.P" > "H.Low.F" "H.Low.G" > [6] "H.Low.P" "H.Medium.F" "H.Medium.G" > "H.Medium.P" "PH.High.F" > [11] "PH.High.G" "PH.High.P" "PH.Low.F" > "PH.Low.G" "PH.Low.P" > [16] "PH.Medium.F" "PH.Medium.G" "PH.Medium.P" > Now, to understand the differences between the two > models, my question was this: > if I define the contrast: > > contrast1<-c(rep(0,14), 1, 0, -1, 0) # > "GenotypeG:LoadLow:TimePH" vs "GenotypeG:LoadMedium:TimePH" > > on the first model and I make a LR test, will I obtain the > same result as making a LR test on the contrast: > > contrast2<-makeContrastsG.LowvsMedium.PH > <http: g.lowvsmedium.ph=""> = PH.Low.G - PH.Medium.G, > levels=mydesign2) > > on the second model? > This is the code I used to make the tests: > > lrt1.G.LowvsMedium.PH > <http: lrt1.g.lowvsmedium.ph=""><-glmLRT(fit1, > contrast=contrast1) > summary(decideTestsDGElrt1.G.LowvsMedium.PH > <http: lrt1.g.lowvsmedium.ph="">)) > and > > lrt2.G.LowvsMedium.PH > <http: lrt2.g.lowvsmedium.ph=""><-glmLRT(fit2, > contrast=contrast2[," G.LowvsMedium.PH > <http: g.lowvsmedium.ph="">"]) > summary(decideTestsDGElrt2.G.LowvsMedium.PH > <http: lrt2.g.lowvsmedium.ph="">)) > > The answer to the question is no, because I've tried to do > it on my data and the results are different between the > two tests. But why? Am I doing two completely different > tests? The two parameters of the two models aren't saying > me the same thing? > > In a moment of madness :) I tried to estimate another > model, with only the three-level interactions between the > factors. > > Model 3) > mydesign3<-model.matrix(~0+Genotype:Load:Time, data=design) > > D3 <- estimateGLMCommonDisp(D, mydesign3) > D3 <- estimateGLMTrendedDisp(D3, mydesign3) > > fit3<-glmFit(D3, mydesign3) > > colnames(fit3) > [1] "GenotypeF:LoadHigh:TimeH" "GenotypeG:LoadHigh:TimeH" > [3] "GenotypeP:LoadHigh:TimeH" "GenotypeF:LoadLow:TimeH" > [5] "GenotypeG:LoadLow:TimeH" "GenotypeP:LoadLow:TimeH" > [7] "GenotypeF:LoadMedium:TimeH" > "GenotypeG:LoadMedium:TimeH" > [9] "GenotypeP:LoadMedium:TimeH" > "GenotypeF:LoadHigh:TimePH" > [11] "GenotypeG:LoadHigh:TimePH" > "GenotypeP:LoadHigh:TimePH" > [13] "GenotypeF:LoadLow:TimePH" "GenotypeG:LoadLow:TimePH" > [15] "GenotypeP:LoadLow:TimePH" > "GenotypeF:LoadMedium:TimePH" > [17] "GenotypeG:LoadMedium:TimePH" > "GenotypeP:LoadMedium:TimePH" > Well, this model gave me the same parameter estimates > as the model 2 (each treatment combination as a group) > and, obviously, the same results in terms of DEG. But is > it statistically correct to estimate a model with > three-level interactions without any principal effect? > > Thanks in advance for any replies! > > Marco > > -- output of sessionInfo(): > > R version 3.0.2 (2013-09-25) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 > [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C > [5] LC_TIME=Italian_Italy.1252 > > attached base packages: > [1] splines parallel stats graphics grDevices > utils datasets > [8] methods base > > other attached packages: > [1] EDASeq_1.8.0 ShortRead_1.20.0 > Rsamtools_1.14.2 > [4] lattice_0.20-24 Biostrings_2.30.1 > GenomicRanges_1.14.4 > [7] XVector_0.2.0 IRanges_1.20.6 Biobase_2.22.0 > [10] BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.9 > [13] aroma.light_1.32.0 matrixStats_0.8.14 > yeastRNASeq_0.0.10 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 > [4] DBI_0.2-7 DESeq_1.14.0 genefilter_1.44.0 > [7] geneplotter_1.40.0 grid_3.0.2 hwriter_1.3 > [10] latticeExtra_0.6-26 R.methodsS3_1.6.1 R.oo_1.17.0 > [13] RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2 > [16] survival_2.37-6 tools_3.0.2 XML_3.98-1.1 > [19] xtable_1.7-1 zlibbioc_1.8.0 > > -- > Sent via the guest posting facility at bioconductor.org > <http: bioconductor.org="">. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]]

ADD COMMENT • link 10.2 years ago Ryan C. Thompson ★ 7.9k

Login before adding your answer.