Question: Defining the model matrix for a multi-factorial design with interactions - edgeR
4.2 years ago by
Guest User • 12k
Guest User • 12k wrote:
Hi all! I have a question about the way to define the model matrix for a multi-factorial design in which I want to model all the interactions. In particular, I have a 3x3x2 factorial design (3 factors) and three biological replicates, so I have 3x3x2x3 = 54 sample. The three factors are: - GENOTYPE with the three levels F, G and P - LOAD with levels HIGH, MEDIUM and LOW - TIME with levels H and PH So this is the design matrix: > design Time Load Genotype FHH_1 H High F FHPH_1 PH High F FHH_2 H High F FHPH_2 PH High F FHH_3 H High F FHPH_3 PH High F FMH_1 H Medium F FMPH_1 PH Medium F FMH_2 H Medium F FMPH_2 PH Medium F FMH_3 H Medium F FMPH_3 PH Medium F FLH_1 H Low F FLPH_1 PH Low F FLH_2 H Low F FLPH_2 PH Low F FLH_3 H Low F FLPH_3 PH Low F GHH_1 H High G GHPH_1 PH High G GHH_2 H High G GHPH_2 PH High G GHH_3 H High G GHPH_3 PH High G GMH_1 H Medium G GMPH_1 PH Medium G GMH_2 H Medium G GMPH_2 PH Medium G GMH_3 H Medium G GMPH_3 PH Medium G GLH_1 H Low G GLPH_1 PH Low G GLH_2 H Low G GLPH_2 PH Low G GLH_3 H Low G GLPH_3 PH Low G PHH_1 H High P PHPH_1 PH High P PHH_2 H High P PHPH_2 PH High P PHH_3 H High P PHPH_3 PH High P PMH_1 H Medium P PMPH_1 PH Medium P PMH_2 H Medium P PMPH_2 PH Medium P PMH_3 H Medium P PMPH_3 PH Medium P PLH_1 H Low P PLPH_1 PH Low P PLH_2 H Low P PLPH_2 PH Low P PLH_3 H Low P PLPH_3 PH Low P I am interested in finding all possible interactions between the three factors, so I want to model also the three way interactions GENOTYPE:LOAD:TIME. I have already read the edgeR vignette so I was wondering if the best way to define the model matrix is by defining it with a formula or by defining each treatment combination as a group. I'm trying to understand which are the differences between the two alternatives, so I tried to estimate both the models: Model 1) mydesign1<-model.matrix(~Genotype * Load * Time, data=design) D1 <- estimateGLMCommonDisp(D, mydesign1) # D is the DGEList of the normalized counts D1 <- estimateGLMTrendedDisp(D1, mydesign1) fit1 <- glmFit(D1, mydesign1) The estimated coefficients are: > colnames(fit1)  "(Intercept)" "GenotypeG"  "GenotypeP" "LoadLow"  "LoadMedium" "TimePH"  "GenotypeG:LoadLow" "GenotypeP:LoadLow"  "GenotypeG:LoadMedium" "GenotypeP:LoadMedium"  "GenotypeG:TimePH" "GenotypeP:TimePH"  "LoadLow:TimePH" "LoadMedium:TimePH"  "GenotypeG:LoadLow:TimePH" "GenotypeP:LoadLow:TimePH"  "GenotypeG:LoadMedium:TimePH" "GenotypeP:LoadMedium:TimePH" Model 2) Group <- factor(paste(design$Genotype, design$Load, design$Time, sep=".")) # I have 18 unique levels in Group (54 samples / 3 biological replicates) mydesign2<-model.matrix(~0+Group) colnames(mydesign2) <- levels(Group) D2 <- estimateGLMCommonDisp(D, mydesign2) D2 <- estimateGLMTrendedDisp(D2, mydesign2) fit2<-glmFit(D2, mydesign2) > colnames(fit2)  "H.High.F" "H.High.G" "H.High.P" "H.Low.F" "H.Low.G"  "H.Low.P" "H.Medium.F" "H.Medium.G" "H.Medium.P" "PH.High.F"  "PH.High.G" "PH.High.P" "PH.Low.F" "PH.Low.G" "PH.Low.P"  "PH.Medium.F" "PH.Medium.G" "PH.Medium.P" Now, to understand the differences between the two models, my question was this: if I define the contrast: contrast1<-c(rep(0,14), 1, 0, -1, 0) # "GenotypeG:LoadLow:TimePH" vs "GenotypeG:LoadMedium:TimePH" on the first model and I make a LR test, will I obtain the same result as making a LR test on the contrast: contrast2<-makeContrastsG.LowvsMedium.PH = PH.Low.G - PH.Medium.G, levels=mydesign2) on the second model? This is the code I used to make the tests: lrt1.G.LowvsMedium.PH<-glmLRT(fit1, contrast=contrast1) summarydecideTestsDGElrt1.G.LowvsMedium.PH)) and lrt2.G.LowvsMedium.PH<-glmLRT(fit2, contrast=contrast2[," G.LowvsMedium.PH"]) summarydecideTestsDGElrt2.G.LowvsMedium.PH)) The answer to the question is no, because I've tried to do it on my data and the results are different between the two tests. But why? Am I doing two completely different tests? The two parameters of the two models aren't saying me the same thing? In a moment of madness :) I tried to estimate another model, with only the three-level interactions between the factors. Model 3) mydesign3<-model.matrix(~0+Genotype:Load:Time, data=design) D3 <- estimateGLMCommonDisp(D, mydesign3) D3 <- estimateGLMTrendedDisp(D3, mydesign3) fit3<-glmFit(D3, mydesign3) > colnames(fit3)  "GenotypeF:LoadHigh:TimeH" "GenotypeG:LoadHigh:TimeH"  "GenotypeP:LoadHigh:TimeH" "GenotypeF:LoadLow:TimeH"  "GenotypeG:LoadLow:TimeH" "GenotypeP:LoadLow:TimeH"  "GenotypeF:LoadMedium:TimeH" "GenotypeG:LoadMedium:TimeH"  "GenotypeP:LoadMedium:TimeH" "GenotypeF:LoadHigh:TimePH"  "GenotypeG:LoadHigh:TimePH" "GenotypeP:LoadHigh:TimePH"  "GenotypeF:LoadLow:TimePH" "GenotypeG:LoadLow:TimePH"  "GenotypeP:LoadLow:TimePH" "GenotypeF:LoadMedium:TimePH"  "GenotypeG:LoadMedium:TimePH" "GenotypeP:LoadMedium:TimePH" Well, this model gave me the same parameter estimates as the model 2 (each treatment combination as a group) and, obviously, the same results in terms of DEG. But is it statistically correct to estimate a model with three-level interactions without any principal effect? Thanks in advance for any replies! Marco -- output of sessionInfo(): R version 3.0.2 (2013-09-25) Platform: x86_64-w64-mingw32/x64 (64-bit) locale:  LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252  LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C  LC_TIME=Italian_Italy.1252 attached base packages:  splines parallel stats graphics grDevices utils datasets  methods base other attached packages:  EDASeq_1.8.0 ShortRead_1.20.0 Rsamtools_1.14.2  lattice_0.20-24 Biostrings_2.30.1 GenomicRanges_1.14.4  XVector_0.2.0 IRanges_1.20.6 Biobase_2.22.0  BiocGenerics_0.8.0 edgeR_3.4.2 limma_3.18.9  aroma.light_1.32.0 matrixStats_0.8.14 yeastRNASeq_0.0.10 loaded via a namespace (and not attached):  annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6  DBI_0.2-7 DESeq_1.14.0 genefilter_1.44.0  geneplotter_1.40.0 grid_3.0.2 hwriter_1.3  latticeExtra_0.6-26 R.methodsS3_1.6.1 R.oo_1.17.0  RColorBrewer_1.0-5 RSQLite_0.11.4 stats4_3.0.2  survival_2.37-6 tools_3.0.2 XML_3.98-1.1  xtable_1.7-1 zlibbioc_1.8.0 -- Sent via the guest posting facility at bioconductor.org.
ADD COMMENT • link •