Question

Contrasts for 3x2 factorial experiment in R/edgeR

0

Entering edit mode

Benjamin King ▴ 30

@benjamin-king-4567

Last seen 9.6 years ago

Hello, I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts. The design of my experiment has two factors, strain and treatment. There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated). I have four biological replicates (except for one sample group where there are only three). # Group Strain Treatment # A.Un A Un # B.Un B Un # C.Un C Un # A.Stim A Stim # B.Stim B Stim # C.Stim C Stim Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows. Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6 Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4 Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4 Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6 Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4 Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4 I'm interested in these questions: 1. Which genes are differentially expressed in Strain B compared to Strain A? 2. Which genes are differentially expressed in Strain C compared to Strain A? 3. Which genes respond to Stimulated treatment overall? 4. Which genes respond to Stimulated treatment in Strain B? 5. Which genes respond to Stimulated treatment in Strain C? 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function? For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function? Below is my current script and session information. Thank you in advance for your help, - Ben library(edgeR) # 3x2 Factorial Design # Strain Treatment # A Un # B Un # C Un # A Stim # B Stim # C Stim ## Specify factors Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","A" ,"A","A","B","B","B","B","C","C","C","C")) Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","Un ","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim", "Stim","Stim","Stim")) ## Read in count data raw.data <- read.table("group_counts.txt",sep="\t",header=T) d <- raw.data[, 2:24] rownames(d) <- raw.data[, 1] # make DGE object dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.Un ","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Stim" ,"A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.Sti m","C.Stim")) dge <- calcNormFactors(dge) # filter uninformative genes m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size) ridx <- rowSums(m > 1) >= 2 dge <- dge[ridx,] ## Design matrix # treatment-contrast parameterization contrasts(Strain) <- contr.sum(3) contrasts(Treatment) <- contr.sum(2) design <- model.matrix(~Strain*Treatment) ## Estimate Dispersion dge <- estimateGLMCommonDisp(dge, design) ## Fit model with Common Dispersion fit <- glmFit(dge, design, dispersion=dge$common.dispersion) > sessionInfo() R version 2.13.1 (2011-07-08) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.2.5 loaded via a namespace (and not attached): [1] limma_3.8.3 tools_2.13.1

• 2.3k views

ADD COMMENT • link 12.4 years ago Benjamin King ▴ 30