Entering edit mode
Dear Ben,
glmLRT() allows you to specify the contrast to be tested in two ways.
If
the contrast happens to correspond to a column of your design matrix,
then
it is one the original coefficients and you can just specify which one
using the coef= argument. Otherwise you can use the contrast=
argument to
specify any contrast, using a numerical vector of the same length as
the
number of coefficients.
Suppose you used the default treatment parametrization:
contrasts(Strain) <- contr.treat(3)
contrasts(Treatment) <- contr.treat(2)
and fitted a model with Treatment nested within Strain:
design <- model.matrix(~Strain+Strain:Treatment)
Then the coefficients 4:6 would be the logFC after stimulation in
strains
A to C respectively. Hence you could test for differential expression
after stimulation in strain B by:
glmLRT(dge,fit,coef=5)
and so on. You could equally well specify the same test by
glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0))
To test for genes responding differently to simulation in strain B
compared to strain A, you would use
contrast=c(0,0,0,-1,1,0)
And so on.
Figuring out the appropriate contrast vectors using the sum-to-zero
parametrization specified by contr.sum() is more tedious, and I will
leave
that to you.
Best wishes
Gordon
> Date: Tue, 20 Dec 2011 16:44:42 -0500
> From: Benjamin King <bking at="" mdibl.org="">
> To: bioconductor at r-project.org
> Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR
>
> Hello,
>
> I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and
I would very much appreciate your help in specifying some contrasts.
>
> The design of my experiment has two factors, strain and treatment.
There are three strains (A, B and C) and two treatments (Unstimulated
and Stimulated). I have four biological replicates (except for one
sample group where there are only three).
>
> # Group Strain Treatment
> # A.Un A Un
> # B.Un B Un
> # C.Un C Un
> # A.Stim A Stim
> # B.Stim B Stim
> # C.Stim C Stim
>
> Using "model.matrix(~Strain*Treatment)" I believe the model
coefficients using sum to zero parameterization are as follows.
>
> Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6
> Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4
> Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4
> Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6
> Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4
> Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4
>
> I'm interested in these questions:
>
> 1. Which genes are differentially expressed in Strain B compared to
Strain A?
> 2. Which genes are differentially expressed in Strain C compared to
Strain A?
> 3. Which genes respond to Stimulated treatment overall?
> 4. Which genes respond to Stimulated treatment in Strain B?
> 5. Which genes respond to Stimulated treatment in Strain C?
> 6. Which genes respond to Stimulated treatment differently in Strain
B compared to Strain A?
> 7. Which genes respond to Stimulated treatment differently in Strain
B compared to Strain A?
>
> I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so,
how can these contrasts be calculated using the "coef" parameter in
the glmLRT function?
>
> For questions 4 and 5, what is the correct syntax to define the
contrast matrix using sum to zero parameterization that I would then
pass to the glmLRT function?
>
> Below is my current script and session information.
>
> Thank you in advance for your help,
>
> - Ben
>
>
>
> library(edgeR)
>
> # 3x2 Factorial Design
> # Strain Treatment
> # A Un
> # B Un
> # C Un
> # A Stim
> # B Stim
> # C Stim
>
> ## Specify factors
> Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A","
A","A","A","B","B","B","B","C","C","C","C"))
> Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un","
Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim
","Stim","Stim","Stim"))
>
> ## Read in count data
> raw.data <- read.table("group_counts.txt",sep="\t",header=T)
> d <- raw.data[, 2:24]
> rownames(d) <- raw.data[, 1]
>
> # make DGE object
> dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B.
Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Sti
m","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.S
tim","C.Stim"))
> dge <- calcNormFactors(dge)
>
> # filter uninformative genes
> m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size)
> ridx <- rowSums(m > 1) >= 2
> dge <- dge[ridx,]
>
> ## Design matrix
> # treatment-contrast parameterization
> contrasts(Strain) <- contr.sum(3)
> contrasts(Treatment) <- contr.sum(2)
> design <- model.matrix(~Strain*Treatment)
>
> ## Estimate Dispersion
> dge <- estimateGLMCommonDisp(dge, design)
>
> ## Fit model with Common Dispersion
> fit <- glmFit(dge, design, dispersion=dge$common.dispersion)
>
>
>
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] splines stats graphics grDevices utils datasets
methods base
>
> other attached packages:
> [1] edgeR_2.2.5
>
> loaded via a namespace (and not attached):
> [1] limma_3.8.3 tools_2.13.1
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}