edgeR (3.10.2) contrast question for experimental factors combined as a group
1
1
Entering edit mode
rghan • 0
@rghan-8226
Last seen 8.8 years ago
United States

Hello world-

I have an RNAseq experiment comprised of two factors, time (4x time points) and genotype (7x genotypes), that I am analyzing with edgeR. Each time point/genotype was sequenced with 3x biological replicates (84 samples total).

# meta data for design
meta <- data.frame(
  row.names = colnames(count),
  time = rep(c("0","2","4","6"), each=3),
  genotype = rep(c("A","B","C","D","E","F","G"), each=12))

The feature counts were filtered to remove rows (genes) with zero counts, and then I applied the independent filtering as described in the vignette.

e <- DGEList(counts=count)
## check for independent filtering (minimum ## counts in 3 replicates)
keep <- rowSums(cpm(e)>1) >= 3
table(keep)
e <- e[keep,]
dim(e)
## reset library size after filtering
e$samples$lib.size <- colSums(e$counts)
e$samples

Following section 3.3.1 of the users guide for edgeR, I combined all the experimental factors into one combined factor.

# GLM approach to multiple groups
Group <- factor(paste(meta$genotype, meta$time, sep = "."))
cbind(meta, Group=Group)

edesign <- model.matrix(~0+Group)
colnames(edesign) <- levels(Group)
e <- DGEList(counts=count)
e <- calcNormFactors(e)
e <- estimateGLMCommonDisp(e, edesign, verbose = TRUE)
e <- estimateGLMTrendedDisp(e, edesign, verbose = TRUE)
e <- estimateGLMTagwiseDisp(e, edesign)

## fit the model
fit <- glmFit(e, edesign)
colnames(fit)
#[1] "A.0" "A.2" "A.4" "A.6" "B.0" "B.2" "B.4" "B.6"
#[9] "C.0" "C.2" "C.4" "C.6" "D.0" "D.2" "D.4" "D.6"
#[17] "E.0" "E.2" "E.4" "E.6" "F.0" "F.2" "F.4" "F.6"
#[25] "G.0" "G.2" "G.4" "G.6"

I created the following contrasts.

my.contrasts <- makeContrasts(
  time.2v0  = (A.2-A.0)-(B.2-B.0)-(C.2-C.0)-(D.2-D.0)-(E.2-E.0)-(F.2-F.0)-(G.2-G.0),
  time.4v2  = (A.4-A.2)-(B.4-B.2)-(C.4-C.2)-(D.4-D.2)-(E.4-E.2)-(F.4-F.2)-(G.4-G.2),
  time.6v4  = (A.6-A.4)-(B.6-B.4)-(C.6-C.4)-(D.6-D.4)-(E.6-E.4)-(F.6-F.4)-(G.6-G.4),
  levels = edesign
  )
my.contrasts
# Contrasts
# Levels time.2v0 time.4v2 time.6v4
# A.0       -1        0        0
# A.2        1       -1        0
# A.4        0        1       -1
# A.6        0        0        1
# B.0        1        0        0
# B.2       -1        1        0
# B.4        0       -1        1
# B.6        0        0       -1
# C.0        1        0        0
# C.2       -1        1        0
# C.4        0       -1        1
# C.6        0        0       -1
# D.0        1        0        0
# D.2       -1        1        0
# D.4        0       -1        1
# D.6        0        0       -1
# E.0        1        0        0
# E.2       -1        1        0
# E.4        0       -1        1
# E.6        0        0       -1
# F.0        1        0        0
# F.2       -1        1        0
# F.4        0       -1        1
# F.6        0        0       -1
# G.0        1        0        0
# G.2       -1        1        0
# G.4        0       -1        1
# G.6        0        0       -1

To find genes that have responded differently in all genotypes at time 2 (2vs.0).

# time 2vs.0 contrast results
lrt.t2v0 <- glmLRT(fit, edesign, contrast=my.contrasts[,"time.2v0"], test = "F")
etab.t2v0 <- topTags(lrt.t2v0, n=nrow(e))$table
etab.t2v0 <- etab.t2v0[order(etab.t2v0$FDR), ]
head(etab.t2v0)
summary(etab.t2v0$FDR < 0.05,)

The contrasts were based upon pg 29 of the vignette (see: "DrugvsPlacebo.2h = (DDrug.2h-Drug.0h) - (Placebo.2h-Placebo.0h). Here, I am treating the 7-levels of the genotype similarly to the 2-levels of treatment (placebo & drug) to find common differences in all genotypes between the time points. Its not clear to me if I can even do the contrasts like this. My results seem to make sense, but I wanted to seek help from the bioconductor community.

Cheers,

Ryan Ghan

University of Nevada, Reno

Dept. of Biochemistry & Molecular Biology

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.10.2  limma_3.24.10

loaded via a namespace (and not attached):
[1] magrittr_1.5   R6_2.0.1       assertthat_0.1 parallel_3.2.0 DBI_0.3.1      tools_3.2.0   
[7] Rcpp_0.11.6    splines_3.2.0
edgeR rnaseq multiple factor design • 2.1k views
ADD COMMENT
3
Entering edit mode

A couple of notes:

  • We'll be phasing out the test="F" option in glmLRT. Either use test="chisq", or use the glmQLFit and glmQLFTest pipeline. The latter implement the F-test in a more rigorous manner than test="F".
  • You say you want to find genes that respond differently to time in all genotypes. This seems unnecessarily stringent - you'll need to set up contrasts for all pairs of genotypes and intersect them, to guarantee that you'll get DE genes where each genotype responds to time differently from the others. Perhaps a more informative goal would be to find genes that respond differently to time in any genotype, in which case Ryan's answer would be appropriate.
ADD REPLY
0
Entering edit mode

Hi Aaron-

Thanks for your comments on the testing and DE. Genotype is really the incorrect term for my experiment, a word used for this toy example. My experiment is looking at responses in different grape cultivars, so, while I agree with your comments regarding stringency, it was there by design to see if anything in common stuck out in particular. However, I am also investigating any time specific changes, regardless of "genotype". Thanks again. -Ryan

ADD REPLY
3
Entering edit mode
@ryan-c-thompson-5618
Last seen 4 weeks ago
Icahn School of Medicine at Mount Sinai…

You have extrapolated the example incorrectly. If you want to test for differential time response between two time points across any of the seven genotypes, you'll need to construct six separate contrasts, which will conduct an ANOVA-like test of multiple contrasts.

Your three consecutive time point tests would look something like this:

my.contrasts <- list(
    time.2v0 = c(
    AvsB="(B.2-B.0)-(A.2-A.0)",
    AvsC="(C.2-C.0)-(A.2-A.0)",
    AvsD="(D.2-D.0)-(A.2-A.0)",
    AvsE="(E.2-E.0)-(A.2-A.0)",
    AvsF="(F.2-F.0)-(A.2-A.0)",
    AvsG="(G.2-G.0)-(A.2-A.0)"),
    time.4v2 = c(
    AvsB="(B.4-B.2)-(A.4-A.2)",
    AvsC="(C.4-C.2)-(A.4-A.2)",
    AvsD="(D.4-D.2)-(A.4-A.2)",
    AvsE="(E.4-E.2)-(A.4-A.2)",
    AvsF="(F.4-F.2)-(A.4-A.2)",
    AvsG="(G.4-G.2)-(A.4-A.2)"),
    time.6v4 = c(
    AvsB="(B.6-B.4)-(A.6-A.4)",
    AvsC="(C.6-C.4)-(A.6-A.4)",
    AvsD="(D.6-D.4)-(A.6-A.4)",
    AvsE="(E.6-E.4)-(A.6-A.4)",
    AvsF="(F.6-F.4)-(A.6-A.4)",
    AvsG="(G.6-G.4)-(A.6-A.4)"))
my.contrast.matrices <- lapply(my.contrasts, function(cvec) makeContrasts(contrasts=cvec, levels=edesign))

Each of those three vectors of 6 contrasts represents a test for differential genotype response to time between any of the seven genotypes for a given pair of time points. If you want to perform this test across all time points, simply concatenate all 18 contrasts, or cbind the three contrast matrices using do.call(cbind, my.contrast.matrices). (But you're going to have a fun time figuring out what actually happened for the significant genes).

ADD COMMENT
0
Entering edit mode

Hi Ryan-

Thanks for your informative comments. I think I'll stick to the pairs of time points versus the large, concatenated contrast. You are right, I don't need that many columns of logFCs to try and interpret.

In your example contrast, am I correct in thinking the "A" group/genotype is acting like an intercept? Also, I can implement the "my.contrasts.matrices" in the glmLRT call by:

lrt.time2v0 <- glmLRT(fit, edesign, contrast=my.contrast.matrices$time.2v0)

and selecting the specific contrast vector that I want?

-Ryan

ADD REPLY
0
Entering edit mode

Conceptually, yes, the set of contrasts that I provided is using A as an intercept and comparing all the others to it. But you could actually use any six contrasts as long as they "connect" all seven genotypes. For example, you could do A vs B, B vs C, ..., F vs G, and you would get the same p-values, because either one represents the same null hypothesis: that all seven genotype means are equal to each other. edgeR will report the logFC values for the contrasts that you specified, but you can compute all the other pairwise logFC values from the six that you have. For example, if you have AvsB and AvsC and you want the logFC for BvsC, just do BvsC = AvsC - AvsB.

As for the calling sequence to use the contrast matrices, you've got it right.

ADD REPLY
0
Entering edit mode

Ok, that makes sense to me. Thanks again for you comments, Ryan.
 

ADD REPLY

Login before adding your answer.

Traffic: 605 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6