Hello world-
I have an RNAseq experiment comprised of two factors, time (4x time points) and genotype (7x genotypes), that I am analyzing with edgeR. Each time point/genotype was sequenced with 3x biological replicates (84 samples total).
# meta data for design meta <- data.frame( row.names = colnames(count), time = rep(c("0","2","4","6"), each=3), genotype = rep(c("A","B","C","D","E","F","G"), each=12))
The feature counts were filtered to remove rows (genes) with zero counts, and then I applied the independent filtering as described in the vignette.
e <- DGEList(counts=count) ## check for independent filtering (minimum ## counts in 3 replicates) keep <- rowSums(cpm(e)>1) >= 3 table(keep) e <- e[keep,] dim(e) ## reset library size after filtering e$samples$lib.size <- colSums(e$counts) e$samples
Following section 3.3.1 of the users guide for edgeR, I combined all the experimental factors into one combined factor.
# GLM approach to multiple groups Group <- factor(paste(meta$genotype, meta$time, sep = ".")) cbind(meta, Group=Group) edesign <- model.matrix(~0+Group) colnames(edesign) <- levels(Group) e <- DGEList(counts=count) e <- calcNormFactors(e) e <- estimateGLMCommonDisp(e, edesign, verbose = TRUE) e <- estimateGLMTrendedDisp(e, edesign, verbose = TRUE) e <- estimateGLMTagwiseDisp(e, edesign) ## fit the model fit <- glmFit(e, edesign) colnames(fit) #[1] "A.0" "A.2" "A.4" "A.6" "B.0" "B.2" "B.4" "B.6" #[9] "C.0" "C.2" "C.4" "C.6" "D.0" "D.2" "D.4" "D.6" #[17] "E.0" "E.2" "E.4" "E.6" "F.0" "F.2" "F.4" "F.6" #[25] "G.0" "G.2" "G.4" "G.6"
I created the following contrasts.
my.contrasts <- makeContrasts( time.2v0 = (A.2-A.0)-(B.2-B.0)-(C.2-C.0)-(D.2-D.0)-(E.2-E.0)-(F.2-F.0)-(G.2-G.0), time.4v2 = (A.4-A.2)-(B.4-B.2)-(C.4-C.2)-(D.4-D.2)-(E.4-E.2)-(F.4-F.2)-(G.4-G.2), time.6v4 = (A.6-A.4)-(B.6-B.4)-(C.6-C.4)-(D.6-D.4)-(E.6-E.4)-(F.6-F.4)-(G.6-G.4), levels = edesign ) my.contrasts # Contrasts # Levels time.2v0 time.4v2 time.6v4 # A.0 -1 0 0 # A.2 1 -1 0 # A.4 0 1 -1 # A.6 0 0 1 # B.0 1 0 0 # B.2 -1 1 0 # B.4 0 -1 1 # B.6 0 0 -1 # C.0 1 0 0 # C.2 -1 1 0 # C.4 0 -1 1 # C.6 0 0 -1 # D.0 1 0 0 # D.2 -1 1 0 # D.4 0 -1 1 # D.6 0 0 -1 # E.0 1 0 0 # E.2 -1 1 0 # E.4 0 -1 1 # E.6 0 0 -1 # F.0 1 0 0 # F.2 -1 1 0 # F.4 0 -1 1 # F.6 0 0 -1 # G.0 1 0 0 # G.2 -1 1 0 # G.4 0 -1 1 # G.6 0 0 -1
To find genes that have responded differently in all genotypes at time 2 (2vs.0).
# time 2vs.0 contrast results lrt.t2v0 <- glmLRT(fit, edesign, contrast=my.contrasts[,"time.2v0"], test = "F") etab.t2v0 <- topTags(lrt.t2v0, n=nrow(e))$table etab.t2v0 <- etab.t2v0[order(etab.t2v0$FDR), ] head(etab.t2v0) summary(etab.t2v0$FDR < 0.05,)
The contrasts were based upon pg 29 of the vignette (see: "DrugvsPlacebo.2h = (DDrug.2h-Drug.0h) - (Placebo.2h-Placebo.0h). Here, I am treating the 7-levels of the genotype similarly to the 2-levels of treatment (placebo & drug) to find common differences in all genotypes between the time points. Its not clear to me if I can even do the contrasts like this. My results seem to make sense, but I wanted to seek help from the bioconductor community.
Cheers,
Ryan Ghan
University of Nevada, Reno
Dept. of Biochemistry & Molecular Biology
sessionInfo() R version 3.2.0 (2015-04-16) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.10.3 (Yosemite) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.10.2 limma_3.24.10 loaded via a namespace (and not attached): [1] magrittr_1.5 R6_2.0.1 assertthat_0.1 parallel_3.2.0 DBI_0.3.1 tools_3.2.0 [7] Rcpp_0.11.6 splines_3.2.0
A couple of notes:
test="F"
option inglmLRT
. Either usetest="chisq"
, or use theglmQLFit
andglmQLFTest
pipeline. The latter implement the F-test in a more rigorous manner thantest="F"
.Hi Aaron-
Thanks for your comments on the testing and DE. Genotype is really the incorrect term for my experiment, a word used for this toy example. My experiment is looking at responses in different grape cultivars, so, while I agree with your comments regarding stringency, it was there by design to see if anything in common stuck out in particular. However, I am also investigating any time specific changes, regardless of "genotype". Thanks again. -Ryan