I have 3 groups: untreated, negative control (mock treatment), and treated, with 3 replicates in each. I am looking for differential expression between the groups, most importantly between negative control and treated.
Which is the better approach:
- calculating the normalization factors using all 9 libraries, or
- calculating the normalization factors using only the 2×3 libraries that are compared at a time (i.e. loading three count tables, entirely separately)?
Example for first option:
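    library(edgeR)
    # rnadata: count table with genes in rows and the 9 libraries in columns
    groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))
    dgedata <- DGEList(counts=rnadata, group=groups)
    # keep genes with CPM > 1 in at least 3 libraries
    keep <- rowSums(cpm(dgedata) > 1) >= 3
    dgedata <- dgedata[keep, , keep.lib.sizes=FALSE]
    # TMM normalization factors computed from all 9 libraries
    dgedata <- calcNormFactors(dgedata, method="TMM")
    dgedata <- estimateCommonDisp(dgedata)
    dgedata <- estimateTagwiseDisp(dgedata)
    dgedata.results <- exactTest(dgedata, pair=c("A", "B"))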
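For comparison, here is a rough sketch of what I mean by the second option, with the normalization factors from both options printed side by side. The count matrix is simulated purely for illustration (it is a stand-in, not my real data), and the names `sel`, `sub`, `all9` and `nf` are made up for this sketch. It assumes edgeR is installed.

```r
library(edgeR)

# Simulated stand-in for the real count table (hypothetical data, illustration only)
set.seed(1)
rnadata <- matrix(rnbinom(2000 * 9, mu = 50, size = 5), ncol = 9,
                  dimnames = list(paste0("gene", 1:2000), paste0("lib", 1:9)))
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))

# Second option: keep only the libraries of the two groups being compared
pair <- c("A", "B")
sel <- groups %in% pair
sub <- DGEList(counts = rnadata[, sel], group = droplevels(groups[sel]))
keep <- rowSums(cpm(sub) > 1) >= 3
sub <- sub[keep, , keep.lib.sizes = FALSE]
sub <- calcNormFactors(sub, method = "TMM")

# First option, for reference: normalization factors from all 9 libraries
all9 <- DGEList(counts = rnadata, group = groups)
keep_all <- rowSums(cpm(all9) > 1) >= 3
all9 <- all9[keep_all, , keep.lib.sizes = FALSE]
all9 <- calcNormFactors(all9, method = "TMM")

# Side-by-side normalization factors for the 6 libraries in the comparison
nf <- cbind(all9 = all9$samples$norm.factors[sel],
            pair_only = sub$samples$norm.factors)
rownames(nf) <- colnames(rnadata)[sel]
print(nf)
```

As far as I understand, `calcNormFactors` scales the factors so that they multiply to one within each call, so a small constant offset between the two columns is expected; the ratios between libraries are what should be compared.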
(This is mostly theoretical, as the two approaches differ by only about 20 DE genes (out of hundreds) in each comparison, but I am wondering about the justification.)
OK, thanks for the clarification. The normalization factors are indeed very similar in the two cases.