Question: Should I calculate normalization factors in edgeR using all libraries or using only the compared libraries?
Asked 4.4 years ago by Peter0 (Ireland):

I have 3 groups: untreated, negative control (mock treatment), and treated, with 3 replicates in each. I am looking for differential expression between the groups, most importantly between the negative control and the treated group.

Which is a better approach:
- calculating the normalization factors using all 9 libraries, or
- calculating the normalization factors using only the 2×3 libraries that are compared at a time (i.e., loading three separate count tables, one per pairwise comparison)?

Example for first option:

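library(edgeR)
# 'counts' is assumed to be the gene-by-library count matrix for all 9 libraries
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))
dgedata <- DGEList(counts=counts, group=groups)
# Keep genes with CPM > 1 in at least 3 libraries (the size of one group)
keep <- rowSums(cpm(dgedata) > 1) >= 3
dgedata <- dgedata[keep, , keep.lib.sizes=FALSE]
# TMM normalization factors computed across all 9 libraries
dgedata <- calcNormFactors(dgedata, method="TMM")
dgedata <- estimateCommonDisp(dgedata)
dgedata <- estimateTagwiseDisp(dgedata)
# Exact test for group B relative to group A
dgedata.results <- exactTest(dgedata, pair=c("A", "B"))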

(This is mostly a theoretical question: in each comparison the two approaches differ by only about 20 DE genes out of hundreds. I am mainly wondering about the justification.)

Answer, 4.4 years ago by Aaron Lun (Cambridge, United Kingdom):

You should use all of the libraries in a dataset when running edgeR, as this provides more residual degrees of freedom for dispersion estimation. This means you should calculate normalization factors for all 9 libraries at once, rather than separately analyzing a count table for each of the three pairwise comparisons.
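To make the point about residual degrees of freedom concrete, here is a small base-R sketch. It assumes a one-way layout with one coefficient per group, so the residual degrees of freedom are the number of libraries minus the number of groups. The variable names are illustrative only.

```r
# Group labels as in the question: 3 groups with 3 replicates each.
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))

# All 9 libraries, one coefficient per group.
design_all <- model.matrix(~ 0 + groups)
df_all <- nrow(design_all) - ncol(design_all)    # 9 libraries - 3 groups

# Only the 6 libraries in one pairwise comparison (A versus B).
groups_pair <- droplevels(groups[groups %in% c("A", "B")])
design_pair <- model.matrix(~ 0 + groups_pair)
df_pair <- nrow(design_pair) - ncol(design_pair) # 6 libraries - 2 groups

print(c(all_libraries = df_all, pairwise_only = df_pair))
```

With all libraries there are 6 residual degrees of freedom available for dispersion estimation, against 4 when each pair is analyzed on its own.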

In any case, the actual normalization factors should not be very different. calcNormFactors picks a reference library and, for each other library, calculates a trimmed mean of M-values (log-ratios) against that reference. This behaves much like a median and captures the systematic difference between the two libraries. If you change the input libraries, the only effect on the calculation concerns which reference library is chosen. The size of the systematic difference between two libraries should not change much, whether it is calculated directly between them or through the reference (i.e., calculate A against the reference, then B against the reference, to get A against B).
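One way to check this on a given dataset is to compute the factors both ways and compare them. The sketch below does so on simulated counts, which are purely hypothetical and stand in for real data; the simulation settings (gene means, depths, a block of genes up-regulated in group C) are assumptions chosen for illustration. Because calcNormFactors scales its factors so that they multiply to 1 within each call, the all-library factors are rescaled over the selected libraries before the comparison.

```r
library(edgeR)

set.seed(1)
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))
n_genes <- 2000

# Hypothetical simulated data (not the poster's counts): negative binomial
# counts with gene-specific means and library-specific sequencing depths.
gene_mean <- rexp(n_genes, rate = 1 / 200) + 5
depth <- runif(length(groups), 0.5, 2)
fold <- matrix(1, nrow = n_genes, ncol = length(groups))
fold[1:100, groups == "C"] <- 5   # composition bias: 100 genes up in group C
counts <- matrix(
  rnbinom(n_genes * length(groups), mu = outer(gene_mean, depth) * fold, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("lib", seq_along(groups)))
)

y_all <- DGEList(counts = counts, group = groups)

# Option 1: normalization factors from all 9 libraries.
nf_all <- calcNormFactors(y_all)$samples$norm.factors
names(nf_all) <- colnames(counts)

# Option 2: normalization factors from only the 6 libraries in groups B and C.
sel <- groups %in% c("B", "C")
nf_sub <- calcNormFactors(y_all[, sel])$samples$norm.factors
names(nf_sub) <- colnames(counts)[sel]

# Rescale the all-library factors to have geometric mean 1 over the selected
# libraries, so both columns are on the same scale.
nf_all_sel <- nf_all[sel] / exp(mean(log(nf_all[sel])))
comparison <- cbind(all_libraries = nf_all_sel, subset_only = nf_sub)
print(round(comparison, 4))
```

If the two columns agree closely, the choice of input libraries has little effect on normalization for that dataset, and the remaining difference between the two approaches comes mainly from dispersion estimation.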