Question: Should I calculate normalization factors in edgeR using all libraries or using only the compared libraries?
Asked 4.6 years ago by Peter (Ireland):

I have 3 groups: untreated, negative control (mock treatment), and treated, with 3 replicates in each. I am looking for differential expression between the groups, most importantly between the negative control and the treated group.

Which is a better approach:
- calculating the normalization factors using all 9 libraries, or
- calculating the normalization factors using only the 2×3 libraries that are compared at a time (i.e., loading three count tables and analyzing each entirely separately)?

Example for first option:

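library(edgeR)

# rnadata: count matrix (genes x 9 libraries); columns in the same order as groups
groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))
dgedata <- DGEList(counts=rnadata, group=groups)
# keep genes with CPM > 1 in at least 3 libraries (the size of one group)
keep <- rowSums(cpm(dgedata) > 1) >= 3
dgedata <- dgedata[keep, , keep.lib.sizes=FALSE]
# TMM normalization factors computed across all 9 libraries
dgedata <- calcNormFactors(dgedata, method="TMM")
dgedata <- estimateCommonDisp(dgedata)
dgedata <- estimateTagwiseDisp(dgedata)
# exact test for one pairwise comparison (B relative to A)
dgedata.results <- exactTest(dgedata, pair=c("A", "B"))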


(This is mostly a theoretical question, as the two approaches differ by only about 20 DE genes (out of hundreds) in each comparison, but I would like to understand the justification.)

 

Tags: edger, calcnormfactors, rna-seq
Answer by Aaron Lun (Cambridge, United Kingdom), 4.6 years ago:

You should use all of the libraries in a dataset when running edgeR, as this provides more residual d.f. for dispersion estimation. This means you should be calculating normalization factors for all 9 libraries at once, rather than separately analyzing a count table for each of the three pairwise comparisons.
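To put rough numbers on the residual d.f. point, here is a minimal base-R sketch. It assumes a simple one-way layout with one coefficient per group, and the helper `residual_df` is a made-up name for this illustration, not an edgeR function.

```r
# Hypothetical helper (not part of edgeR): residual d.f. for a one-way
# layout = number of libraries - rank of the design matrix.
residual_df <- function(groups) {
  groups <- droplevels(factor(groups))   # drop groups with no libraries left
  design <- model.matrix(~ 0 + groups)   # one column per remaining group
  nrow(design) - qr(design)$rank
}

groups <- factor(c("A", "A", "B", "C", "C", "A", "B", "B", "C"))

df_all  <- residual_df(groups)                            # all 9 libraries
df_pair <- residual_df(groups[groups %in% c("A", "B")])   # only the A and B libraries
cat("all libraries:", df_all, " pairwise subset:", df_pair, "\n")
```

Under these assumptions, keeping the third group adds its replicates to the pool used for dispersion estimation even when the comparison of interest involves only two groups.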

In any case, the actual normalization factors should not be very different. calcNormFactors picks a reference library and calculates the near-median M-value (i.e., the systematic difference) of each other library against that reference. If you change the input libraries, the only effect on the calculation would concern the reference library that is chosen. The size of the systematic difference between two libraries should not change much, whether it is calculated directly between libraries or through the reference (i.e., calculate A against reference, then B against the reference, to get A against B).
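The "through the reference" argument can be checked on simulated data. The sketch below is not what calcNormFactors does internally: it uses a plain median of per-gene log2 ratios as a stand-in for the trimmed, weighted mean, and the library names, depths, and expression values are all made up for the illustration.

```r
set.seed(1)
n_genes <- 2000
mu <- exp(rnorm(n_genes, mean = 5, sd = 1))   # simulated baseline expression

# Simulated counts: same composition, different sequencing depth per library.
depth <- c(ref = 0.8, A = 1.0, B = 1.5)       # made-up scaling factors
counts <- sapply(depth, function(s) rpois(n_genes, mu * s))

# Keep genes observed in every library so that all log-ratios are finite.
counts <- counts[rowSums(counts == 0) == 0, ]

# Per-gene log2 ratios (M-values in the loose sense used above).
m_A_ref <- log2(counts[, "A"] / counts[, "ref"])
m_B_ref <- log2(counts[, "B"] / counts[, "ref"])
m_A_B   <- log2(counts[, "A"] / counts[, "B"])

direct   <- median(m_A_B)                       # A against B directly
indirect <- median(m_A_ref) - median(m_B_ref)   # A against B through the reference
cat("direct:", direct, " via reference:", indirect, "\n")
```

Gene by gene, the identity `m_A_B = m_A_ref - m_B_ref` holds exactly; only the summary step (median here, trimmed mean in TMM) makes the direct and indirect estimates differ slightly, which is why the choice of reference library should matter little.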


Reply by Peter, 4.6 years ago:

OK, thanks for the clarification. The normalization factors are indeed very similar in the two cases.