The difference between the three methods in calcNormFactors() in edgeR
Zhan Tianyu
@zhan-tianyu-6632

Hello all,

I have a question about calcNormFactors() in edgeR. There are three methods to choose from: "TMM", "RLE", and "upperquartile". How should I decide which one to use?

For example, consider a simple case: there are 10 genes in total and 4 samples in each of two groups. The count data is therefore a 10 × 8 matrix in which each row is a gene and each column is a sample; columns 1-4 are the first group and columns 5-8 are the second group. Of the 10 genes, 60% are differentially expressed: the counts of genes 3, 4, 5, 6, 8, and 9 are doubled in the first group, while the others are the same in both groups. Please see the attachment for the count data.
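The attachment is not available here, so the base-R sketch below builds a stand-in matrix with the same layout. The baseline counts are hypothetical, invented for illustration rather than taken from the poster's data, and the replicates within each group are identical, so unlike real data there is no sampling noise.

```r
# Hypothetical baseline counts for genes 1-10 (invented; not the attached data)
baseline <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
de_genes <- c(3, 4, 5, 6, 8, 9)   # genes whose counts are doubled in the first group

first_group <- baseline
first_group[de_genes] <- 2 * first_group[de_genes]

# Columns 1-4: first group (level "0"); columns 5-8: second group (level "1")
counts <- cbind(matrix(first_group, nrow = 10, ncol = 4),
                matrix(baseline, nrow = 10, ncol = 4))
dimnames(counts) <- list(paste0("Gene", 1:10), paste0("Sample", 1:8))

grp <- as.factor(rep(0:1, each = 4))
colSums(counts)  # 9000 for Samples 1-4, 5500 for Samples 5-8
```

The resulting counts and grp objects can go through the same DGEList() and calcNormFactors() calls as in the post.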

Then I generated the "group" factor via this command:
> grp <- as.factor(rep(0:1, each = 8/2))

After that, I generated the DGEList by:
> d <- DGEList(counts = counts, group = grp )

Then I calculated the normalization factors with edgeR:
> n <- calcNormFactors(d)

By default, this function uses the "TMM" method. The normalization factors look like this:

        group  lib.size  norm.factors
Sample1     0   5062446  1.1195829383593
Sample2     0   5062340  0.8154739771400
Sample3     0   5062444  1.1195827474525
Sample4     0   5062466  1.1403164060313
Sample5     1   3000123  0.9624162935534
Sample6     1   2999992  0.9624163157255
Sample7     1   2999977  0.9624169648716
Sample8     1   3000156  0.9624160077253

This seems odd to me, because the normalization factors for samples 1 and 2 are quite different (1.11958 and 0.81547), even though their counts are essentially the same (please see the attachment for the count data).
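One way to see what the posted TMM factors imply is to multiply them by the library sizes: edgeR treats lib.size × norm.factors as the effective library size when it scales counts. The short base-R calculation below, using values copied from the TMM table above, shows that Sample1's effective library comes out roughly 37% larger than Sample2's, even though their raw library sizes differ by only about 100 reads.

```r
# Values copied from the TMM output in the post
tmm <- data.frame(
  group        = c(0, 0, 0, 0, 1, 1, 1, 1),
  lib.size     = c(5062446, 5062340, 5062444, 5062466,
                   3000123, 2999992, 2999977, 3000156),
  norm.factors = c(1.1195829383593, 0.8154739771400, 1.1195827474525,
                   1.1403164060313, 0.9624162935534, 0.9624163157255,
                   0.9624169648716, 0.9624160077253),
  row.names    = paste0("Sample", 1:8)
)

# Effective library size = raw library size x normalization factor
tmm$eff.lib.size <- tmm$lib.size * tmm$norm.factors
round(tmm$eff.lib.size)
tmm["Sample1", "eff.lib.size"] / tmm["Sample2", "eff.lib.size"]  # about 1.37
```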

Then I tried the RLE method:
> n <- calcNormFactors(d, method = "RLE")

The results are:

$samples
        group  lib.size  norm.factors
Sample1     0   5062446  1.0886765699045
Sample2     0   5062340  1.0886508565338
Sample3     0   5062444  1.0886766741626
Sample4     0   5062466  1.0886750099086
Sample5     1   3000123  0.9185446848068
Sample6     1   2999992  0.9185578680804
Sample7     1   2999977  0.9185624609049
Sample8     1   3000156  0.9185437155777
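For reference, the edgeR documentation describes RLE (Anders and Huber 2010) as taking, for each sample, the median ratio of its counts to a pseudo-reference built from per-gene geometric means. The base-R sketch below follows that description, then divides by library size and rescales so the factors multiply to one, which appears to be how calcNormFactors() post-processes its factors; it is an illustration, not edgeR's code. It uses a hypothetical 10-gene matrix (invented counts) with genes 3, 4, 5, 6, 8, and 9 doubled in Samples 1-4. Because the doubled genes are the majority, the median lands on them, and the effective library sizes end up in a 2:1 ratio, so the four unchanged genes would look two-fold down in the first group. Multiplying lib.size by norm.factors in the RLE table above gives a similar ratio of about 2 (roughly 5.51 million versus 2.76 million).

```r
# Sketch of RLE scale factors (illustration only, not edgeR's implementation)
rle_factors <- function(counts) {
  lib_size <- colSums(counts)
  log_gm <- rowMeans(log(counts))     # per-gene log geometric mean; -Inf if any count is 0
  keep <- is.finite(log_gm)           # use only genes with no zero counts
  ratios <- counts[keep, , drop = FALSE] / exp(log_gm[keep])  # divide each row by its geometric mean
  f <- apply(ratios, 2, median) / lib_size
  f / exp(mean(log(f)))               # rescale so the factors multiply to 1
}

# Hypothetical toy matrix: genes 3, 4, 5, 6, 8, 9 doubled in Samples 1-4
baseline <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
first_group <- baseline
first_group[c(3, 4, 5, 6, 8, 9)] <- 2 * first_group[c(3, 4, 5, 6, 8, 9)]
counts <- cbind(matrix(first_group, nrow = 10, ncol = 4),
                matrix(baseline, nrow = 10, ncol = 4))

nf <- rle_factors(counts)
eff <- colSums(counts) * nf           # effective library sizes
eff[1] / eff[5]                       # 2: the doubled genes define the reference
```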

This time the results look more reasonable to me. How should I decide which method to use, and why does TMM give such an odd result?

Thank you.

Best regards,
sewen67

@gordon-smyth
WEHI, Melbourne, Australia

Dear Zhan Tianyu,

The edgeR authors obviously recommend TMM. It is the default and is used in all the edgeR examples and case studies.

I don't know of any published comparative study showing better performance for the other methods.

TMM is not, however, designed to work well with very small numbers of genes (such as your toy example with 10 genes). Actually, your toy example does not fit the assumptions of any of the normalization methods, because the majority of the genes (all but four, in fact) are differentially expressed. I don't think you can learn much about the performance of the different methods on real data from this example.
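To put "very small numbers of genes" in concrete terms: TMM trims genes by the rank of their log-ratios (M-values), by default 30% from each end (logratioTrim = 0.3), before also trimming on average expression (sumTrim = 0.05). The base-R sketch below counts how many genes survive the M-value trim alone, using rank cut-offs lo = floor(n * trim) + 1 and hi = n + 1 - lo, which is how edgeR's internal TMM code appears to set them. It ignores ties and the second trim, so treat it as an approximation.

```r
# Genes remaining after a symmetric rank trim of n genes (M-value trim only).
# Cut-offs assumed: lo = floor(n * trim) + 1, hi = n + 1 - lo.
genes_kept <- function(n, trim = 0.3) {
  lo <- floor(n * trim) + 1
  hi <- n + 1 - lo
  hi - lo + 1
}

genes_kept(10)      # 4 genes carry the whole estimate in the toy example
genes_kept(20000)   # 8000 genes in a genome-scale experiment with 20,000 genes
```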

If you think that TMM has given an incorrect result for a real dataset then I suggest that you send your data example offline to the TMM author, Mark Robinson, so that he can trouble-shoot.

There was no attachment with your email, and I don't think that you have examined the right thing to judge which is the better normalization.

Best wishes
Gordon


Following up on Gordon's answer, the TMM method will support cases where up to 60% of genes are DE. However, this upper limit is only supported when the DE genes are evenly distributed between groups, i.e., 30% of genes are upregulated, and 30% are downregulated. In your case, the 60% of DE genes are all upregulated in one group, so it's not surprising that TMM fails.
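The difference between balanced and one-sided DE can be seen with a deliberately simplified trimmed mean of M-values. Only the 30% log-ratio trim is kept; there is no trimming on average expression, no precision weighting, and no choice of reference sample, so this is not edgeR's TMM. Both hypothetical datasets below (invented counts) have 6 of 10 genes DE. When 3 go up and 3 go down, the DE genes fall into the two trimmed tails and the estimate equals the log-ratio of the unchanged genes. When all 6 go up, 3 of the 4 retained genes are DE and the estimate is pulled away (about 0.04 instead of about -0.71 on the log2 scale).

```r
# Simplified trimmed mean of M-values (not edgeR's TMM): 30% log-ratio trim only,
# no A-value trim, no precision weights, no reference-sample selection.
trimmed_mean_M <- function(obs, ref, trim = 0.3) {
  M <- log2((obs / sum(obs)) / (ref / sum(ref)))
  n <- length(M)
  k <- floor(n * trim)
  mean(sort(M)[(k + 1):(n - k)])
}

# Log-ratio that unchanged genes show purely because library sizes differ
unchanged_offset <- function(obs, ref) log2(sum(ref) / sum(obs))

ref <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000)   # hypothetical counts

# One-sided: 60% of genes up (genes 3, 4, 5, 6, 8, 9 doubled)
one_sided <- ref
one_sided[c(3, 4, 5, 6, 8, 9)] <- 2 * ref[c(3, 4, 5, 6, 8, 9)]

# Balanced: 30% up (genes 3, 4, 5 doubled), 30% down (genes 6, 8, 9 halved)
balanced <- ref
balanced[c(3, 4, 5)] <- 2 * ref[c(3, 4, 5)]
balanced[c(6, 8, 9)] <- ref[c(6, 8, 9)] / 2

c(estimate = trimmed_mean_M(balanced, ref),  target = unchanged_offset(balanced, ref))
c(estimate = trimmed_mean_M(one_sided, ref), target = unchanged_offset(one_sided, ref))
```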
