Question: edgeR cpm() function with and without log2
b.nota (Netherlands) wrote 6 weeks ago:

Hello,

I have a question about the cpm function from edgeR. When I call it with log = T, I get different results than when I call it without log and apply a log2 transformation myself afterwards. What am I missing?

Edit: Does this have to do with the scaling of the prior count? If so, what is the benefit of scaling? Why is it better than just adding 0.5 to the read count?

> CPM <- cpm(DGE1, log = T, prior.count = 0.5, normalized.lib.sizes = F)
> tail(CPM)
                        DC07      DC08      DC09      DC10      DC11      DC12
ENSMUSG00000099399 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000095134 -5.935507 -5.935507 -5.935507 -3.647512 -5.935507 -5.935507
ENSMUSG00000095366 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096768 -4.385629 -4.434476 -5.935507 -4.378766 -5.935507 -5.935507
ENSMUSG00000099871 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096850 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507

> CPM_F <- cpm(DGE1, log = F, normalized.lib.sizes = F)
> tail(CPM_F)
                       DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000095134 0.000000 0.00000000    0 0.06345822    0    0
ENSMUSG00000095366 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096768 0.031501 0.02990833    0 0.03172911    0    0
ENSMUSG00000099871 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096850 0.000000 0.00000000    0 0.00000000    0    0

> log2CPM <- log2(CPM_F + 0.5)
> tail(log2CPM)
                         DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000095134 -1.0000000 -1.0000000   -1 -0.8276194   -1   -1
ENSMUSG00000095366 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096768 -0.9118557 -0.9161853   -1 -0.9112366   -1   -1
ENSMUSG00000099871 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096850 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gplots_3.0.1 edgeR_3.20.9 limma_3.34.9

loaded via a namespace (and not attached):
 [1] compiler_3.4.3     Rcpp_0.12.15       KernSmooth_2.23-15 splines_3.4.3     
 [5] gdata_2.18.0       grid_3.4.3         locfit_1.5-9.1     caTools_1.17.1    
 [9] bitops_1.0-6       gtools_3.5.0       lattice_0.20-35  
Aaron Lun (Cambridge, United Kingdom) wrote 6 weeks ago:

As you may have already noticed, it is because cpm adds a prior count to the counts for each library when log=TRUE. This avoids undefined values from counts of zero, and it also stabilizes the differences in log-expression values between libraries, i.e., it squeezes the log-fold changes towards zero, especially for low counts where there would be little evidence for large fold changes anyway.
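To make the shrinkage concrete, here is a minimal base-R sketch. It assumes two libraries of equal size, so the library size cancels out of the fold change, and the counts are made-up illustrative values rather than anything from the data above.

```r
# Without a prior count, a zero count has no finite log value.
log2(0)  # -Inf

prior.count <- 0.5  # same value as in the question

# Hypothetical low counts for one gene in two equally sized libraries.
low <- c(1, 4)
lfc.low.raw   <- log2(low[2] / low[1])                                  # 2
lfc.low.prior <- log2((low[2] + prior.count) / (low[1] + prior.count))  # log2(3)

# Hypothetical high counts with the same 4-fold ratio.
high <- c(1000, 4000)
lfc.high.prior <- log2((high[2] + prior.count) / (high[1] + prior.count))

# The prior pulls the low-count fold change towards zero much more
# strongly than the high-count one.
c(low.raw = lfc.low.raw, low.prior = lfc.low.prior, high.prior = lfc.high.prior)
```

The low-count log-fold change drops from 2 to log2(3), while the high-count one stays very close to 2.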

Scaling ensures that the relative effect of the added prior count is the same in each library, regardless of sequencing depth. Simply adding 0.5 to each count would effectively result in a larger value being added to counts in small libraries, once you divide by the library size to compute the CPM. This would result in spurious non-zero log-fold changes; see "Differences between limma voom E values and edgeR cpm values?" for details.
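The effect of scaling can be sketched in base R without loading edgeR. The formula below is my understanding of what cpm does when log=TRUE (prior count scaled by library size relative to the mean, and twice the scaled prior added to the library size); treat it as an assumption and check the edgeR source for your version. The toy count matrix is hypothetical: library B is simply library A sequenced ten times deeper, so the true log-fold change is zero for every gene.

```r
# Hypothetical counts: 3 genes x 2 libraries, B = 10 x A.
counts <- matrix(c(0, 10, 990,
                   0, 100, 9900),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("A", "B")))
lib.size <- colSums(counts)
prior.count <- 0.5

# Scaled prior (assumed to mirror cpm(..., log = TRUE)):
# each library gets a prior proportional to its size.
prior.scaled <- prior.count * lib.size / mean(lib.size)
adj.lib.size <- lib.size + 2 * prior.scaled
logcpm.scaled <- log2(t((t(counts) + prior.scaled) / adj.lib.size) * 1e6)

# Unscaled alternative: add the same 0.5 to every count.
logcpm.fixed <- log2(t((t(counts) + prior.count) / lib.size) * 1e6)

# Log-fold changes between the two libraries.
diff.scaled <- logcpm.scaled[, "B"] - logcpm.scaled[, "A"]
diff.fixed  <- logcpm.fixed[, "B"]  - logcpm.fixed[, "A"]

diff.scaled  # zero for every gene, up to floating point
diff.fixed   # non-zero; for g1 it is log2(0.1), about -3.32
```

With the scaled prior, the zero-count gene gets the same log-CPM in both libraries, which matches the constant -5.935507 in the output above. With a fixed 0.5 added to each count, the same gene appears to differ more than 3 log2 units between libraries purely because of sequencing depth.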


Thanks for the link (with Gordon's answer) and the explanation. So the important point is that, with scaling, a fold change of 1 stays a fold change of 1 regardless of library size.

(reply written 6 weeks ago by b.nota)