Hello,
I have a question about the cpm function from edgeR. When I call it with log = T, I get different results from when I call it with log = F and then apply a log2 transformation myself. What am I missing?
Edit: Is this related to the scaling of the prior count? If so, what is the benefit of scaling it? Why is that better than simply adding 0.5 read counts?
> CPM <- cpm(DGE1, log = T, prior.count = 0.5, normalized.lib.sizes = F)
> tail(CPM)
                        DC07      DC08      DC09      DC10      DC11      DC12
ENSMUSG00000099399 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000095134 -5.935507 -5.935507 -5.935507 -3.647512 -5.935507 -5.935507
ENSMUSG00000095366 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096768 -4.385629 -4.434476 -5.935507 -4.378766 -5.935507 -5.935507
ENSMUSG00000099871 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
ENSMUSG00000096850 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507 -5.935507
> CPM_F <- cpm(DGE1, log = F, normalized.lib.sizes = F)
> tail(CPM_F)
                       DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000095134 0.000000 0.00000000    0 0.06345822    0    0
ENSMUSG00000095366 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096768 0.031501 0.02990833    0 0.03172911    0    0
ENSMUSG00000099871 0.000000 0.00000000    0 0.00000000    0    0
ENSMUSG00000096850 0.000000 0.00000000    0 0.00000000    0    0
> log2CPM <- log2(CPM_F + 0.5)
> tail(log2CPM)
                         DC07       DC08 DC09       DC10 DC11 DC12
ENSMUSG00000099399 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000095134 -1.0000000 -1.0000000   -1 -0.8276194   -1   -1
ENSMUSG00000095366 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096768 -0.9118557 -0.9161853   -1 -0.9112366   -1   -1
ENSMUSG00000099871 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
ENSMUSG00000096850 -1.0000000 -1.0000000   -1 -1.0000000   -1   -1
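For comparison, here is a minimal base-R sketch of the two calculations on made-up counts. It assumes, based on the edgeR documentation for cpm, that the prior count is scaled in proportion to each library size and added to the raw counts before the CPM conversion, whereas log2(CPM_F + 0.5) adds 0.5 on the CPM scale after normalization. The toy matrix and the variable names (prior.scaled, adj.lib.size, logcpm.naive) are hypothetical and are not part of edgeR.

```r
# Toy counts: 3 genes x 2 samples with different library sizes (made-up numbers).
counts <- matrix(c(0, 10, 990,
                   0, 40, 3960),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
prior.count <- 0.5
lib.size <- colSums(counts)  # 1000 and 4000

# Assumed scaled-prior calculation: the prior is proportional to library size
# and is added to the counts before dividing by the adjusted library size.
prior.scaled <- prior.count * lib.size / mean(lib.size)
adj.lib.size <- lib.size + 2 * prior.scaled
logcpm.scaled <- log2(t((t(counts) + prior.scaled) / adj.lib.size) * 1e6)

# The calculation from the post: add 0.5 to the CPM values, then take log2.
cpm.plain <- t(t(counts) / lib.size) * 1e6
logcpm.naive <- log2(cpm.plain + 0.5)

logcpm.scaled["g1", ]  # zero counts: same value in both samples
logcpm.naive["g1", ]   # zero counts: log2(0.5) = -1 in both samples
```

Under these assumptions the two results differ because 0.5 of a CPM is not the same quantity as 0.5 of a read, which would explain why the zero-count genes show a floor of -1 in one table and a much lower floor in the other.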
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] gplots_3.0.1 edgeR_3.20.9 limma_3.34.9
loaded via a namespace (and not attached):
 [1] compiler_3.4.3     Rcpp_0.12.15       KernSmooth_2.23-15 splines_3.4.3     
 [5] gdata_2.18.0       grid_3.4.3         locfit_1.5-9.1     caTools_1.17.1    
 [9] bitops_1.0-6       gtools_3.5.0       lattice_0.20-35  
Thanks for the link (with Gordon's answer) and the explanation. So the important point is that, with scaling, a fold change of 1, for example, stays a fold change of 1.
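To make that point concrete, here is a small base-R sketch with made-up numbers: one gene at the same proportion in two libraries of different size. The unscaled variant, which adds a fixed 0.5 to every count, is a hypothetical alternative for illustration only; the scaled variant follows the formula as I understand it from the edgeR documentation.

```r
# One gene at the same proportion (2 per million) in two libraries (made-up numbers).
lib.size <- c(s1 = 1e6, s2 = 4e6)
count <- c(s1 = 2, s2 = 8)
prior.count <- 0.5

# Hypothetical unscaled variant: the same 0.5 is added regardless of library size.
logcpm.fixed <- log2((count + prior.count) / (lib.size + 2 * prior.count) * 1e6)

# Scaled variant: the prior is proportional to library size.
prior.scaled <- prior.count * lib.size / mean(lib.size)
logcpm.scaled <- log2((count + prior.scaled) / (lib.size + 2 * prior.scaled) * 1e6)

diff(logcpm.fixed)   # non-zero: the fixed prior creates an artificial difference
diff(logcpm.scaled)  # essentially zero: no log fold change is introduced
```

With the fixed prior the smaller library is inflated relatively more, so two samples with identical expression appear different; the scaled prior avoids this.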