Hello,
I am working with a count table that has very large library sizes:
> dge@.Data[[2]]$lib.size
[1] 3.2e+08 4.2e+08 4.5e+08 3.8e+08 2.3e+08 2.1e+08 3.3e+08 2.8e+08
This makes the CPM values very small and, consequently, the logCPM values very negative.
Here is the head of my cpm(counts):
C1 C2 C3 C4 T1 T2 T3 T4
00000001 0.000 0.0000 0.0000 0.0026 0.0042 0.000 0.000 0.0035
00000002 0.012 0.0092 0.0086 0.0103 0.0042 0.014 0.006 0.0070
00000003 0.073 0.0554 0.0474 0.0620 0.0584 0.056 0.057 0.0525
00000004 0.073 0.0624 0.0496 0.0620 0.0626 0.056 0.060 0.0525
00000005 0.076 0.0624 0.0496 0.0594 0.0584 0.056 0.060 0.0490
00000006 0.067 0.0624 0.0474 0.0620 0.0584 0.046 0.066 0.0630
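As a sanity check (my own arithmetic, not edgeR output), the smallest possible non-zero CPM in each sample is one read divided by the library size in millions, which matches the magnitudes above:

lib.size <- c(3.2e8, 4.2e8, 4.5e8, 3.8e8, 2.3e8, 2.1e8, 3.3e8, 2.8e8)
signif(1/(1e-06 * lib.size), 2)   # CPM contributed by a single read
## [1] 0.0031 0.0024 0.0022 0.0026 0.0043 0.0048 0.0030 0.0036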
What concerns me here is that the limited number of decimal places, and
the rounding of these numbers, may lose sensitivity. Is this something
that can affect the outcome of the analysis? If it does, should I simply
scale the counts up before putting the data through my workflow? (I
sketch a quick check of this below.)
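To illustrate the scaling question, here is a toy sketch (made-up Poisson counts, not my real data), using edgeR's cpm() with the default log=FALSE. Unlogged CPM does not change when all counts are multiplied by a constant, because the library sizes scale by the same factor and it cancels:

library(edgeR)
set.seed(1)
counts <- matrix(rpois(40, lambda = 5), nrow = 10)  # toy counts, 10 genes x 4 samples
all.equal(cpm(counts), cpm(10 * counts))            # TRUE: the common factor cancels

Log-CPM would shift slightly, though, because the prior count stays fixed while the counts grow.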
##### body of 'cpm' function/method #######
{
    x <- as.matrix(x)
    if (is.null(lib.size))
        lib.size <- colSums(x)      # default library sizes: column sums
    if (log) {
        ## scale the prior count per sample so its relative effect is the
        ## same regardless of library size, then inflate lib.size to match
        prior.count.scaled <- lib.size/mean(lib.size) * prior.count
        lib.size <- lib.size + 2 * prior.count.scaled
    }
    lib.size <- 1e-06 * lib.size    # convert library sizes to millions
    if (log)
        log2(t((t(x) + prior.count.scaled)/lib.size))
    else
        t(t(x)/lib.size)
}
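Reading the body above, I also worked out (my own arithmetic, using the library sizes quoted at the top and assuming the default prior.count of 0.25 in this version) what a zero count maps to. Because the prior count is scaled per sample, the floor is the same finite value in every sample, about -10.4 logCPM rather than -Inf:

lib.size <- c(3.2e8, 4.2e8, 4.5e8, 3.8e8, 2.3e8, 2.1e8, 3.3e8, 2.8e8)
prior.count <- 0.25
prior.count.scaled <- lib.size/mean(lib.size) * prior.count
## logCPM that a zero count receives in each sample:
round(log2(prior.count.scaled/(1e-06 * (lib.size + 2 * prior.count.scaled))), 1)
## [1] -10.4 -10.4 -10.4 -10.4 -10.4 -10.4 -10.4 -10.4

So the "very negative" values I mentioned bottom out around -10.4 for libraries of this size.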
Kind regards,
Vang Quy Le
Bioinformatician, Molecular Biologist, PhD
+45 97 66 56 29
vql@rn.dk
AALBORG UNIVERSITY HOSPITAL
Section for Molecular Diagnostics,
Clinical Biochemistry
Reberbansgade
DK 9000 Aalborg
www.aalborguh.rn.dk