edgeR, very big lib.size makes CPM very small
Vang Le ▴ 80
@vang-le-6690
Last seen 4.7 years ago
Denmark
Hello,

I am working with a count table that has very large library sizes:

    > dge@.Data[[2]]$lib.size
    [1] 3.2e+08 4.2e+08 4.5e+08 3.8e+08 2.3e+08 2.1e+08 3.3e+08 2.8e+08

This makes the CPM values very small and, consequently, the logCPM values very negative. This is the head of my cpm(counts):

                C1     C2     C3     C4     T1    T2    T3     T4
    00000001 0.000 0.0000 0.0000 0.0026 0.0042 0.000 0.000 0.0035
    00000002 0.012 0.0092 0.0086 0.0103 0.0042 0.014 0.006 0.0070
    00000003 0.073 0.0554 0.0474 0.0620 0.0584 0.056 0.057 0.0525
    00000004 0.073 0.0624 0.0496 0.0620 0.0626 0.056 0.060 0.0525
    00000005 0.076 0.0624 0.0496 0.0594 0.0584 0.056 0.060 0.0490
    00000006 0.067 0.0624 0.0474 0.0620 0.0584 0.046 0.066 0.0630

What concerns me is that the limited number of decimal places and the rounding of numbers may cause a loss of sensitivity. Is this something that can affect the outcome of the analysis? If it does, should I just scale the counts up before putting the data through my workflow?

##### body of 'cpm' function/method #######

    {
        x <- as.matrix(x)
        if (is.null(lib.size))
            lib.size <- colSums(x)
        if (log) {
            prior.count.scaled <- lib.size/mean(lib.size) * prior.count
            lib.size <- lib.size + 2 * prior.count.scaled
        }
        lib.size <- 1e-06 * lib.size
        if (log)
            log2(t((t(x) + prior.count.scaled)/lib.size))
        else t(t(x)/lib.size)
    }

Kind regards,

Vang Quy Le
Bioinformatician, Molecular Biologist, PhD
+45 97 66 56 29
vql at rn.dk

AALBORG UNIVERSITY HOSPITAL
Section for Molecular Diagnostics, Clinical Biochemistry
Reberbansgade
DK 9000 Aalborg
www.aalborguh.rn.dk
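For readers following along, here is a minimal base R sketch of the arithmetic in the non-log branch of the cpm() body quoted above. The count matrix and the two library sizes are made-up values chosen only to be of the same order as those in the question; they are not the poster's data.

```r
# Hypothetical counts: 3 features x 2 samples (made-up values for illustration).
counts <- matrix(c(0, 4, 28,
                   1, 3, 20),
                 nrow = 3,
                 dimnames = list(c("f1", "f2", "f3"), c("S1", "S2")))

# Assumed library sizes, of the same order as those shown in the question.
lib.size <- c(S1 = 3.2e8, S2 = 3.8e8)

# Same arithmetic as the non-log branch of the quoted cpm() body:
# divide each column by its library size expressed in millions.
cpm.manual <- t(t(counts) / (1e-06 * lib.size))

print(cpm.manual, digits = 3)
```

With library sizes in the hundreds of millions, a single read corresponds to only a few thousandths of a CPM, which is why the printed values look so small.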
@gordon-smyth
Last seen 18 hours ago
WEHI, Melbourne, Australia
> Date: Wed, 13 Aug 2014 13:37:23 +0000
> From: Vang Quy Le / Region Nordjylland <vql at rn.dk>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] edgeR, very big lib.size makes CPM very small
>
> I am working with count table that has very big lib.size:
>
> This causes CPM very small, and consequently very negative logCPM.
>
> The point that concerns me here is the effect number of decimal places
> and rounding of numbers may lose sensitivity.

No, not unless you are planning to run R on a 1960s calculator without floating point arithmetic.

> Is this something that can affect the outcome of analysis?

No. Modern computers with floating point arithmetic have no trouble with trivial issues like this. Floating point arithmetic means that numbers are not rounded to any fixed number of decimal places. Rather, all numbers are stored to the same number of significant figures regardless of their absolute size.

> If it does, should I just scale the counts up before putting the data
> through my workflow?

No, you should not falsify the true nature of your data to edgeR.

Gordon
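The floating point argument above can be checked directly. The following is a minimal base R sketch, not part of edgeR; the variable names are illustrative, and the two CPM values are taken from the table in the question.

```r
# A CPM value of the size seen in the question: 1 read in a 3.8e8-read library.
small.cpm <- 1 / (1e-06 * 3.8e8)

# Doubles keep roughly 15-16 significant decimal digits at any magnitude.
print(small.cpm, digits = 15)
print(.Machine$double.eps)   # relative spacing of doubles near 1

# A relative change of one part in a million is still resolved at this scale.
slightly.larger <- small.cpm * (1 + 1e-6)
rel.diff <- (slightly.larger - small.cpm) / small.cpm
print(rel.diff)

# Multiplying by a constant only shifts log2 values by that constant,
# so differences between samples on the log scale are unchanged.
a <- 0.0026; b <- 0.0042     # two CPM values from the table in the question
lfc.raw    <- log2(b) - log2(a)
lfc.scaled <- log2(b * 1000) - log2(a * 1000)
print(all.equal(lfc.raw, lfc.scaled))
```

This only addresses numerical precision. In the log branch of the quoted cpm() body a prior count is added to the counts before taking logs, so rescaling the raw counts would change the logCPM values themselves, which fits the advice not to rescale.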
Thank you for the confirmation, Gordon. It might be a small thing, but it is very reassuring for me to know.

Best regards,
Vang