Bioconductor: CPM and log(CPM) values in edgeR do not match

@csrk-christian-schrder-kaas-6069

Last seen 10.3 years ago

I have normalized my RNA-seq read counts in edgeR. If I look at the normalized data:

    total=cpm(w, normalized.lib.size=TRUE)

and specifically at the CPM values for my cell line 1 (in triplicate) for the gene Hspa5:

    total["Hspa5",1:3]
    7492.944 6750.397 5727.190

If I compute the fold change between cell line 1 and my control cell line 2, I get:

    total["Hspa5",1:3]/total["Hspa5",27:29]
    3.409239 3.399253 2.910913

So by this very basic test I find a fold change of about 3.3 between my two cell lines, and the triplicates agree well.

Then I use edgeR to find the same:

    et <- exactTest(w, pair=c(13,1))
    et["Hspa5",]
            logFC   logCPM       PValue          FDR
    Hspa5 1.80454 12.11638 2.519341e-65 4.262793e-63

    FC = exp(1.80454) = 6.077175
    CPM = exp(12.11638) = 182842

So after edgeR has made the comparison between the two cell lines, the CPM value is suddenly 27x higher and (more importantly for me) the fold change is now 2x higher (6x instead of 3x). Can anyone explain this difference? I have read the edgeR User's Guide and the CPM-relevant parts of the Reference Manual, but I have not found the answer.

I know that log(CPM) takes the estimated dispersions and the library sizes into account, so it is a bit different from the CPM computed directly ... but a 27x difference? And why the 2x difference in the fold change?

Which of the two results would you report in a table showing RNA-seq results in a publication?

__________________________
Christian Schrøder Kaas
PhD student at The Technical University of Denmark and Novo Nordisk A/S
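For readers unfamiliar with the quantity being compared, here is a minimal base-R sketch of what a counts-per-million value is: each count divided by the effective library size (library size times normalization factor) and scaled by one million. The count matrix, gene names, and normalization factors below are hypothetical toy values, not the poster's data, and the sketch is not edgeR's actual implementation of `cpm()`.

```r
# Hypothetical toy counts: 3 genes x 2 samples (NOT the data from the question).
counts <- matrix(c(10, 200, 790,
                   30, 300, 1670),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

lib.size <- colSums(counts)            # total counts per sample
# Placeholder normalization factors; in edgeR these would normally be
# estimated with calcNormFactors(). Set to 1 here purely for illustration.
norm.factors <- c(s1 = 1, s2 = 1)
eff.lib <- lib.size * norm.factors     # effective library sizes

# Divide each column by its effective library size and scale to one million.
cpm.manual <- t(t(counts) / eff.lib) * 1e6
print(cpm.manual)
```

Values computed this way are on the original (unlogged) scale, which is the scale of the `total[...]` numbers quoted above.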

RNASeq edgeR • 4.8k views

updated 11.4 years ago by Michael Stadler ▴ 350 • written 11.4 years ago by CSRK (Christian Schrøder Kaas) ▴ 10

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 6.1 years ago

Dear Christian,

I think you want base 2:

    > 2^1.80454
    [1] 3.493178
    > 2^(12.11638)
    [1] 4440.111

Best, Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://tiny.cc/mrobin

On 01.08.2013, at 09:08, CSRK (Christian Schrøder Kaas) <csrk at novonordisk.com> wrote:
> [...]
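The two back-transformations can be put side by side in a short base-R sketch. It uses only the logFC and logCPM values quoted in the question and assumes, as the reply suggests, that both are reported on the log2 scale; the variable names are illustrative.

```r
# Values reported by edgeR for Hspa5 in the question (assumed to be log2).
logFC  <- 1.80454
logCPM <- 12.11638

# Back-transformation with base 2, as suggested in the reply.
fc_base2  <- 2^logFC
cpm_base2 <- 2^logCPM

# Back-transformation with base e, as done in the question, for comparison.
fc_exp  <- exp(logFC)
cpm_exp <- exp(logCPM)

print(c(fc_base2 = fc_base2, fc_exp = fc_exp))
print(c(cpm_base2 = cpm_base2, cpm_exp = cpm_exp))
```

With base 2, the fold change lands close to the roughly 3.3-fold ratio computed by hand, and the CPM is of the same order as the per-sample values in the question rather than 27x larger.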

11.4 years ago • Mark Robinson ▴ 880

Michael Stadler ▴ 350

@michael-stadler-5887

Last seen 4 weeks ago

Switzerland

Hi Christian,

I am just a user of edgeR, but could it be that you are taking the wrong base, i.e. e = exp(1) ≈ 2.718, instead of 2.0? I think that the base for both logFC and logCPM is 2.0, which would give you roughly the 3.3-fold change that you are expecting:

    > 2^1.8
    [1] 3.482202

Best, Michael

On 01.08.2013 09:08, CSRK (Christian Schrøder Kaas) wrote:
> [...]

--
--------------------------------------------
Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax : +41 61 697 3976
Mail : michael.stadler at fmi.ch
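One way to sanity-check the base is to move the hand-computed ratios onto the log2 scale and compare them with the reported logFC. The sketch below uses only the three per-replicate ratios from the question. Only rough agreement should be expected: edgeR's logFC is estimated from the group-level fit, not from a mean of replicate ratios, and pairing replicate 1 with 27, 2 with 28, and so on is arbitrary.

```r
# Per-replicate CPM ratios quoted in the question.
cpm_ratio <- c(3.409239, 3.399253, 2.910913)

# log2 of the ratios; log2(x) is the natural log divided by log(2).
log2_ratio <- log2(cpm_ratio)
stopifnot(isTRUE(all.equal(log2_ratio, log(cpm_ratio) / log(2))))

mean_log2_ratio <- mean(log2_ratio)   # compare with the reported logFC of 1.80454
geo_mean_fc <- 2^mean_log2_ratio      # geometric mean of the three ratios

print(c(mean_log2_ratio = mean_log2_ratio, geo_mean_fc = geo_mean_fc))
```

If the reported logFC were on the natural-log scale, the hand-computed ratios would have to average around 6 rather than around 3, which they do not.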

11.4 years ago • Michael Stadler ▴ 350
