edgeR on ncRNA analysis question
@gordon-smyth
WEHI, Melbourne, Australia
It does look like you may have done something wrong. In fact, the output doesn't make sense to me. The CPM and average logCPM values output by edgeR should be unchanged regardless of the comparison you are testing, so the two output tables you give cannot be from the same data. And you seem to have wildtype samples only??

Normalization of ncRNA reads is very challenging, but there seems to be a much more basic problem here.

In the absence of any code leading to the output given, it is impossible to say more.

Best wishes
Gordon

> Date: Fri, 29 Nov 2013 12:06:40 +0100
> From: alessandro.guffanti at genomnia.com
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Cc: bioinfo at genomnia.com
> Subject: [BioC] edgeR on ncRNA analysis question
>
> Dear BioC edgeR developers and users:
>
> I am using edgeR for ncRNA transcriptome data analysis, i.e. mapping RNA-seq
> results only against a ncRNA transcript database (bowtie from Color Space
> reads).
>
> There seems to be, unsurprisingly, high variability in these samples,
> which obviously affects the FDR.
>
> However, what surprised us is that the CPM values for the same samples in
> different comparisons (TMM-normalized) are always very different.
>
> As an example:
>
> Comparison A
>
> Transcript_ID    logFC  logCPM  PValue   FDR      WT_4_CPM  WT_7.CPM  WT_10.CPM
> ENST00000456355  1.42   10.91   0.00001  0.03283  2843      2926      2631
>
> Comparison B
>
> Transcript_ID    logFC  logCPM  PValue   FDR      WT_4_CPM  WT_7.CPM  WT_10.CPM
> ENST00000456355  0.91   11.11   0.00003  0.00361  190       341       157
>
> Can TMM normalization affect the CPM values of the same samples so heavily
> in different comparisons, or do we have something else wrong here?
>
> Thanks in advance for any feedback on this,
>
> Alessandro G
>
> ---
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.4.0 limma_3.18.3
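The claim that CPM values do not depend on the comparison being tested can be checked by hand. The sketch below is not from the thread: it uses simulated counts (hypothetical data, with made-up transcript and sample names) and assumes a current edgeR release. It illustrates that each sample's CPM is determined only by its counts, its library size and its TMM normalization factor, none of which involve a contrast.

```r
library(edgeR)

# Simulated counts (hypothetical data): 200 transcripts, 6 samples
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("tx", 1:200), paste0("S", 1:6)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)   # TMM is the default method

# Effective library size = raw library size x TMM normalization factor
eff_lib <- y$samples$lib.size * y$samples$norm.factors

# CPM computed by hand: divide each sample's column by its effective library size
manual_cpm <- t(t(y$counts) / eff_lib) * 1e6

# The manual values should agree with edgeR's own cpm()
all.equal(manual_cpm, cpm(y), check.attributes = FALSE)
```

Under this formula, a roughly ten-fold difference in CPM for the same sample would require a ten-fold difference in its counts or effective library size, which points to the two tables being computed from different inputs rather than to the choice of comparison.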
Normalization edgeR
@alessandroguffantigenomniacom-4436
Hi,

OK, thanks for the feedback. I will then look carefully at the procedure.

No, I did not have only WT samples: in the two comparisons the experiment samples were different, i.e. these are the same WT samples compared with two different sets of experiment samples, which I did not copy in the output.

Thanks again and keep in touch,
Alessandro
Dear Alessandro,

In the usual edgeR pipeline, one does not construct different datasets to make different comparisons. Rather, the idea is to analyse all the samples together and simply test different comparisons. Scale normalization is done only once.

Best wishes
Gordon
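The pipeline described here (one dataset, one normalization, several comparisons) might look roughly like the following. This is only a sketch under stated assumptions: the counts are simulated, the group names `WT`, `TreatA` and `TreatB` are hypothetical stand-ins for the real experiment, and the function names are those of a current edgeR release, which may differ from what was available in edgeR 3.4.0.

```r
library(edgeR)   # also attaches limma, which provides makeContrasts()

# Simulated counts (hypothetical data): 500 transcripts, 3 groups x 3 replicates
set.seed(42)
group <- factor(rep(c("WT", "TreatA", "TreatB"), each = 3),
                levels = c("WT", "TreatA", "TreatB"))
counts <- matrix(rnbinom(500 * 9, mu = 50, size = 10), nrow = 500,
                 dimnames = list(paste0("tx", 1:500),
                                 paste0(group, "_", rep(1:3, times = 3))))

# One DGEList holding all samples; TMM normalization is done once
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# One design matrix with a coefficient per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Dispersions and the GLM fit are also estimated once, on all samples
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Different comparisons are just different contrasts on the same fit
con <- makeContrasts(A_vs_WT = TreatA - WT,
                     B_vs_WT = TreatB - WT,
                     levels = design)
resA <- glmLRT(fit, contrast = con[, "A_vs_WT"])
resB <- glmLRT(fit, contrast = con[, "B_vs_WT"])

topTags(resA, n = 3)
topTags(resB, n = 3)

# The average logCPM column is shared by both result tables
all.equal(resA$table$logCPM, resB$table$logCPM)
```

Because both tests are drawn from the same fitted object, the per-sample CPM values from `cpm(y)` and the `logCPM` column are the same whichever contrast is tested; only `logFC`, the test statistic, the p-value and the FDR change.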
I will check whether this was the case! Thanks again,
Alessandro
