Is there any prejudice whether to use edgeR or DESeq for differential expression analysis for RNA Seq data


Sakshi Gulati ▴ 40


Hi,

I am unsure whether there is any particular condition that decides between the edgeR and DESeq packages for differential expression analysis of RNA-Seq data. For example, does it depend upon how the counts were normalized?

Thanks,
Sakshi

Sakshi Gulati
PhD Student
Biomolecular Modelling Laboratory
Cancer Research UK London Research Institute
44 Lincoln's Inn Fields
London WC2A 3LY

edgeR DESeq • 4.3k views

updated 11.5 years ago by Mark Robinson ▴ 880 • written 11.5 years ago by Sakshi Gulati ▴ 40


Mark Robinson ▴ 880


Hi Sakshi,

The two packages are indeed fairly similar. They differ in their:

i) look-and-feel -- overall the pipelines are quite similar, but things like specifying arbitrary contrasts, offsets, packaging the output statistics, etc. are, IMHO, easier in edgeR.

ii) standard normalization (edgeR - TMM; DESeq - what I call "RLE", which is also implemented in edgeR's calcNormFactors) -- these are actually very similar anyway in the situations where I've tested them.

iii) dispersion estimation (edgeR default - moderate to trend; DESeq default - take maximum of individual or trend). My impression is that this makes DESeq (slightly?) less powerful and edgeR (slightly?) more sensitive to outliers.

On 07.11.2012, at 13:01, Sakshi Gulati wrote:
> I am unsure as to if there is any particular condition that is the deciding factor between whether to use edgeR or DESeq packages for differential expression analysis for RNA Seq data.

I prefer edgeR, but there is some pretty strong prejudice behind that :)

> For example, does it depend upon how the counts were normalized?

I don't understand this question, since both packages expect unnormalized counts.

Best,
Mark
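To make point ii) a little more concrete, here is a minimal Python sketch of a median-of-ratios size factor calculation, which is the idea usually associated with DESeq's normalization and with the "RLE" label used above. It is only an illustration of the arithmetic under stated assumptions, not the code of either package: the function name `median_of_ratios_size_factors` and the toy matrix are made up for this example, and edgeR's calcNormFactors reports its factors on a different scale (relative to library size).

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Illustrative size factors; counts is a list of genes, each a list of raw counts per sample."""
    n_samples = len(counts[0])
    # Keep only genes with a positive count in every sample, so that the
    # geometric mean across samples is defined and non-zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene in sample j to the pseudo-reference.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up toy data: sample 2 is sample 1 sequenced twice as deeply.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],  # dropped from the calculation because of the zero
]
size_factors = median_of_ratios_size_factors(toy_counts)
print(size_factors)
```

On this toy matrix the second factor comes out twice the first, which is the depth difference built into the data. The point of the sketch is that such factors are estimated from raw counts, which is why both packages expect unnormalized input.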

written 11.5 years ago by Mark Robinson ▴ 880


Hi Mark,

Thanks for answering. It makes more sense now. I have upper quartile normalised RSEM counts per gene. Is that OK as an input for edgeR and/or DESeq?

Thanks,
Sakshi

written 11.5 years ago by Sakshi Gulati ▴ 40


Hi,

On Wed, Nov 7, 2012 at 9:42 AM, Sakshi Gulati <sakshi.gulati at cancer.org.uk> wrote:
> Hi Mark,
>
> Thanks for answering. It makes more sense now. I have upper quartile normalised RSEM counts per gene. Is that ok as an input for edgeR and/or DESeq?

My guess is that they are not fine.

I'm not familiar with RSEM, but if these are actually *counts* then you are OK (a first hint that they are not counts is if they are not integers). If these are something like (R|F)PKM, then you're not -- ditto if you are inputting numbers that have already been scaled to library size.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
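The "are they integers?" hint can be checked mechanically. The following is a small Python sketch of such a sanity check; the function name `looks_like_raw_counts` and both example vectors are hypothetical, invented for illustration.

```python
def looks_like_raw_counts(values, tol=1e-9):
    """Return True if every value is a non-negative whole number (within tol)."""
    return all(v >= 0 and abs(v - round(v)) <= tol for v in values)


# Made-up examples: one vector of integer counts, one of FPKM-like values.
raw = [0, 12, 305, 7]
fpkm_like = [0.0, 3.71, 88.2, 1.05]

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(fpkm_like))
```

This is only a first hint in one direction: values that fail the check are not plain read counts, but values that pass could still have been scaled and rounded beforehand, so it is worth confirming how the numbers were produced.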

written 11.5 years ago by Steve Lianoglou ★ 13k


RSEM counts are indeed counts (not a normalized/scaled FPKM), but for the relatively small subset of reads that are ambiguously mapped ("multi-mapped"), the count for each such read gets broken up across the (probabilistically weighted) possibilities. So the final counts are non-integer. We (and others) round these values to integers, usually to good effect, since their magnitude remains consistent with the philosophically "true" count (i.e. the mean/dispersion relationship remains similar to what it would have been had we just ignored the multi-mapped reads and used true counts).

-Aaron
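The rounding step described here is simple enough to sketch. The Python snippet below rounds a made-up vector of fractional expected counts to the nearest integer and reports the largest change; the function name and the numbers are assumptions for illustration, not output from RSEM.

```python
def round_expected_counts(expected):
    """Round fractional expected counts to the nearest integer."""
    return [int(round(x)) for x in expected]


# Made-up fractional counts of the kind produced when multi-mapped reads
# are split across several genes.
expected = [0.0, 3.4, 17.62, 1520.33, 98.75]
rounded = round_expected_counts(expected)

# Rounding to nearest moves each value by at most 0.5, so the magnitude
# of each count is essentially preserved.
max_shift = max(abs(r - x) for r, x in zip(rounded, expected))
print(rounded, max_shift)
```

Because no value moves by more than half a read, the rounded values stay on the same scale as the originals, which is the property the argument above relies on.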

written 11.5 years ago by Aaron Mackey ▴ 170


Hi Aaron,

Yes, I had indeed rounded them up to avoid the non-integer issue as well. But, as the others pointed out, do you think it is an issue for me to use them in packages like edgeR or DESeq?

Thanks,
Sakshi

written 11.5 years ago by Sakshi Gulati ▴ 40


Hi all,

There is some information about performing differential expression analysis after aligning with RSEM on the RSEM webpage: http://deweylab.biostat.wisc.edu/rsem/README.html#de. It uses EBSeq, an R package (not yet in Bioconductor as far as I can tell).

Depending on how you ran RSEM (see http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html for RSEM output details), you'll get TPM or FPKM values that are not suitable for use with edgeR/DESeq. But an interesting proxy for counts might be the "expected_count" column of the result file, i.e. the count corrected for multiple mapping. Comparing the outcome of edgeR/DESeq using these with the result you'd obtain from EBSeq is certainly worth a try.

HTH,

Nico

---------------------------------------------------------------
Nicolas Delhomme
Nathaniel Street Lab
Department of Plant Physiology
Umeå Plant Science Center
Tel: +46 90 786 7989
Email: nicolas.delhomme at plantphys.umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------
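As a rough illustration of pulling out the "expected_count" column rather than the TPM or FPKM values, here is a short Python sketch that reads a tab-separated table. The inline table is made up, and the column names other than "expected_count" (in particular "gene_id" and "length") are assumptions; check the header of your own RSEM result file before relying on them.

```python
import csv
import io

# Made-up stand-in for an RSEM gene-level result file (tab-separated).
example = (
    "gene_id\tlength\texpected_count\tTPM\tFPKM\n"
    "geneA\t1500\t120.50\t10.2\t8.1\n"
    "geneB\t900\t0.00\t0.0\t0.0\n"
    "geneC\t2100\t33.27\t2.4\t1.9\n"
)


def read_expected_counts(handle):
    """Map each gene identifier to its expected_count value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


expected = read_expected_counts(io.StringIO(example))
print(expected)
```

With a real file, the same function could be given an open file handle instead of the in-memory string. The TPM and FPKM columns are deliberately ignored, since those are the values described above as unsuitable for edgeR/DESeq.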

written 11.5 years ago by Nicolas Delhomme ▴ 30
