RSEM for transcript and gene level read count and edgeR differential expression analysis


Alan Smith ▴ 150

@alan-smith-5987

Last seen 8.7 years ago

United States

Hello,

I used RSEM to extract gene- and transcript-level read counts for our single-end read libraries. I then rounded the expected read counts to integers and used them for differential expression (DE) analysis with edgeR at both the transcript and gene level. However, I found almost 10 times fewer DE transcripts than DE genes.

Is this expected, or should I be using a different package for transcript-level DE analysis?

I would appreciate any help or suggestions.

Thank you,
Alan

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocInstaller_1.10.3 edgeR_3.2.3          limma_3.16.5

loaded via a namespace (and not attached):
[1] tools_3.0.1

edgeR edgeR • 7.1k views

ADD COMMENT • link updated 12.4 years ago by Steve Lianoglou ★ 13k • written 12.4 years ago by Alan Smith ▴ 150


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 10 weeks ago

United States

Hi,

On Mon, Aug 19, 2013 at 11:17 AM, Alan Smith <alan.sm310 at gmail.com> wrote:
> Hello,
>
> I used RSEM to extract gene and transcript level read count information for
> our single end read libraries. Then rounded off the expected read counts to
> use for differential expression analysis using edgeR at both transcript and
> gene level. However, I found that the number of DE transcripts were almost
> 10 times less than those of genes.
>
> Is this expected or should I be following other package to analyze
> transcript level DE analysis.

If you take a minute to walk down memory lane and browse through the list archives searching for "RSEM":

http://search.gmane.org/?query=rsem&author=&group=gmane.science.biology.informatics.conductor&sort=date&DEFAULTOP=and&xP=Zrsem&xFILTERS=Gscience.biology.informatics.conductor---A/

you'll find that RSEM output doesn't play well with edgeR and DESeq. These methods explicitly require *count* data -- not any old number that has then been rounded to an integer.

I recall a thread about how voom would likely work well with RSEM output, but I can't seem to dig it up right now -- I'm only finding other people mentioning that thread ;-)

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

ADD COMMENT • link 12.4 years ago Steve Lianoglou ★ 13k


http://arxiv.org/pdf/1301.5277v2.pdf may be of interest.

voom() transforms everything to log2(RPM) before estimating mean-variance trends, so fractional count estimates from RSEM won't impact the modeling anyhow. See also

http://www.statsci.org/smyth/pubs/VoomPreprint.pdf

Personally I've wondered for a long, long time why people spend so much time on GLMs for sensitivity, when more biological replicates would make more difference.

--
*He that would live in peace and at ease,*
*Must not speak all he knows, nor judge all he sees.*
Benjamin Franklin, Poor Richard's Almanack <http://archive.org/details/poorrichardsalma00franrich>

ADD REPLY • link 12.4 years ago Tim Triche ★ 4.2k
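
The transform mentioned in the reply above can be sketched in base R. This is only an illustration under stated assumptions: it uses the offset form given in the voom preprint, log2((count + 0.5) / (library size + 1) * 1e6), the helper name `log2_cpm` and the toy matrix are hypothetical, and limma's `voom()` additionally fits the mean-variance trend and derives precision weights, which this sketch does not attempt.

```r
# Toy matrix of RSEM-style expected counts (hypothetical values):
# rows are transcripts, columns are samples.
expected <- matrix(
  c(10.4, 0.0, 250.7,
    12.9, 3.2, 198.1),
  nrow = 3,
  dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2"))
)

# log2 counts-per-million with the offsets described in the voom preprint.
# The 0.5 offset keeps zero counts finite; nothing requires integer input.
log2_cpm <- function(counts, lib.size = colSums(counts)) {
  t(log2((t(counts) + 0.5) / (lib.size + 1) * 1e6))
}

# Same library sizes for both calls, so only the rounding differs.
frac    <- log2_cpm(expected)
rounded <- log2_cpm(round(expected), lib.size = colSums(expected))

# Largest change in log2-CPM caused by rounding the expected counts.
max(abs(frac - rounded))
```

Comparing `frac` with `rounded` gives a rough feel for how much rounding moves the values on the log scale; the difference is largest for low counts.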


Hi,

On Mon, Aug 19, 2013 at 2:30 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
> http://arxiv.org/pdf/1301.5277v2.pdf may be of interest
>
> voom() transforms everything to log2(RPM) before estimating mean-variance
> trends, so fractional count estimates from RSEM won't impact the modeling
> anyhow. See also
>
> http://www.statsci.org/smyth/pubs/VoomPreprint.pdf
>
> Personally I've wondered for a long, long time why people spend so much
> time on GLMs for sensitivity, when more biological replicates would make
> more difference.

Money and time.

Time to grow the mouse, I mean ... I wasn't referring to *your* time as a bioinformatician, no one actually cares about that as long as I can have the results by tomorrow, thanks! :-)

-steve

ADD REPLY • link 12.4 years ago Steve Lianoglou ★ 13k


Right, but if you're injecting 10 mice with 5FU, injecting another 10 isn't really that much marginal effort. I'm sitting at a lab bench right now, as it happens :-)

ADD REPLY • link 12.4 years ago Tim Triche ★ 4.2k


Thanks a lot, Steve and Tim. I will try limma-voom and will update you once I'm done.

Thanks again,
-Alan

ADD REPLY • link 12.4 years ago Alan Smith ▴ 150
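
For readers who want to try the limma-voom route mentioned above, a minimal sketch might look like the following. It assumes the edgeR and limma packages are installed, and it uses a simulated matrix of fractional values as a stand-in for the `expected_count` column of real RSEM output; the object names, the two-group design with three replicates each, and the simulated data are all hypothetical, so the resulting table says nothing about real biology.

```r
library(edgeR)   # DGEList(), calcNormFactors()
library(limma)   # voom(), lmFit(), eBayes(), topTable()

set.seed(1)

# Simulated stand-in for RSEM expected counts (fractional, not rounded):
# 200 transcripts in rows, 6 samples in columns.
counts <- matrix(
  rgamma(200 * 6, shape = 2, scale = 50),
  nrow = 200,
  dimnames = list(paste0("tx", 1:200), paste0("s", 1:6))
)

# Hypothetical design: three control and three treated samples.
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Normalization factors, then the voom transform with precision weights.
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)
v   <- voom(dge, design)

# Weighted linear model and moderated t-statistics for the group effect.
fit <- eBayes(lmFit(v, design))
top <- topTable(fit, coef = 2, number = Inf)
head(top)
```

With real data one would normally also filter out very low-abundance transcripts before calling `voom()`; that step is left out here to keep the sketch short.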


On Mon, Aug 19, 2013 at 4:44 PM, Steve Lianoglou <lianoglou.steve@gene.com> wrote:
> You'll find that RSEM output doesn't play well with edgeR and DESeq.
> These methods explicitly require *count* data -- not any old number
> that has been then rounded to an integer.

I've never actually understood or bought into this line of reasoning. RSEM values are not "any old number", but are just a special form of counts; if you sum all the values, you get the total mapped read count. If you'd like to split hairs, they are weighted counts that are represented succinctly (perhaps even sufficiently?) by multiplying each count by its weight, rather than supplying the individual raw counts and their weights as separate vectors. I don't think anyone would argue against our numerical ability to include weighted data in any GLM, so why so much fuss?

Since edgeR/DESeq/voom/etc. do not themselves handle the ambiguity of read mapping (cf. any Cufflinks paper decrying this fact), it seems like RSEM + edgeR/DESeq/voom/etc. is your only analytical option for ANOVA-type hypothesis testing that attempts to take read mapping uncertainty into account (albeit without integrating that uncertainty in the model itself, as cuffdiff/eXpress do).

This all seems squarely in the realm of "all models are wrong, but some are useful" -- we're just making choices about what kind of "wrongness" we're more willing to tolerate (rounding RSEM weighted counts: perhaps distributional wrongness; taking raw read counts with no uncertainty: certain error/variance wrongness).

-Aaron

ADD REPLY • link 12.4 years ago Aaron Mackey ▴ 200
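
The "weighted counts" view in the reply above can be illustrated with a toy example in base R. This is not RSEM's EM algorithm, only a sketch of the bookkeeping: each mapped read is assumed to carry a weight per transcript, the weights for a read sum to 1, and the expected count of a transcript is the sum of its weights. The weight matrix `w` and its values are hypothetical.

```r
# Each row is one mapped read, each column a transcript. Entries are the
# (hypothetical) probabilities that the read originated from that transcript.
w <- rbind(
  c(1.0, 0.0, 0.0),   # uniquely mapped to tx1
  c(0.7, 0.3, 0.0),   # ambiguous between tx1 and tx2
  c(0.7, 0.3, 0.0),
  c(0.0, 0.8, 0.2),   # ambiguous between tx2 and tx3
  c(0.0, 0.0, 1.0)    # uniquely mapped to tx3
)
colnames(w) <- c("tx1", "tx2", "tx3")

# Expected count per transcript: the sum of the read weights.
expected_count <- colSums(w)

# Because every read's weights sum to 1, the expected counts add up to
# the number of mapped reads.
total_expected <- sum(expected_count)

# Rounding each transcript separately does not have to preserve that total.
total_rounded <- sum(round(expected_count))

c(reads = nrow(w), expected = total_expected, rounded = total_rounded)
```

In this toy case the rounded counts no longer add up to the five mapped reads, which is one small, concrete form of the "distributional wrongness" being weighed in the discussion.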
