Counting reads for edgeR or voom()

Cittaro Davide ▴ 240

@cittaro-davide-5375

Last seen 11.4 years ago

Hi all,

Just a quick question on best practices for creating datasets for RNA-seq analysis with edgeR or limma::voom(). I need to create a table of read counts for every entity (i.e. every transcript) in every sample. To do this I have at least a couple of options that may affect the results:

1. Should I count all reads overlapping the whole transcript, or only the reads that overlap exons?
2. Should I use every transcript or just the primary one?

Thanks,
d
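To make option 1 concrete, here is a minimal toy sketch in Python of the two counting rules for a single transcript. Everything in it is an assumption made for illustration: the coordinates are invented, intervals are half-open, any overlap of at least one base counts, and the helper names (`overlaps`, `count_span`, `count_exonic`) are hypothetical rather than taken from edgeR, limma, or any counting tool.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), invented coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_span(reads: List[Interval], exons: List[Interval]) -> int:
    """Count reads overlapping the whole transcript span, introns included."""
    span = (min(start for start, _ in exons), max(end for _, end in exons))
    return sum(overlaps(read, span) for read in reads)


def count_exonic(reads: List[Interval], exons: List[Interval]) -> int:
    """Count reads overlapping at least one exon of the transcript."""
    return sum(any(overlaps(read, exon) for exon in exons) for read in reads)


# One hypothetical two-exon transcript and five hypothetical reads.
exons = [(100, 200), (500, 600)]
reads = [(120, 170), (190, 240), (300, 350), (580, 630), (700, 750)]

span_count = count_span(reads, exons)
exon_count = count_exonic(reads, exons)
print(span_count, exon_count)
```

The read at (300, 350) lies entirely in the intron, so it is counted under the whole-transcript rule but not under the exon rule. That is the kind of difference the two options can produce in the final table.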

edgeR edgeR • 1.5k views

updated 13.5 years ago by Wei Shi ★ 3.6k • written 13.5 years ago by Cittaro Davide ▴ 240

Kasper Daniel Hansen ★ 6.5k

@kasper-daniel-hansen-2979

Last seen 2.6 years ago

United States

There are many ways of doing the counting, including variants you have not listed.

To my knowledge, no one has ever really investigated the "right" way to do this. There are many recommendations around (including mine), but no one has really investigated this well. But it can (and does) matter.

Kasper

13.5 years ago • Kasper Daniel Hansen ★ 6.5k

Didn't the Bowtie guys play around with various methods and settle on splitting up multimapping reads between the supportable transcripts? I could have sworn they offered some empirical evidence to justify the choice.

Now I am going to have to go and look for where I saw this... %^$@$#%^
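The idea of splitting a read between the transcripts that could have produced it can be sketched as follows. This is only an illustration of uniform splitting, under the assumption that each read comes with a list of compatible transcripts. It is not the method of any particular tool, and the function name `fractional_counts` and the read and transcript IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def fractional_counts(read_to_transcripts: Dict[str, List[str]]) -> Dict[str, float]:
    """Give each compatible transcript an equal share (1/n) of every read."""
    counts: Dict[str, float] = defaultdict(float)
    for transcripts in read_to_transcripts.values():
        if not transcripts:
            continue  # read is compatible with nothing; it contributes no count
        share = 1.0 / len(transcripts)
        for transcript in transcripts:
            counts[transcript] += share
    return dict(counts)


# Hypothetical compatibility lists: r2 and r4 fit both transcripts, r5 fits none.
compat = {
    "r1": ["txA"],
    "r2": ["txA", "txB"],
    "r3": ["txB"],
    "r4": ["txA", "txB"],
    "r5": [],
}

frac = fractional_counts(compat)
print(frac)
```

Note that this produces non-integer values. Whether such values are appropriate input for a given downstream method is a separate question that this sketch does not settle.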

13.5 years ago • Tim Triche ★ 4.2k

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 3 days ago

Australia/Melbourne

Hi Davide,

From my point of view, counting reads overlapping exons is easier than counting reads overlapping transcripts, because exons are much better characterized than transcripts. Exon annotation is easy to obtain, for example from RefSeq or from Bioconductor annotation packages.

Another problem with summarizing reads to transcripts is that transcripts are more likely than exons to overlap each other, which raises the question of which transcript a read should be assigned to when it falls within the shared region. This is an issue for summarizing reads to exons as well, but it is less serious than for transcripts.

Hope this helps.

Cheers,
Wei
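A toy sketch of the assignment problem described above, written in Python for illustration only. It assumes half-open intervals, invented coordinates, and a simple rule that a read hitting more than one feature is set aside as ambiguous. The names `assign_reads`, `geneA`, and `geneB` are hypothetical, and representing each transcript by its full genomic span is a simplification chosen for this example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), invented coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def assign_reads(reads: List[Interval], features: Dict[str, List[Interval]]):
    """Count reads per feature; reads hitting several features are ambiguous."""
    counts = {name: 0 for name in features}
    ambiguous = 0
    unassigned = 0
    for read in reads:
        hits = [name for name, parts in features.items()
                if any(overlaps(read, part) for part in parts)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            ambiguous += 1
        else:
            unassigned += 1
    return counts, ambiguous, unassigned


# Exon-level annotation: geneB sits partly inside the first intron of geneA.
exon_features = {"geneA": [(100, 200), (500, 600)], "geneB": [(180, 260)]}
# Span-level annotation: each feature reduced to its full start-to-end extent.
span_features = {name: [(min(s for s, _ in parts), max(e for _, e in parts))]
                 for name, parts in exon_features.items()}

reads = [(110, 160), (185, 195), (210, 250), (520, 570), (300, 340)]

exon_result = assign_reads(reads, exon_features)
span_result = assign_reads(reads, span_features)
print(exon_result)
print(span_result)
```

With these made-up coordinates, one read is ambiguous at the exon level and two are ambiguous at the span level, which is consistent with the point that the problem exists in both cases but is less severe for exons.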

13.5 years ago • Wei Shi ★ 3.6k
