Counting reads for edgeR or voom()
@cittaro-davide-5375
Hi all, just a quick question on best practices for creating datasets for RNA-seq analysis with edgeR or limma::voom().

I need to build a table of read counts for every entity (i.e. every transcript) in every sample. There are at least a couple of choices here that may affect the results:

1- should I count all reads overlapping the whole transcript, or only the reads that overlap exons?
2- should I use every transcript, or just the primary one?

Thanks

d

/*
Davide Cittaro, PhD

Coordinator of Bioinformatics Core
Center for Translational Genomics and Bioinformatics
Ospedale San Raffaele
Via Olgettina 58
20132 Milano
Italy

Office: +39 02 26439140
Mail: cittaro.davide at hsr.it
Skype: daweonline
*/
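To make the difference between the two options in question 1 concrete, here is a minimal Python sketch on toy data (for illustration only; it is not part of the edgeR or limma workflow). The coordinates, reads, and function names are hypothetical; it ignores strand, spliced alignments, and paired-end reads, all of which a real counting tool working from BAM files and an annotation would have to handle.

```python
# Minimal sketch: count reads over the whole transcript span (exons + introns)
# versus counting only reads that overlap an exon.
# Hypothetical coordinates: 1-based, closed intervals, single strand.

def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] share a base."""
    return a_start <= b_end and b_start <= a_end

def count_transcript_span(reads, exons):
    """Count reads overlapping anywhere from the first exon start to the last exon end."""
    span_start = min(s for s, _ in exons)
    span_end = max(e for _, e in exons)
    return sum(1 for r_start, r_end in reads
               if overlaps(r_start, r_end, span_start, span_end))

def count_exonic(reads, exons):
    """Count reads overlapping at least one exon (each read counted once)."""
    return sum(1 for r_start, r_end in reads
               if any(overlaps(r_start, r_end, s, e) for s, e in exons))

# Hypothetical transcript: two exons separated by an intron at 201-399.
exons = [(100, 200), (400, 500)]
# Hypothetical reads: two inside exons, one purely intronic, one straddling the end of exon 1.
reads = [(120, 170), (410, 460), (250, 300), (180, 230)]

print(count_transcript_span(reads, exons))  # 4
print(count_exonic(reads, exons))           # 3
```

Counting over the whole transcript span also picks up the purely intronic read, which exon-based counting ignores; that is one way the two choices can give different count tables for the same alignments.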
@kasper-daniel-hansen-2979
There are many ways of doing the counting, including variants you have not listed.

To my knowledge, no-one has ever really investigated the "right" way to do this. There are many recommendations around (including mine), but none of them has been properly evaluated. But it can (and does) matter.

Kasper

On Tue, Jul 3, 2012 at 8:06 AM, Cittaro Davide wrote:
> Hi all, just a quick question on best practices to create datasets for RNA-seq analysis with edgeR or limma:::voom().
Didn't the Bowtie guys play around with various methods and settle on splitting multimapping reads among the transcripts that could have produced them? I could have sworn they offered some empirical evidence to justify the choice. Now I am going to have to go and look for where I saw this... %^$@$#%^

On Tue, Jul 3, 2012 at 5:28 AM, Kasper Daniel Hansen wrote:
> There are many ways of doing the counting, including variants you have
> not listed.

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
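The idea of splitting multimapping reads can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not Bowtie's or any particular tool's actual method: each read's single count is first divided evenly among the transcripts it aligns to, and optional EM-style iterations then re-divide it in proportion to the current per-transcript estimates. Real transcript quantifiers also model transcript length and fragment distributions, which this sketch ignores.

```python
from collections import defaultdict

def split_multimappers(read_hits, n_iter=0):
    """Fractionally assign reads to the transcripts they align to.

    read_hits: list of sets; each set names the transcripts one read aligns to.
    n_iter=0: split each read evenly among its transcripts.
    n_iter>0: then repeatedly re-split each read in proportion to the current
    per-transcript estimates (a simple EM-style update, ignoring length).
    """
    counts = defaultdict(float)
    for hits in read_hits:
        for t in hits:
            counts[t] += 1.0 / len(hits)  # uniform initial split
    for _ in range(n_iter):
        updated = defaultdict(float)
        for hits in read_hits:
            total = sum(counts[t] for t in hits)  # always > 0 after the uniform split
            for t in hits:
                updated[t] += counts[t] / total
        counts = updated
    return dict(counts)

# Hypothetical reads: two unique to A, one unique to B, one shared by A and B.
read_hits = [{"A"}, {"A"}, {"B"}, {"A", "B"}]

print(split_multimappers(read_hits))            # {'A': 2.5, 'B': 1.5}
print(split_multimappers(read_hits, n_iter=1))  # {'A': 2.625, 'B': 1.375}
```

The total (4 reads) is preserved either way; the reweighting step simply gives more of the shared read to the transcript with more unique support, and the resulting counts are fractional rather than integers.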
Wei Shi
@wei-shi-2183
Hi Davide,

From my point of view, counting reads that overlap exons is easier than counting reads that overlap transcripts, because exons are much better characterized than transcripts. Exon annotation is easy to obtain, for example from RefSeq (or from annotation packages in Bioconductor).

Another problem with summarizing reads to transcripts is that transcripts are more likely than exons to overlap one another, which raises the question of which transcript a read should be assigned to when it falls within a shared region. The same issue arises when summarizing reads to exons, but it is less serious than for transcripts.

Hope this helps.

Cheers,
Wei
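A minimal Python sketch of exon-based summarization with the overlap problem handled explicitly, assuming a hypothetical annotation in which each feature (e.g. a gene) is a list of exon intervals. Reads overlapping exons of exactly one feature are counted for it; reads falling in a region shared by two features are set aside as ambiguous rather than guessed at. This is similar in spirit to, but not a reimplementation of, the default behaviour of counting tools such as Rsubread's featureCounts (allowMultiOverlap = FALSE) or htseq-count's union mode; the "__ambiguous" and "__no_feature" labels are borrowed from htseq-count for readability.

```python
from collections import Counter

def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] share a base."""
    return a_start <= b_end and b_start <= a_end

def count_reads(reads, features):
    """Assign each read to the single feature whose exons it overlaps.

    features maps a feature name to its list of (start, end) exons.
    Reads overlapping exons of more than one feature are tallied as '__ambiguous';
    reads overlapping no exon are tallied as '__no_feature'.
    """
    counts = Counter()
    for r_start, r_end in reads:
        hits = {
            name
            for name, exons in features.items()
            if any(overlaps(r_start, r_end, s, e) for s, e in exons)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts

# Hypothetical annotation: geneB's exon overlaps the second exon of geneA.
features = {"geneA": [(100, 200), (400, 500)], "geneB": [(450, 600)]}
reads = [(120, 170), (410, 440), (460, 480), (550, 590), (250, 300)]

counts = count_reads(reads, features)
print(counts["geneA"], counts["geneB"], counts["__ambiguous"], counts["__no_feature"])
# 2 1 1 1
```

The more the features overlap each other, the more reads end up in the ambiguous bin, which is why summarizing to heavily overlapping transcripts loses more information than summarizing to exons.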