DESeq on transcripts v/s genes


Abhishek Pratap ▴ 410


Hi all,

I am wondering whether, conceptually, I can use DESeq to test for differential expression of transcripts rather than genes. In our case we have generated a transcript model based on RNA-Seq, and if we try to collapse those transcripts to genes in order to do gene-level differential expression, many exons are collapsed into artificial exons, e.g.:

Transcript 1 : ---------------------- (exon)
Transcript 2 : -----------------------------(exon)
Gene level   : -------------------------------------------- (exon)

Another thing that comes to mind is the effect of double counting if I take the read counts at the transcript level, given the exon redundancy.

I would love to hear about your experience.

Thanks!
-Abhi
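The "artificial exon" effect is what a union-style gene model produces: overlapping exons from different transcripts are merged into one longer interval that matches neither transcript. A minimal Python sketch, assuming exons are half-open `(start, end)` integer intervals on one strand; the function name and coordinates are hypothetical and only mimic the diagram above.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one strand


def collapse_exons(exons: List[Interval]) -> List[Interval]:
    """Merge overlapping exon intervals pooled from several transcripts
    into a single 'union' gene model."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start < merged[-1][1]:
            # Overlaps the previous merged interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exons: transcripts 1 and 2 overlap but use different exon
# boundaries; transcript 1 also has a second, non-overlapping exon.
transcript1 = [(100, 320), (600, 700)]
transcript2 = [(180, 480)]

gene_model = collapse_exons(transcript1 + transcript2)
print(gene_model)  # [(100, 480), (600, 700)]
```

The merged interval `(100, 480)` corresponds to no exon actually observed in either transcript, which is the artificial gene-level exon described in the question.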


updated 12.2 years ago by Wolfgang Huber • written 12.2 years ago by Abhishek Pratap


Wolfgang Huber ★ 13k


EMBL European Molecular Biology Laborat…

Dear Abhishek,

There was some anxiety regarding double counting / redundancy in this thread. Actually, there is very little reason to worry. DESeq tests one hypothesis after the other, and it does not matter whether they are correlated or not.

The one place where correlation / redundancy can matter is multiple-testing correction. As long as you control the false discovery rate (FDR), there is again little to worry about, since the redundancy shows up in both the numerator and the denominator of the ratio (the "R" in FDR) and, at least to a good enough approximation, cancels out.

If you control the family-wise error rate (FWER), say with a Bonferroni correction, then the redundancy and the increased number of tests do matter. But there seem to be few reasons to use FWER/Bonferroni in this context.

Hope this helps.

Best wishes
Wolfgang

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
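A minimal Python sketch of the FDR versus Bonferroni argument, under an idealized assumption: perfect redundancy, where every hypothesis is simply tested twice with an identical p-value. The adjustment functions are hand-written for illustration (in R, `p.adjust` with `method = "BH"` or `method = "bonferroni"` does the same job), and the p-values are made up. In this extreme case the Benjamini-Hochberg (FDR) adjusted p-values come out unchanged, because both the number of tests m and each p-value's rank double, while the Bonferroni (FWER) adjusted values double. With partial correlation, as between overlapping transcripts, the cancellation is only approximate.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down: adj = min over ranks >= rank of m * p / rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvals: List[float]) -> List[float]:
    """Bonferroni adjusted p-values (FWER): multiply by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


# Hypothetical raw p-values, one per feature.
p = [0.01, 0.04, 0.03, 0.5]
# Idealized redundancy: every hypothesis appears twice, doubling the number of tests.
p_redundant = [x for x in p for _ in range(2)]

bh = bh_adjust(p)
bh_red = bh_adjust(p_redundant)[::2]  # keep one copy of each feature
bonf = bonferroni_adjust(p)
bonf_red = bonferroni_adjust(p_redundant)[::2]

print(all(abs(a - b) < 1e-12 for a, b in zip(bh, bh_red)))  # True
print(bonf, bonf_red)
```

The second line shows every Bonferroni-adjusted value doubling (until it hits the cap of 1), which is the penalty for the larger number of tests that Wolfgang mentions.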

12.2 years ago by Wolfgang Huber


A clarification (after an off-list request): there are two possibilities for double counting, and my earlier post (Feb/5/12 12:16 PM) refers to only one of them:

1. Creating a transcript-level count for each possible transcript of a gene, essentially by *treating each transcript as a separate 'gene'*, and then calling DESeq or an analogous tool. This is what my earlier post refers to.

2. Counting the reads touching each exon, and then *summing these numbers over all exons of a gene* to get a per-gene (or per-transcript) value. That would be wrong, since reads that touch more than one exon are then counted multiple times, which messes up the statistical model.

Best wishes
Wolfgang
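The two counting schemes can be contrasted on toy data. A minimal Python sketch, assuming each read is represented only by the set of exons it overlaps, and that a read is "compatible" with a transcript when all of its exons belong to that transcript (a strong simplification of real read assignment). The reads, exon names, transcript models and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy data: each read is the set of exons it overlaps.
reads: List[Set[str]] = [
    {"e1"},
    {"e1", "e2"},  # spliced read across the e1-e2 junction
    {"e2"},
    {"e2", "e3"},  # spliced read across the e2-e3 junction
    {"e3"},
]
gene_exons = {"e1", "e2", "e3"}

# Hypothetical transcript models of the gene, as exon sets.
transcripts: Dict[str, Set[str]] = {
    "tx1": {"e1", "e2"},
    "tx2": {"e2", "e3"},
}


def transcript_counts(reads: List[Set[str]], transcripts: Dict[str, Set[str]]) -> Dict[str, int]:
    """Scheme 1: one count per transcript. A read adds 1 to every transcript
    that contains all of its exons, but at most 1 to each transcript."""
    return {name: sum(1 for r in reads if r <= exons) for name, exons in transcripts.items()}


def summed_exon_count(reads: List[Set[str]], exons: Set[str]) -> int:
    """Scheme 2 (wrong): count reads per exon, then sum over the gene's exons.
    A read touching k exons of the gene contributes k to the total."""
    per_exon = Counter(e for r in reads for e in r)
    return sum(per_exon[e] for e in exons)


def distinct_read_count(reads: List[Set[str]], exons: Set[str]) -> int:
    """Each read touching the gene counted exactly once."""
    return sum(1 for r in reads if r & exons)


print(transcript_counts(reads, transcripts))  # {'tx1': 3, 'tx2': 3}
print(summed_exon_count(reads, gene_exons), distinct_read_count(reads, gene_exons))  # 7 5
```

Under scheme 1 each feature's count is still a count of distinct reads; the read touching only `e2` simply contributes to both transcripts, which is redundancy across features of the kind the FDR argument covers. Under scheme 2 the two spliced reads are counted twice within a single feature, inflating the gene total from 5 to 7, so the value no longer behaves like a read count.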

12.2 years ago by Wolfgang Huber


Thanks a lot for the clarifications, and on a weekend too. Sorry I could not get back earlier. It seems that what I am trying to do should work out and not introduce any significant biases.

Cheers!
-Abhi

12.2 years ago by Abhishek Pratap


Abhishek Pratap ▴ 410


Hi Peter,

Thanks for your quick reply. I will try out RSEM, but for our current analysis we would also like to use DESeq and maybe then compare the results with RSEM. Just wondering whether you know the methodological difference between RSEM and DEGSeq.

-Abhi

12.2 years ago by Abhishek Pratap


wang peter ★ 2.0k


---------- Forwarded message ----------
From: wang peter <wng.peter@gmail.com>
Date: Wed, Feb 1, 2012 at 6:57 PM
Subject: Re: [BioC] DESeq on transcripts v/s genes
To: Abhishek Pratap <apratap at lbl.gov>

You can use RSEM:
http://trinityrnaseq.sourceforge.net/analysis/align_visualize_quantify.html

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253

12.2 years ago by wang peter
