DESeq and transcript-wise analysis

0

Entering edit mode

Elena Sorokin ▴ 150

@elena-sorokin-4659

Last seen 9.6 years ago

Greetings all, After re-reading related posts in the listserv archive, I still didn't know the exact answer to my question, so here goes. I'd like to use DESeq to measure differential isoform expression. Has Simon or anybody else written a script that will convert aligned reads (.bam/.sam file) into a table of isoform counts, suitable for input to DESEq - similar to what Simon has done at the gene-wise level, but instead for making a table of counts by isoform? I would try to do this myself, but I'm a novice at programming. Sorry if this has been answered elsewhere... If so, please let me know the link. Thanks, Elena

convert DESeq convert DESeq • 3.6k views

ADD COMMENT • link 12.2 years ago Elena Sorokin ▴ 150

0

Entering edit mode

Abhishek Pratap ▴ 410

@abhishek-pratap-5083

Last seen 9.6 years ago

Hi Elena Good timing with me on this. I recently was contemplating the best way to move forward for a similar analysis. HTSeq a python based toolkit by Simon can help you do the counting. FYI : It can also take strand info into account. If you dont have stranded data you could also look at easyrnaseq package. So if you have an annotation file like gff/gtf with the isoform information you could then do the read counting at isoform or gene level based on which attribute of the gff file you select to do the counting. Check out http://www-huber.embl.de/users/anders/HTSeq/doc/count.html. Also you want to keep in mind that at isoform level you would be double counting the reads in exons which are shared in the isoforms which can bias your results to some extent. But as Wolfgang pointed out in a recent post if you use FDR, it should not matter a lost as the bias will be cancelled between denominator /numerator. You also might want to check the DEXSeq which can help infer differential expression from RNA-Seq exons which could then be related back to genes/isoforms. Hope this helps and let us know about your progress. I would be interested in learning from your experience too. Cheers! -Abhi ---------------------------------- Abhishek Pratap Bioinformatics Systems Analyst - 3 DOE- Joint Genome Institute Lawrence Berkeley National Lab On Wed, Feb 8, 2012 at 3:26 PM, Elena Sorokin <sorokin at="" wisc.edu=""> wrote: > Greetings all, > > After re-reading related posts in the listserv archive, I still didn't know > the exact answer to my question, so here goes. I'd like to use DESeq to > measure differential isoform expression. Has Simon or anybody else written a > script that will convert aligned reads (.bam/.sam file) into a table of > isoform counts, suitable for input to DESEq - similar to what Simon has done > at the gene-wise level, but instead for making a table of counts by isoform? > > I would try to do this myself, but I'm a novice at programming. Sorry if > this has been answered elsewhere... If so, please let me know the link. > > Thanks, > Elena > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.2 years ago Abhishek Pratap ▴ 410

0

Entering edit mode

On 02/08/2012 03:41 PM, Abhishek Pratap wrote: > Hi Elena > > Good timing with me on this. I recently was contemplating the best way > to move forward for a similar analysis. HTSeq a python based toolkit > by Simon can help you do the counting. FYI : It can also take strand > info into account. If you dont have stranded data you could also look > at easyrnaseq package. > > So if you have an annotation file like gff/gtf with the isoform > information you could then do the read counting at isoform or gene > level based on which attribute of the gff file you select to do the > counting. Check out > http://www-huber.embl.de/users/anders/HTSeq/doc/count.html. Also GenomicRanges::summarizeOverlaps for HTSeq-like counting; annotations might come from knownGenes (e.g., the TxDb.* annotation packages, or makeTranscriptDbFrom* in GenomicFeatures) or gff / friends via rtracklayer. Martin > > Also you want to keep in mind that at isoform level you would be > double counting the reads in exons which are shared in the isoforms > which can bias your results to some extent. But as Wolfgang pointed > out in a recent post if you use FDR, it should not matter a lost as > the bias will be cancelled between denominator /numerator. > > You also might want to check the DEXSeq which can help infer > differential expression from RNA-Seq exons which could then be related > back to genes/isoforms. > > Hope this helps and let us know about your progress. I would be > interested in learning from your experience too. > > Cheers! > -Abhi > > ---------------------------------- > Abhishek Pratap > Bioinformatics Systems Analyst - 3 > DOE- Joint Genome Institute > Lawrence Berkeley National Lab > > > > > On Wed, Feb 8, 2012 at 3:26 PM, Elena Sorokin<sorokin at="" wisc.edu=""> wrote: >> Greetings all, >> >> After re-reading related posts in the listserv archive, I still didn't know >> the exact answer to my question, so here goes. I'd like to use DESeq to >> measure differential isoform expression. Has Simon or anybody else written a >> script that will convert aligned reads (.bam/.sam file) into a table of >> isoform counts, suitable for input to DESEq - similar to what Simon has done >> at the gene-wise level, but instead for making a table of counts by isoform? >> >> I would try to do this myself, but I'm a novice at programming. Sorry if >> this has been answered elsewhere... If so, please let me know the link. >> >> Thanks, >> Elena >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793

ADD REPLY • link 12.2 years ago Martin Morgan 25k

0

Entering edit mode

Dear Abhi, If you could point me to some published strand specific data or let me get an excerpt of yours, I could easily had strand-specificity in the easyRNASeq package. Thanks, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 9 Feb 2012, at 00:41, Abhishek Pratap wrote: > Hi Elena > > Good timing with me on this. I recently was contemplating the best way > to move forward for a similar analysis. HTSeq a python based toolkit > by Simon can help you do the counting. FYI : It can also take strand > info into account. If you dont have stranded data you could also look > at easyrnaseq package. > > So if you have an annotation file like gff/gtf with the isoform > information you could then do the read counting at isoform or gene > level based on which attribute of the gff file you select to do the > counting. Check out > http://www-huber.embl.de/users/anders/HTSeq/doc/count.html. > > Also you want to keep in mind that at isoform level you would be > double counting the reads in exons which are shared in the isoforms > which can bias your results to some extent. But as Wolfgang pointed > out in a recent post if you use FDR, it should not matter a lost as > the bias will be cancelled between denominator /numerator. > > You also might want to check the DEXSeq which can help infer > differential expression from RNA-Seq exons which could then be related > back to genes/isoforms. > > Hope this helps and let us know about your progress. I would be > interested in learning from your experience too. > > Cheers! > -Abhi > > ---------------------------------- > Abhishek Pratap > Bioinformatics Systems Analyst - 3 > DOE- Joint Genome Institute > Lawrence Berkeley National Lab > > > > > On Wed, Feb 8, 2012 at 3:26 PM, Elena Sorokin <sorokin at="" wisc.edu=""> wrote: >> Greetings all, >> >> After re-reading related posts in the listserv archive, I still didn't know >> the exact answer to my question, so here goes. I'd like to use DESeq to >> measure differential isoform expression. Has Simon or anybody else written a >> script that will convert aligned reads (.bam/.sam file) into a table of >> isoform counts, suitable for input to DESEq - similar to what Simon has done >> at the gene-wise level, but instead for making a table of counts by isoform? >> >> I would try to do this myself, but I'm a novice at programming. Sorry if >> this has been answered elsewhere... If so, please let me know the link. >> >> Thanks, >> Elena >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.2 years ago delhomme@embl.de ★ 1.2k

0

Entering edit mode

Thomas Girke ★ 1.7k

@thomas-girke-993

Last seen 27 days ago

United States

This study contains some strand specific RNA-Seq data: http://www.hubmed.org/display.cgi?uids=18423832 I would expect that most RNA-Seq experiments in the near future may be performed in a strand-specific manner, since the strand information carries a lot of biologically relevant information in this application domain. Thus, adding analysis support for it is definitely not a waste of time. I have not used easyRNASeq yet, but I will certainly give it a try. In my group we currently use for RNA-Seq analysis the following components: Rsubread (or tophat) -> rtracklayer/Rsamtools/GenomicRanges -> DESeq/edgeR. This allows any type strand and non-strand specific read counts for exons, transcripts, genes, intergenic features, etc. A huge advantage of this environment is its flexibility and broad application spectrum for most applications domains in the NGS field, such as SNP-Seq, ChiP-Seq, smallRNA-Seq, etc. For instance, our ChIP-Seq analysis routines use most of these tools plus some peak callers. Thomas On Thu, Feb 09, 2012 at 01:38:02PM +0000, Nicolas Delhomme wrote: > Dear Abhi, > > If you could point me to some published strand specific data or let me get an excerpt of yours, I could easily had strand-specificity in the easyRNASeq package. > > Thanks, > > Nico > > --------------------------------------------------------------- > Nicolas Delhomme > > Genome Biology Computational Support > > European Molecular Biology Laboratory > > Tel: +49 6221 387 8310 > Email: nicolas.delhomme at embl.de > Meyerhofstrasse 1 - Postfach 10.2209 > 69102 Heidelberg, Germany > --------------------------------------------------------------- > > > > > > On 9 Feb 2012, at 00:41, Abhishek Pratap wrote: > > > Hi Elena > > > > Good timing with me on this. I recently was contemplating the best way > > to move forward for a similar analysis. HTSeq a python based toolkit > > by Simon can help you do the counting. FYI : It can also take strand > > info into account. If you dont have stranded data you could also look > > at easyrnaseq package. > > > > So if you have an annotation file like gff/gtf with the isoform > > information you could then do the read counting at isoform or gene > > level based on which attribute of the gff file you select to do the > > counting. Check out > > http://www-huber.embl.de/users/anders/HTSeq/doc/count.html. > > > > Also you want to keep in mind that at isoform level you would be > > double counting the reads in exons which are shared in the isoforms > > which can bias your results to some extent. But as Wolfgang pointed > > out in a recent post if you use FDR, it should not matter a lost as > > the bias will be cancelled between denominator /numerator. > > > > You also might want to check the DEXSeq which can help infer > > differential expression from RNA-Seq exons which could then be related > > back to genes/isoforms. > > > > Hope this helps and let us know about your progress. I would be > > interested in learning from your experience too. > > > > Cheers! > > -Abhi > > > > ---------------------------------- > > Abhishek Pratap > > Bioinformatics Systems Analyst - 3 > > DOE- Joint Genome Institute > > Lawrence Berkeley National Lab > > > > > > > > > > On Wed, Feb 8, 2012 at 3:26 PM, Elena Sorokin <sorokin at="" wisc.edu=""> wrote: > >> Greetings all, > >> > >> After re-reading related posts in the listserv archive, I still didn't know > >> the exact answer to my question, so here goes. I'd like to use DESeq to > >> measure differential isoform expression. Has Simon or anybody else written a > >> script that will convert aligned reads (.bam/.sam file) into a table of > >> isoform counts, suitable for input to DESEq - similar to what Simon has done > >> at the gene-wise level, but instead for making a table of counts by isoform? > >> > >> I would try to do this myself, but I'm a novice at programming. Sorry if > >> this has been answered elsewhere... If so, please let me know the link. > >> > >> Thanks, > >> Elena > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor at r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.2 years ago Thomas Girke ★ 1.7k

0

Entering edit mode

Hi Thomas, On 9 Feb 2012, at 17:21, Thomas Girke wrote: > This study contains some strand specific RNA-Seq data: > http://www.hubmed.org/display.cgi?uids=18423832 > Thanks for the pointer. > I would expect that most RNA-Seq experiments in the near future may be > performed in a strand-specific manner, since the strand information > carries a lot of biologically relevant information in this application > domain. Thus, adding analysis support for it is definitely not a waste > of time. Clearly. I would have already done had I had the time. > > I have not used easyRNASeq yet, but I will certainly give it a try. Let me know when you do. > > In my group we currently use for RNA-Seq analysis the following > components: Rsubread (or tophat) -> rtracklayer/Rsamtools/GenomicRanges > -> DESeq/edgeR. This allows any type strand and non-strand specific read > counts for exons, transcripts, genes, intergenic features, etc. A huge > advantage of this environment is its flexibility and broad application > spectrum for most applications domains in the NGS field, such as > SNP-Seq, ChiP-Seq, smallRNA-Seq, etc. For instance, our ChIP-Seq > analysis routines use most of these tools plus some peak callers. Exactly why I have been developing and using easyRNASeq as well. Cheers, Nico > > Thomas > > On Thu, Feb 09, 2012 at 01:38:02PM +0000, Nicolas Delhomme wrote: >> Dear Abhi, >> >> If you could point me to some published strand specific data or let me get an excerpt of yours, I could easily had strand-specificity in the easyRNASeq package. >> >> Thanks, >> >> Nico >> >> --------------------------------------------------------------- >> Nicolas Delhomme >> >> Genome Biology Computational Support >> >> European Molecular Biology Laboratory >> >> Tel: +49 6221 387 8310 >> Email: nicolas.delhomme at embl.de >> Meyerhofstrasse 1 - Postfach 10.2209 >> 69102 Heidelberg, Germany >> --------------------------------------------------------------- >> >> >> >> >> >> On 9 Feb 2012, at 00:41, Abhishek Pratap wrote: >> >>> Hi Elena >>> >>> Good timing with me on this. I recently was contemplating the best way >>> to move forward for a similar analysis. HTSeq a python based toolkit >>> by Simon can help you do the counting. FYI : It can also take strand >>> info into account. If you dont have stranded data you could also look >>> at easyrnaseq package. >>> >>> So if you have an annotation file like gff/gtf with the isoform >>> information you could then do the read counting at isoform or gene >>> level based on which attribute of the gff file you select to do the >>> counting. Check out >>> http://www-huber.embl.de/users/anders/HTSeq/doc/count.html. >>> >>> Also you want to keep in mind that at isoform level you would be >>> double counting the reads in exons which are shared in the isoforms >>> which can bias your results to some extent. But as Wolfgang pointed >>> out in a recent post if you use FDR, it should not matter a lost as >>> the bias will be cancelled between denominator /numerator. >>> >>> You also might want to check the DEXSeq which can help infer >>> differential expression from RNA-Seq exons which could then be related >>> back to genes/isoforms. >>> >>> Hope this helps and let us know about your progress. I would be >>> interested in learning from your experience too. >>> >>> Cheers! >>> -Abhi >>> >>> ---------------------------------- >>> Abhishek Pratap >>> Bioinformatics Systems Analyst - 3 >>> DOE- Joint Genome Institute >>> Lawrence Berkeley National Lab >>> >>> >>> >>> >>> On Wed, Feb 8, 2012 at 3:26 PM, Elena Sorokin <sorokin at="" wisc.edu=""> wrote: >>>> Greetings all, >>>> >>>> After re-reading related posts in the listserv archive, I still didn't know >>>> the exact answer to my question, so here goes. I'd like to use DESeq to >>>> measure differential isoform expression. Has Simon or anybody else written a >>>> script that will convert aligned reads (.bam/.sam file) into a table of >>>> isoform counts, suitable for input to DESEq - similar to what Simon has done >>>> at the gene-wise level, but instead for making a table of counts by isoform? >>>> >>>> I would try to do this myself, but I'm a novice at programming. Sorry if >>>> this has been answered elsewhere... If so, please let me know the link. >>>> >>>> Thanks, >>>> Elena >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD REPLY • link 12.2 years ago delhomme@embl.de ★ 1.2k

0

Entering edit mode

Simon Anders ★ 3.7k

@simon-anders-3855

Last seen 3.7 years ago

Zentrum für Molekularbiologie, Universi…

Hi On 2012-02-09 00:26, Elena Sorokin wrote: > After re-reading related posts in the listserv archive, I still didn't > know the exact answer to my question, so here goes. I'd like to use > DESeq to measure differential isoform expression. Has Simon or anybody > else written a script that will convert aligned reads (.bam/.sam file) > into a table of isoform counts, suitable for input to DESEq - similar to > what Simon has done at the gene-wise level, but instead for making a > table of counts by isoform? I get asked about every other week how to make such an isoform count table and, frankly, I still don't understand why anybody would want such a table. A typical gene in a vertebrate genome has many isoform which all have large overlap with each other. Hence, if only one of the gene's several isoforms is differentially expressed, the extra reads from it will increase not only the count for this isoform but also for all other isoforms that have some overlap with it. Most likely, this means that _all_ isoforms have an increased count, because there is hardly ever two isoforms that do not overlap somehow. Hence, such a count table is not at all suitable to quantify isoform expression or to detect differential isoform usage. I know this is obvious but I am so puzzled why so many people want to make such tables that I thought I should point it out again. Quantifying isoform expression is a hard problem, and the various approaches discussed in the literature have a hard time to give quantifications which are precise enough to support tests for differential expression of isoforms. This is why we decided to take a whole step back and instead worked on a tool to test for differential usage of individual exons. We think that this approach, now available in the Bioconductor package "DEXSeq", is a more productive way to study alternative isoform regulation than looking at isoforms directly. By the way, we now made a preprint of our manuscript on DEXSeq available: http://precedings.nature.com/documents/6837 Simon

ADD COMMENT • link 12.2 years ago Simon Anders ★ 3.7k

0

Entering edit mode

Elena Sorokin ▴ 150

@elena-sorokin-4659

Last seen 9.6 years ago

Dear Simon, Abhi, and Martin, Thanks for your replies. I just learned about DEXseq and am working through the vignette. Importantly for me, I see that DEXseq can also do generalized linear models. I will compare that to my current isoform analysis in DESeq with its GLM function, then compare both those to previous results from Cuffdiff, where unfortunately I've only been able to do 2-sample comparisons. Simon, I'm working with C. elegans, not with a vertebrate, and am usually dealing with 1-2 isoforms per gene, though sometimes it can be much more. I don't know if that still presents a huge problem for DESeq, but I figure it's worth trying anyway. Thanks again for everyone's advice. It is much appreciated! Best, Elena

ADD COMMENT • link 12.2 years ago Elena Sorokin ▴ 150

0

Entering edit mode

Hello, Is there anything that can assess differences at the individual splice site level? Testing at the isoform level is sometimes not useful because many, many factors go into determining the final structure of a fully-processed mRNA transcript, such as the transcription start site, cleavage at the three prime end, splicing of many internal introns, and so on. I think it is much more interesting and useful biologically to focus in on one or two aspects of transcription and test whether a condition or treatment has affected that one aspect. For example, I'd like to know if a condition or treatment affects individual splice site preference. ex): 5' V1 XXXXX------XXXXXXXXX 3' V2 XXXXX--------------XXXXX In this simple example, we have two isoforms arising from the same gene. To assess whether the condition has changed splicing, i.e., changed which splice site is preferred, then I need to know how many transcripts used acceptor site V1 versus acceptor site V2. I can do this by counting reads that align across the intron. A spliced read can support the V1 acceptor or the V2 acceptor, but never both. So I can be sure of which choice the spliced read represents. But let's say I have data from three treatments and three controls. How can I determine whether the treatment changed the ratio of V1 to V2 spliced reads? Can I do something like calculate the percentage of V1 reads overall and then compare the percentages using a t test? Sorry if this question is horribly naive! Best wishes, Ann

ADD REPLY • link 12.2 years ago Ann Loraine ▴ 110

Login before adding your answer.