Question: DESeq versus CuffDiff2 for RNA-seq expression quantification in parasite-infected blood
Kevin Lee wrote (4.6 years ago):
Hello,

My name is Kevin Lee, and I am a PhD candidate in Bioinformatics at the Georgia Institute of Technology. I have been trying to decide between various normalization techniques for RNA-seq data from a parasite-infection time-course study. The RNA-seq data will come from infected blood and will contain RNA from both the host and the parasite. I believe it is best to analyze these two RNA subsets separately for the purposes of normalization.

I have been using DESeq because it is clearly superior to FPKM. Recently, however, I was intrigued by the new CuffDiff2 software. Weighing the two methods (DESeq versus CuffDiff2), I see benefits in each. CuffDiff2 has the advantage that it quantifies isoform abundance, but it uses FPKM to "normalize" expression levels across samples. DESeq, on the other hand, estimates library size much more robustly than RPKM does. Since my study will look at the immune response to a parasite over time, I expect at least a few genes to be VERY differentially expressed in one or more of the conditions/time points. Consequently, I believe that DESeq's normalization approach will yield a much more accurate analysis. Does this seem like a reasonable assessment?

If I do choose DESeq, at what level should I quantify transcription: exon, gene, or isoform? Exon level seems the most straightforward, but it has the drawback that each exon represents a much smaller "area": individual exons will have far fewer mapped reads than the gene they belong to. If quantifying by gene, how is a gene defined: as all exons from all isoforms? I don't know of any way to quantify isoforms with DESeq as CuffDiff does, and this is the main reason I am hesitant about using DESeq.

One possible approach is to count reads that fall within any annotated exon of a gene as belonging to that gene, and to use those counts to normalize and test for differential expression.
In parallel, I would use DEXSeq to test for differential exon usage. Does this seem like a reasonable approach? Any further advice?

Cheers,
Kevin

--
Kevin Lee
Georgia Institute of Technology
Department of Biology
PhD candidate in Bioinformatics
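The library-size argument in the question can be made concrete with toy numbers. The following is a minimal Python sketch on made-up counts; it is not DESeq's own code, and the function names are hypothetical. It compares scaling by total counts (the idea behind the per-million step of classic RPKM/FPKM) with the median-of-ratios size factors described for DESeq: each gene's geometric mean across samples serves as a pseudo-reference, and a sample's size factor is the median of its count/reference ratios. Genes with a zero count in any sample are skipped, since their geometric mean would be zero.

```python
import math
from statistics import median


def total_count_factors(counts):
    """Scale each sample by its total count relative to the mean total.

    counts: list of genes, each a list of per-sample read counts.
    """
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def median_of_ratios_factors(counts):
    """Median-of-ratios size factors, as described for DESeq (simplified).

    Each gene's geometric mean across samples acts as a pseudo-reference;
    a sample's size factor is the median of its count/reference ratios.
    Genes with a zero in any sample are skipped (geometric mean is zero).
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(k) for k in row) / n_samples
        for j, k in enumerate(row):
            ratios[j].append(math.exp(math.log(k) - log_geo_mean))
    return [median(r) for r in ratios]


# Toy data: genes x samples (A, B). Both samples have the same depth by
# construction; only the last gene is strongly induced in sample B.
counts = [
    [100, 100],
    [200, 200],
    [300, 300],
    [400, 400],
    [500, 500],
    [10, 5000],
]
print(total_count_factors(counts))       # B appears about 4.3 times deeper than A
print(median_of_ratios_factors(counts))  # [1.0, 1.0]
```

With only one gene changing, the median ignores it, whereas the total count is dominated by it; this is the situation the question anticipates for a few very strongly induced immune genes.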
Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg) wrote (4.6 years ago):
Hi Kevin

> One possible approach is to use reads that fall within any annotated exon
> as being part of a gene. And using those measures to normalize and test
> for differential expression. And in parallel use DEXSeq to test for
> differential exon usage. Does this seem like a reasonable approach? Any
> further advice?

As one of the authors of both DESeq and DEXSeq, I probably cannot give you an unbiased opinion on how our tools compare to cuffdiff; but if you go for our methods, then yes, what you wrote is the approach we recommend.

Simon
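The gene definition asked about in the question ("all exons from all isoforms"), combined with the approach confirmed above, can be sketched by merging the exons of every annotated isoform into one set of non-overlapping "union" intervals; a read then counts toward the gene if it overlaps any of them. This is an illustrative Python sketch with hypothetical names and toy coordinates, assuming half-open (start, end) intervals on one chromosome and strand. It is not how htseq-count is implemented, and it ignores reads that overlap several genes.

```python
def union_exon_model(isoforms):
    """Merge the exons of all isoforms of one gene into disjoint intervals.

    isoforms: list of transcripts, each a list of half-open (start, end)
    exons, all on the same chromosome and strand.
    """
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union_length(model):
    """Total number of bases covered by the merged gene model."""
    return sum(end - start for start, end in model)


# Two hypothetical isoforms whose first exons overlap but end differently.
isoforms = [
    [(100, 200), (300, 400)],
    [(100, 250), (500, 600)],
]
model = union_exon_model(isoforms)
print(model)                # [(100, 250), (300, 400), (500, 600)]
print(union_length(model))  # 350
```

The merged model covers more sequence than any single exon (350 bases here versus at most 150), which is the "area" difference between gene-level and exon-level counts mentioned in the question.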
ADD COMMENTlink written 4.6 years ago by Simon Anders3.4k
Hello Simon,

I appreciate your assistance. A follow-up question: what is the appropriate way to handle a read that spans an exon junction and is therefore mapped to two exons by the short-read aligner? Counting it in both exons seems to give the read undue weight when using DESeq; conversely, it seems important to "double count" it when using DEXSeq. Any advice?

Also, is there any software that readily generates the count-matrix files required by DESeq and DEXSeq? I have just been using an overlapper script that I wrote, which works on the BAM files and UCSC gene annotations.

Cheers,
Kevin
Reply written 4.6 years ago by Kevin Lee
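The two counting conventions Kevin contrasts can be illustrated with a toy counter. At gene level, a read that spans a splice junction is counted once; at exon level, as needed for testing differential exon usage, it is counted in every exon bin it touches, so the bin counts can sum to more than the number of reads. This Python sketch uses hypothetical names and coordinates, assumes each read is given as its aligned half-open (start, end) blocks, and assumes the gene's exon bins do not overlap. It illustrates the idea only and is not the logic of htseq-count or dexseq-count.py.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_gene_and_bins(reads, bins):
    """Count reads for one gene at gene level and at exon-bin level.

    reads: list of reads, each a list of aligned (start, end) blocks; a
           spliced read has one block per exon it covers.
    bins:  non-overlapping (start, end) exon bins of the gene.
    Returns (gene_count, per_bin_counts).
    """
    gene_count = 0
    bin_counts = [0] * len(bins)
    for blocks in reads:
        touched = [i for i, b in enumerate(bins)
                   if any(overlaps(block, b) for block in blocks)]
        if touched:
            gene_count += 1          # gene level: one read, one count
            for i in touched:
                bin_counts[i] += 1   # exon level: one count per bin touched
    return gene_count, bin_counts


bins = [(100, 250), (300, 400), (500, 600)]
reads = [
    [(200, 250), (300, 350)],  # spans the junction between bins 0 and 1
    [(320, 370)],              # inside bin 1
    [(510, 560)],              # inside bin 2
]
print(count_gene_and_bins(reads, bins))  # (3, [1, 2, 1])
```

Because one table counts each read once and the other may count it several times, it is natural to produce the two tables separately rather than derive one from the other.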
Hi Kevin

On 26/04/13 19:32, Kevin Lee wrote:
> I appreciate your assistance. A follow-up question: what is the
> appropriate method to handle a read that splits an exon junction and is
> therefore mapped to two exons when using a short read mapping software?
> Counting it as being present in both exons seems to give undue weight
> to the read when using DESeq; conversely, it seems important to "double
> count" it when using DEXSeq. Any advice? And any software to readily
> generate these kinds of files, the matrix files required for DE(X)Seq?
> I have just been using an overlapper script that I wrote using the bam
> files and ucsc gene annotations.

I use Python scripts for counting. For DESeq, you can use the htseq-count script (available from http://www-huber.embl.de/users/anders/HTSeq/), and for DEXSeq, use the dexseq-count.py script that comes with the DEXSeq Bioconductor package.

The reason we offer two scripts, and suggest producing separate count tables for DESeq and DEXSeq, is precisely the issue you point out with reads mapping to two exons. While this works well, I do admit that this state of things is not terribly elegant.

BTW: If you use our scripts with UCSC annotation, make sure to fix the gene IDs. (The UCSC table browser puts transcript IDs where it should put gene IDs; you need to remove the ".nn" suffixes. You will see what I mean once you have a look at the GFF files.)

Simon
Reply written 4.6 years ago by Simon Anders
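The gene-ID fix Simon mentions for UCSC annotation can be done with a small line filter. The Python sketch below assumes GTF-style attributes of the form gene_id "..."; and strips a trailing ".nn" (a dot followed by digits) from the gene_id value only, leaving transcript_id untouched. The ID uc001aaa.3, the example line, and the file names in the comment are illustrative, not taken from a real download; look at a few lines of your own annotation first, as Simon suggests, to confirm the IDs have this shape.

```python
import re

# gene_id "<id>.<digits>"  ->  keep everything except the trailing ".<digits>"
GENE_ID_SUFFIX = re.compile(r'(gene_id "[^"]*?)\.\d+(")')


def fix_gene_id(gtf_line):
    """Remove a trailing '.nn' from the gene_id attribute of one GTF line."""
    if gtf_line.startswith("#"):
        return gtf_line  # leave header/comment lines alone
    return GENE_ID_SUFFIX.sub(r"\1\2", gtf_line, count=1)


def fix_gtf(lines):
    """Apply fix_gene_id to every line of an iterable, e.g. an open file."""
    for line in lines:
        yield fix_gene_id(line)


# Illustrative line with made-up coordinates.
example = ('chr1\tknownGene\texon\t11874\t12227\t.\t+\t.\t'
           'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";')
print(fix_gene_id(example))  # gene_id becomes "uc001aaa"; transcript_id is unchanged

# Applying it to a whole file (hypothetical file names):
# with open("ucsc.gtf") as src, open("ucsc.fixed.gtf", "w") as dst:
#     dst.writelines(fix_gtf(src))
```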