Question

DESeq versus CuffDiff2 for RNA-seq expression quantification in parasite-infected blood

0


Kevin Lee ▴ 40

@kevin-lee-5904

Last seen 9.7 years ago

Hello,

My name is Kevin Lee, and I am a PhD candidate in Bioinformatics at Georgia Institute of Technology. I have been trying to decide between normalization techniques for RNA-seq data from a parasite infection time-course study. The data will come from infected blood and will contain RNA from both the host and the parasite. I believe it is best to analyze these two RNA subsets separately for the purposes of normalization.

I have been using DESeq because its normalization seems clearly superior to FPKM. Recently, however, I was intrigued by the new CuffDiff2 software. As I have weighed the two methods (DESeq versus CuffDiff2), I see the benefits of each. CuffDiff2 has the advantage that it quantifies isoform abundance, but it uses FPKM to "normalize" expression levels across samples. DESeq, however, estimates library size much more robustly than the total-count scaling behind RPKM/FPKM. Since my study will be looking at the immune response to a parasite over time, I expect that at least a few genes will be VERY differentially expressed in one or more of the conditions/time points. Consequently, I believe that a DESeq normalization approach will yield a much more accurate analysis. Does this seem like a reasonable assessment?

If I do choose to use DESeq, on what level should I quantify transcription: exon, gene, or isoform? Quantifying by exon seems the most straightforward, but it has the drawback of covering a much smaller "area": an individual exon will have far fewer mapped reads than the gene of which it is a part. If quantifying by gene, how is a gene defined: as all exons from all isoforms? I don't know of any way to quantify isoforms as CuffDiff does, and this is the main reason I am hesitant about using DESeq.

One possible approach is to count reads that fall within any annotated exon as being part of a gene, and to use those counts to normalize and test for differential expression, while in parallel using DEXSeq to test for differential exon usage. Does this seem like a reasonable approach? Any further advice?

Cheers,
Kevin

--
Kevin Lee
Georgia Institute of Technology
Department of Biology
PhD candidate in Bioinformatics
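The robustness argument in the question can be illustrated with a small sketch. It compares total-count scaling (the library-size idea behind RPKM/FPKM) with median-of-ratios size factors, the scheme DESeq is based on. This is a simplified pure-Python illustration under stated assumptions, not DESeq's actual code: the function names `size_factors` and `total_count_factors` and the toy count table are made up for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (simplified illustration).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with nonzero counts in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (a pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the pseudo-reference, gene by gene.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        # The median ignores a minority of extreme genes.
        factors.append(math.exp(median(log_ratios)))
    return factors

def total_count_factors(counts):
    """Scaling factors proportional to each sample's total read count."""
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]

# Hypothetical toy data: two samples sequenced to the same depth,
# but the last gene is strongly induced in sample 2.
toy_counts = [
    [100, 100],
    [200, 200],
    [50, 50],
    [400, 400],
    [10, 10000],
]

print(size_factors(toy_counts))         # [1.0, 1.0]
print(total_count_factors(toy_counts))  # roughly [0.13, 1.87]
```

In this toy table, the single induced gene makes total-count scaling treat sample 2 as much more deeply sequenced, which would shrink every other gene in that sample after normalization. The median-based factors are unaffected, which is the behavior the question is relying on.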

Normalization DESeq DEXSeq • 3.5k views

updated 11.0 years ago by Simon Anders ★ 3.7k • written 11.0 years ago by Kevin Lee ▴ 40

score 0 · Answer 1 · 2013-04-25

0


Simon Anders ★ 3.7k

@simon-anders-3855

Last seen 3.8 years ago

Zentrum für Molekularbiologie, Universi…

Hi Kevin

> One possible approach is to use reads that fall within any annotated exon
> as being part of a gene. And using those measures to normalize and test
> for differential expression. And in parallel use DEXSeq to test for
> differential exon usage. Does this seem like a reasonable approach? Any
> further advice?

As one of the authors of both DESeq and DEXSeq, I probably cannot give you an unbiased opinion on how our tools compare to cuffdiff; but if you go for our methods, then yes, what you wrote is the approach we recommend.

Simon

11.0 years ago Simon Anders ★ 3.7k

0


Hello Simon,

I appreciate your assistance. A follow-up question: what is the appropriate way to handle a read that spans an exon junction and is therefore mapped to two exons by a short-read mapper? Counting it as present in both exons seems to give the read undue weight when using DESeq; conversely, it seems important to "double count" it when using DEXSeq. Any advice?

Also, is there any software that readily generates the count matrix files required for DE(X)Seq? I have just been using an overlap script that I wrote, which works from the BAM files and UCSC gene annotations.

Cheers,
Kevin

--
Kevin Lee
Georgia Institute of Technology
Department of Biology
PhD candidate in Bioinformatics

11.0 years ago Kevin Lee ▴ 40

0


Hi Kevin

On 26/04/13 19:32, Kevin Lee wrote:
> I appreciate your assistance. A follow-up question: what is the
> appropriate method to handle a read that splits an exon junction and is
> therefore mapped to two exons when using a short read mapping software?
> Counting it as being present in both exons seems to give undue weight
> to the read when using DESeq; conversely, it seems important to "double
> count" it when using DEXSeq. Any advice? And any software to readily
> generate these kinds of files, the matrix files required for DE(X)Seq?
> I have just been using an overlapper script that I wrote using the bam
> files and ucsc gene annotations.

I use Python scripts for counting. For DESeq, you can use the htseq-count script (available from http://www-huber.embl.de/users/anders/HTSeq/ ), and for DEXSeq, use the dexseq_count.py script that comes with the DEXSeq Bioconductor package.

The reason that we offer two scripts, and suggest producing separate count tables for DESeq and DEXSeq, is precisely the issue you point out with reads mapping to two exons. While this works well, I do admit that this state of things is not terribly elegant.

BTW: If you use our scripts with UCSC annotation, make sure to fix the gene IDs. (The UCSC table browser puts transcript IDs where it should put gene IDs; you need to remove the ".nn" suffixes. You will see what I mean once you have a look at the GFF files.)

Simon
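The reason for keeping two separate count tables can be sketched in a few lines. The example below is a hypothetical illustration only; it is not how htseq-count or the DEXSeq counting script is implemented, and the names `overlaps`, `count_read`, and the toy coordinates are invented here. It assumes each read is given as a list of aligned blocks, so that a junction-spanning read has two blocks, and it simply skips reads that hit more than one gene.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_read(read_blocks, exons, gene_counts, exon_counts):
    """Add one read to a gene-level table and an exon-level table.

    read_blocks: list of (start, end) aligned segments of a single read.
    exons: list of (gene_id, exon_id, start, end) annotation records.
    """
    hit_exons = {
        (gene_id, exon_id)
        for (gene_id, exon_id, ex_start, ex_end) in exons
        for (blk_start, blk_end) in read_blocks
        if overlaps(blk_start, blk_end, ex_start, ex_end)
    }
    hit_genes = {gene_id for gene_id, _ in hit_exons}
    if len(hit_genes) != 1:
        return  # no hit, or ambiguous between genes: skipped in this sketch
    gene_id = next(iter(hit_genes))
    # Gene-level table: the read counts once, however many exons it touches.
    gene_counts[gene_id] = gene_counts.get(gene_id, 0) + 1
    # Exon-level table: the read counts once for each exon it overlaps.
    for key in hit_exons:
        exon_counts[key] = exon_counts.get(key, 0) + 1

# Toy annotation: one gene with two exons, and one read spanning the junction.
toy_exons = [("geneA", "E1", 100, 200), ("geneA", "E2", 300, 400)]
junction_read = [(180, 200), (300, 330)]

gene_counts, exon_counts = {}, {}
count_read(junction_read, toy_exons, gene_counts, exon_counts)
print(gene_counts)  # {'geneA': 1}
print(exon_counts)
```

The junction read contributes a single count to geneA in the gene-level table, so it carries no extra weight in a gene-level test, while both E1 and E2 each receive one count in the exon-level table.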

ADD REPLY • link 11.0 years ago Simon Anders ★ 3.7k