5.0 years ago by
Zentrum für Molekularbiologie, Universität Heidelberg
On 20/03/13 14:15, dvir.tau at gmail.com wrote:
> I'm running DESeq and EdgeR on RNA-Seq data that was already
> RSEM (downloaded from TCGA web site).
> Since these methods require the raw read counts I'm using the
> column of the RSEM output but I'm not sure this is the right thing
> to do (is it the actual raw count required?)
The real issue is not that your counts are not integer, but that RSEM
gives you counts per isoform rather than per gene. Now, if you have
very similar isoforms, RSEM will be unable to decide which isoform to
assign a read to and just spread them proportionally over both. Hence,
even if only one of the two isoforms is differentially expressed, you
will incorrectly see differential expression for both isoforms.
This is why the output of isoform quantification methods such as RSEM
or MMSeq is not suitable as input for differential expression tests.
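
To make the argument concrete, here is a toy sketch in Python. It is not RSEM's actual algorithm; it simply assumes two isoforms, A and B, that are so similar that every read is ambiguous, and a quantifier that splits the pooled reads by a fixed proportion. The function name and the numbers are made up for illustration.

```python
def split_ambiguous(true_a, true_b, share_a=0.5):
    """Toy model: reads from two indistinguishable isoforms are pooled
    and then divided between them using a fixed proportion (assumption)."""
    total = true_a + true_b
    est_a = total * share_a
    est_b = total * (1 - share_a)
    return est_a, est_b

# Only isoform A changes between conditions (100 -> 300); B stays at 100.
control = split_ambiguous(100, 100)
treated = split_ambiguous(300, 100)

# Apparent fold change of isoform B, which did not change in truth.
fold_b = treated[1] / control[1]
print(control, treated, fold_b)
```

Under these assumptions, isoform B appears to double even though only A changed, which is the spurious signal described above.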
At the very minimum, you also need the information about the uncertainty
of the assignments of reads to isoforms. In fact, RSEM provides this
information if you run it in its Bayesian mode, but this seems to be
hardly ever done in practice.
If you really need to perform differential expression analysis on a
level finer than whole gene expression, you should either use a tool for
differential exon usage testing, such as our DEXSeq package, or one that
combines isoform abundance estimation and testing for differences in a
unified framework, such as BitSeq. In both cases, you will need the
If you are fine with staying on the gene level for your analysis, you
need to get counts per gene, not per isoform. I am not familiar enough
with RSEM, though, to tell you whether adding up the counts from all
isoforms per gene would be a good idea.
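
For reference, the "adding up" approach would look roughly like the Python sketch below. Whether it is statistically appropriate for RSEM output is exactly the open question above, so treat this only as an illustration of the mechanics. The isoform IDs, gene IDs, and the isoform-to-gene mapping are hypothetical, and the final rounding is an assumption made because count-based tools expect integers.

```python
from collections import defaultdict

def sum_isoforms_per_gene(isoform_counts, isoform_to_gene):
    """Sum (possibly fractional) isoform-level counts into gene-level
    totals, then round each total to the nearest integer."""
    gene_totals = defaultdict(float)
    for isoform, count in isoform_counts.items():
        gene_totals[isoform_to_gene[isoform]] += count
    # Round only after summing, so rounding error does not accumulate.
    return {gene: round(total) for gene, total in gene_totals.items()}

# Hypothetical example data: two isoforms of geneA, one of geneB.
isoform_counts = {"tx1": 10.4, "tx2": 5.3, "tx3": 7.0}
isoform_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = sum_isoforms_per_gene(isoform_counts, isoform_to_gene)
print(gene_counts)
```

Note that Python's built-in round() rounds exact halves to the nearest even integer, which may or may not be what you want for count data.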