Hi,
A few papers have concluded that DESeq is more accurate for DE gene
discovery than methods using FPKM, and that the bias in FPKM arises
because a gene's FPKM depends on the expression of other genes due to
the division by library size.
Now if my purpose is visualization or analysis other than looking for
DEGs, I wonder if it's better to replace FPKM with the DESeq-normalized
gene count divided by gene length?
Thanks for your comment!
Jack
Hi Jack
On 23/07/13 17:29, Jike Cui wrote:
> A few papers have concluded that DESeq is more accurate for DE gene
> discovery than methods using FPKM, and that the bias in FPKM is that
> a gene's FPKM depends on the expression of other genes due to the
> division by library size.
>
> Now if my purpose is for visualization or analysis other than
> looking for DEGs, I wonder if it's better to replace FPKM by DESeq
> normalized gene count divided by gene length?
Yes, definitely.
Look at it this way: To account for sequencing depth, you divide the
raw counts by a number that quantifies this depth. Simply using the
total number of reads (divided by 1 million) is an obvious but very
simplistic choice, and the various other scaling normalization schemes
(our median-of-ratios approach from DESeq, but also similar suggestions
such as TMM, etc.) are simply meant to provide a more clever way to
find a number to divide by.
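
To make the "number to divide by" concrete, here is a rough Python
sketch of the median-of-ratios idea. It is only an illustration, not
the DESeq code itself: the function name size_factors and the input
layout (one list of raw counts per gene, one entry per sample) are
assumptions made for this example, and genes with a zero count in any
sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])

    # Keep only genes with a positive count in every sample, so that
    # the geometric mean across samples is well defined.
    kept = [row for row in counts if all(c > 0 for c in row)]
    if not kept:
        raise ValueError("no gene has positive counts in all samples")

    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in kept]

    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(kept, log_geo_means)]
        # The size factor is the median of these ratios.
        factors.append(math.exp(median(log_ratios)))
    return factors


if __name__ == "__main__":
    # Toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
    toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
    print(size_factors(toy))
```

Because the median is taken over genes, a handful of very highly
expressed genes has little influence on the result, which is the point
of preferring this over the plain total read count.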
In the case of DESeq, we try to get these numbers to be close to one.
If you want to have the same scale as typical FPKM values (and so have
better comparability across experiments), you could then divide
everything by something like

  (geometric mean of the total read counts of all samples) / 1 million
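
As a rough illustration of this rescaling, the Python sketch below
divides each raw count by its sample's size factor, by the gene length
in kilobases, and by the depth scale given above. The function name
fpkm_like and its arguments are hypothetical, chosen for this example
only; the size factors are assumed to have been estimated beforehand
(for instance with DESeq's estimateSizeFactors), and gene lengths are
assumed to be in base pairs.

```python
import math


def fpkm_like(counts, size_factors, gene_lengths_bp):
    """Rescale size-factor-normalized counts to an FPKM-like scale.

    counts: list of genes, each a list of raw counts, one per sample.
    size_factors: one size factor per sample (estimated elsewhere).
    gene_lengths_bp: one length in base pairs per gene.
    """
    n_samples = len(size_factors)

    # Total read count of each sample (column sums).
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]

    # Geometric mean of the totals, expressed in millions of reads.
    depth_scale = math.exp(sum(math.log(t) for t in totals) / n_samples) / 1e6

    result = []
    for row, length in zip(counts, gene_lengths_bp):
        kb = length / 1000.0
        result.append([c / sf / depth_scale / kb
                       for c, sf in zip(row, size_factors)])
    return result


if __name__ == "__main__":
    # Toy data: sample 2 is exactly twice as deep as sample 1, so the
    # median-of-ratios size factors are 1/sqrt(2) and sqrt(2).
    toy_counts = [[10, 20], [100, 200], [5, 10]]
    toy_sf = [1 / math.sqrt(2), math.sqrt(2)]
    toy_lengths = [1000, 2000, 500]
    for row in fpkm_like(toy_counts, toy_sf, toy_lengths):
        print(row)
```

In this toy case the two samples differ only in depth, so each gene
ends up with the same value in both samples.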
You may also want to look, though, at the variance-stabilizing
transformation (VST) and the regularized log transformation (rlog)
that we offer in DESeq2, and which, we feel, offer a better input for
downstream visualization.
Simon