EMBL European Molecular Biology Laboratory
Normalisation: Briefly, the normalisation works as follows: if k_ij is
the count of the i-th gene (or in your case, I guess, taxon) in the
j-th sample, then we compute f_i as the geometric mean of these values
over all samples. The size factor s_j of sample j is the median, over
all genes i, of the ratios k_ij / f_i, and the normalised count is
k_ij / s_j.
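To make the arithmetic concrete, here is a minimal sketch of this median-of-ratios calculation. It is written in plain Python, not taken from DESeq (which is an R/Bioconductor package), and assumes the counts are a list of rows with one row per gene or taxon and one column per sample. The function names `size_factors` and `normalise` are hypothetical. Rows containing a zero (geometric mean f_i = 0) are skipped, which is an assumption about how such rows should be handled.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts[i][j] is the count k_ij of gene/taxon i in sample j.
    """
    n_samples = len(counts[0])
    # log f_i: log geometric mean of each row over all samples,
    # or -inf if the row contains a zero (then f_i = 0).
    log_geo_means = []
    for row in counts:
        if any(k == 0 for k in row):
            log_geo_means.append(-math.inf)
        else:
            log_geo_means.append(sum(math.log(k) for k in row) / n_samples)

    factors = []
    for j in range(n_samples):
        # log(k_ij / f_i) for every row with a non-zero geometric mean
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(counts, log_geo_means)
                      if math.isfinite(lg)]
        # s_j = median over genes of k_ij / f_i
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts):
    """Divide every count k_ij by its sample's size factor s_j."""
    s = size_factors(counts)
    return [[k / s_j for k, s_j in zip(row, s)] for row in counts]


if __name__ == "__main__":
    # Sample 1 was sequenced twice as deeply as sample 0.
    counts = [[10, 20], [100, 200], [5, 10]]
    print([round(s, 4) for s in size_factors(counts)])  # [0.7071, 1.4142]
```

Because the second sample has exactly twice the counts of the first, its size factor is twice as large, and after division both columns of each row are equal.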
In more detail, it is described in the paper "Differential expression
analysis for sequence count data"; a preprint is available at Nature
Precedings (4282), 2010, and the full publication will come out in Genome
Biology.
Zero counts: The statistical model of DESeq covers situations in which
the counts are zero in one group and non-zero in others, so I would
recommend leaving these taxa in the data: you will benefit from
getting proper statistical inference for these cases, too.
(As far as I can see, normalisation should not be significantly
affected, unless there is some really odd asymmetry in your data.)
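One way to see why normalisation is largely unaffected: a taxon that is zero in every sample of one group has a geometric mean f_i of zero, so its ratio k_ij / f_i is undefined and it contributes nothing to the median. The sketch below is a plain-Python illustration, not DESeq's own code. It assumes, as we understand DESeq's R implementation to do, that such rows are simply skipped when the size factors are computed. The `size_factors` function and the count values are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; rows containing a zero are skipped."""
    n = len(counts[0])
    # Only rows with a non-zero geometric mean can form ratios k_ij / f_i.
    usable = [row for row in counts if all(k > 0 for k in row)]
    log_geo = [sum(math.log(k) for k in row) / n for row in usable]
    return [math.exp(median([math.log(row[j]) - lg
                             for row, lg in zip(usable, log_geo)]))
            for j in range(n)]


# Hypothetical data: samples 0-1 without H. pylori, samples 2-3 with it.
shared = [[10, 12, 20, 22], [50, 55, 100, 105], [7, 8, 15, 13]]
pylori_only = [0, 0, 300, 280]  # taxon absent from the first group

with_taxon = size_factors(shared + [pylori_only])
without_taxon = size_factors(shared)
print(with_taxon == without_taxon)  # True
```

The group-specific taxon therefore does not move the size factors, while keeping it in the table still lets the test assess it.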
On Oct/19/10 6:56 AM, Rui Luo wrote:
> Dear DESeq developers,
> I have a few questions related to the normalization step in DESeq.
> It is stated that it will normalize the raw counts by
> but what is the mathematical idea? Would you mind giving a more
> detailed explanation?
> Now I have two groups of metatranscriptome data: one group has
> H. pylori, the other does not. Naturally, some transcripts in the
> first group come from H. pylori and are absent from group two.
> If I want to do a differential expression analysis between
> these two groups, should I filter out the group-specific transcripts
> before putting the data into DESeq? Will this affect the normalization step?