Dear list
I would like to perform a cross-species mRNA-seq comparison. In that case it would be necessary to account for the differences in gene length.
I already got a reply from the author of DESeq (see below) saying that this currently can't be done with DESeq.
Is it possible to specify gene-specific normalization factors with edgeR? Or to input read counts that have been normalized to gene length?
Thanks
Mali
---------- Forwarded message ----------
From: Simon Anders <anders@embl.de>
Date: Wed, Aug 3, 2011 at 10:03 AM
Subject: Re: DESeq between two plants with different gene length
To: mali salmon <shalmom1@gmail.com>
Hi Mali
On 08/02/2011 09:54 PM, mali salmon wrote:
> I have count data from two plants: one is rice, which has a sequenced genome, and the other is a non-model plant with no genome. In order to find the gene counts for the plant without a genome, I assembled the reads and aligned the contigs to the rice proteome.
> Can I use DESeq to find DE genes between rice and the non-model plant?
> The problem is that gene lengths differ between the two plants. Is the comparison still valid? Would you suggest normalizing to gene length before DESeq?
>
First, the technical point: it might be appropriate to account for gene length, but with the current version of DESeq you cannot specify gene-specific normalization factors, although we'll add this feature at some point.
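To make "gene-specific normalization factor" concrete, here is a minimal sketch in plain Python. It is not part of DESeq or edgeR. It combines each sample's library size with the length of the gene in that sample's species. It assumes you already have a per-gene, per-sample length table, for example the rice gene length for rice samples and the assembled contig length for the non-model plant. The function names are hypothetical.

The log offsets have the form a log-link count model would use. The RPKM-style rates are only for inspection: count-based tests such as DESeq's expect raw integer counts, so the rates should not be fed back in as counts.

```python
import math

def length_offsets(lib_sizes, gene_lengths):
    """Hypothetical helper: gene-by-sample matrix of log(library size * gene length).

    lib_sizes[s]       -- total mapped reads in sample s (assumed input)
    gene_lengths[g][s] -- length in bp of gene g in sample s's species, so the
                          same orthologous gene may have different lengths in
                          rice and in the non-model plant (assumed input)
    """
    n_samples = len(lib_sizes)
    offsets = []
    for row in gene_lengths:
        if len(row) != n_samples:
            raise ValueError("each gene needs exactly one length per sample")
        # Expected counts scale with both sequencing depth and gene length.
        offsets.append([math.log(lib) + math.log(length)
                        for lib, length in zip(lib_sizes, row)])
    return offsets

def length_normalized_rates(counts, lib_sizes, gene_lengths):
    """Hypothetical helper: reads per kilobase per million mapped reads (RPKM-style).

    For inspection only; these are not counts and should not replace them.
    """
    return [[c * 1e9 / (lib * length)
             for c, lib, length in zip(count_row, lib_sizes, length_row)]
            for count_row, length_row in zip(counts, gene_lengths)]
```

Consider a gene with equal library sizes, 100 reads over 1,000 bp in one species, and 50 reads over 500 bp in the other. It gets the same rate in both species. The apparent two-fold difference in counts is therefore entirely a length effect.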
In general, I hesitate to recommend using DESeq for a cross-species comparison, but I also don't know of any other good method. Such comparisons are really difficult, and proper interpretation is fraught with methodological pitfalls.
Differences in gene length and ambiguity in assigning orthologous genes are the main technical ones. Another is the question of what constitutes proper replication here. Should you grow both species under identical conditions in the lab? If so, which conditions: those good for rice (e.g., lots of water), or those good for the other species? Maybe you should grow both species in both conditions, and consider samples from the same species but different conditions as replicates, as this would capture as much of the environmental influence as possible. Otherwise, you could not say whether the observed differences can be attributed to genetics (different species), to environment (different growth conditions), or to an interaction of both (different levels of adaptation of the species to the chosen growth conditions).
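One way to write down the "both species in both conditions" experiment is a factorial design matrix. Its columns separate the species effect, the growth-condition effect and their interaction. The sketch below is plain Python using treatment coding, which is one common convention and not necessarily what any particular package uses. The condition labels and replicate count are hypothetical.

Simon's alternative of treating the two conditions as replicates of each species corresponds to dropping the condition and interaction columns.

```python
from itertools import product

def factorial_design(species, conditions, n_reps):
    """Treatment-coded design for species x condition, including the interaction.

    The first level of each factor is the baseline. Returns (column names,
    sample labels, rows), one row of 0/1 indicators per sample.
    """
    sp_levels = species[1:]       # non-baseline species levels
    cond_levels = conditions[1:]  # non-baseline condition levels
    columns = (["intercept"]
               + [f"species[{s}]" for s in sp_levels]
               + [f"condition[{c}]" for c in cond_levels]
               + [f"species[{s}]:condition[{c}]"
                  for s in sp_levels for c in cond_levels])
    samples, rows = [], []
    for sp, cond in product(species, conditions):
        for rep in range(1, n_reps + 1):
            sp_ind = [1 if sp == lvl else 0 for lvl in sp_levels]
            cond_ind = [1 if cond == lvl else 0 for lvl in cond_levels]
            # Interaction columns: product of species and condition indicators,
            # nonzero only for non-baseline species grown in non-baseline conditions.
            inter = [a * b for a in sp_ind for b in cond_ind]
            samples.append(f"{sp}_{cond}_rep{rep}")
            rows.append([1] + sp_ind + cond_ind + inter)
    return columns, samples, rows

# Hypothetical labels: conditions favourable to rice vs. to the other plant.
cols, names, X = factorial_design(["rice", "other"],
                                  ["rice_favourable", "other_favourable"], 2)
```

With two replicates per cell this gives eight samples and four columns. A nonzero interaction coefficient would indicate that the species differ in how they respond to the growth conditions. This is the "different level of adaptation" case, which cannot be separated from the species effect if only one condition is grown.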
I know there are papers that attempt such comparisons, but I haven't yet seen any that address these issues convincingly.
Simon