Dear Miguel,
There is no assumption in edgeR that variances be equal between
groups.
The variance depends on the mean, and the mean depends on the library
size, and there is no assumption that library sizes are equal between
groups. So computing the variance is not of interest.
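As a rough numerical illustration of the point above (a sketch, not part of the edgeR pipeline): the snippet below uses the variance formula quoted later in this thread, V = mu * (1 + dispersion * mu), with made-up values for the dispersion, the gene's relative abundance and the library sizes. All names and numbers are hypothetical.

```python
import math

def nb_variance(mu, dispersion):
    """Negative binomial variance, V = mu * (1 + dispersion * mu)."""
    return mu * (1.0 + dispersion * mu)

# Hypothetical values, chosen only for illustration.
dispersion = 0.04        # same dispersion in both groups (sqrt = 0.2)
proportion = 1e-4        # same relative abundance of the gene in both groups
library_sizes = {"group1": 5_000_000, "group2": 10_000_000}

variances = {}
total_cv = {}
for group, lib_size in library_sizes.items():
    mu = lib_size * proportion                   # expected count scales with library size
    variances[group] = nb_variance(mu, dispersion)
    total_cv[group] = math.sqrt(variances[group]) / mu   # s.d. / mean of the counts

for group in library_sizes:
    print(group, variances[group], total_cv[group])
```

With identical dispersion and identical relative abundance, the two variances still differ because the expected counts differ, which is why comparing raw variances between groups says little about the model.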
The concept of biological coefficient of variation, or coefficient of
biological variation, is due to Robinson, McCarthy and Smyth
(Bioinformatics 2010) and is explained more fully in McCarthy, Chen and
Smyth (Nucleic Acids Research, 2012), so you might find it helpful to read
the description of it in the latter paper.
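For orientation, the relationship between the dispersion and the coefficient of variation can be written out from the negative binomial variance formula quoted later in this thread. The symbols are notation chosen here for illustration: \mu is the expected count and \phi is the dispersion.

```latex
\begin{align*}
\operatorname{Var}(y) &= \mu\,(1 + \phi\,\mu) = \mu + \phi\,\mu^{2},\\
\mathrm{CV}^{2}(y) &= \frac{\operatorname{Var}(y)}{\mu^{2}} = \frac{1}{\mu} + \phi,\\
\mathrm{BCV} &= \sqrt{\phi}.
\end{align*}
```

So sqrt(dispersion) is only the biological part of the coefficient of variation. The s.d./mean of the counts themselves also includes the 1/\mu term, and matches sqrt(dispersion) only when the expected count is large.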
I can't answer your other questions because you're doing your own
personalized analysis pipeline. I can only help posters using the
standard edgeR pipelines. I may have misunderstood, but your
questions seem to be specific to your own data or to your own analysis
approach.
Best wishes
Gordon
> Date: Fri, 30 Mar 2012 10:30:09 +0200
> From: Miguel Gallach <miguel.gallach at univie.ac.at>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] edgeR-DeSeq - inconsistency between Variance and
> Coefficient of Variation?
>
> Dear list,
>
> I posted a question a few days ago without any success, so I decided to
> try again, explaining my question better and changing the header.
>
> I am analyzing RNA-Seq data with edgeR and DESeq. I have two biological
> groups, two replicates each, and I want to test DE between the two
> biological groups.
>
> For instance, with edgeR, I calculated the tagwise dispersion for each gene.
> With these dispersions, I calculated the variance according to the
> formula V = mu * (1 + dispersion * mu). I used the definition from
> http://seqanswers.com/forums/showthread.php?t=5591&highlight=edgeR+variance
>
> When I plot the correlation of the variance between the two biological
> groups, I find a very good correlation. According to this result, we
> can conclude that for most genes the variance is equal between groups.
> This leads to my first question: Is the assumption of equal variances
> a requirement for performing the DE test?
>
> After this, I calculated sqrt(dispersion) for every gene, i.e.,
> according to the edgeR and DESeq manuals, the coefficient of biological
> variation (i.e., C.V. = s.d./mean = sqrt(dispersion)). When I plot
> the correlation of the C.V. between the two biological groups, what I
> find is that the C.V. for one biological group is systematically
> higher than the C.V. in the other group. In other words, for most genes
> the C.V. in group 1 is higher than that in group 2. This result can be
> nicely seen as a regression line that is parallel to and above the
> expected y = x. Indeed, I found something like y = 0.11 + x.
>
> This result scares me a lot. If I understood well, since C.V.1 > C.V.2,
> sqrt(var1)/mean1 > sqrt(var2)/mean2; since var1 ~ var2, then mean1 < mean2
> for most of the genes, which is obviously false. What am I missing? How
> is it possible that two groups have similar variance but one group has
> a higher C.V. than the other (for most genes!)?
>
> I have not checked this with DESeq yet, but I assume the results will be
> similar (given that the numbers of DE genes are similar and congruent).
>
> Any help would be much appreciated.
>
> Many thanks,
> Miguel Gallach
>