Question

edgeR-DeSeq - inconsistency between Variance and Coefficient of Variation

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

Dear Miguel,

There is no assumption in edgeR that variances be equal between groups. The variance depends on the mean, and the mean depends on the library size, and there is no assumption that library sizes are equal between groups. So computing the variance is not of interest.

The concept of biological coefficient of variation, or coefficient of biological variation, is due to Robinson, McCarthy and Smyth (Bioinformatics 2010) and is explained more fully in McCarthy, Chen and Smyth (Nucleic Acids Research, 2012), so you might find it helpful to read the description of it in the latter paper.

I can't answer your other questions because you're doing your own personalized analysis pipeline. I can only help posters using the standard edgeR pipelines. I may be understanding poorly, but your questions seem to be specific to your own data or to your own analysis approach.

Best wishes
Gordon

> Date: Fri, 30 Mar 2012 10:30:09 +0200
> From: Miguel Gallach <miguel.gallach at univie.ac.at>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] edgeR-DeSeq - inconsistency between Variance and
>       Coefficient of Variation?
>
> Dear list,
>
> I posted a question a few days ago without any success, so I decided to
> try again, explaining my question better and changing the header.
>
> I am analyzing RNA-Seq data with edgeR and DESeq. I have two biological
> groups, two replicates each, and I want to test DE between the two
> biological groups.
>
> For instance, with edgeR, I calculated the tagwise dispersion for each
> gene. With this dispersion data, I calculated the variance according to
> the formula V = mu * (1 + dispersion * mu). I used the definition from
> http://seqanswers.com/forums/showthread.php?t=5591&highlight=edgeR+variance.
>
> When I plot the correlation of the variance between the two biological
> groups, I find they have very good correlation. According to this
> result we can conclude that for most genes the variance is equal between
> groups. From this comes my first question: Is the assumption of equal
> variances a requisite to perform the DE test?
>
> After this, I calculated sqrt(dispersion) for every gene, i.e.,
> according to the edgeR and DESeq manuals, the coefficient of biological
> variation (i.e., C.V. = s.d./mean = sqrt(dispersion)). Well, when I plot
> the correlation of the C.V. between the two biological groups, what I
> find now is that the C.V. for one biological group is systematically
> higher than the C.V. in the other group. In other words, for most genes
> in group 1, the C.V. is higher than that in group 2. This result can be
> nicely seen as a regression line that is parallel to and above the
> expected y = x. Indeed I found something like y = 0.11 + x.
>
> This result scares me a lot. If I understood well, since
> C.V.1 > C.V.2, i.e. sqrt(var1)/mean1 > sqrt(var2)/mean2, and since
> var1 ~ var2, then mean1 < mean2 for most of the genes, which is
> obviously false. What am I missing? How is it possible that two groups
> have similar variance but one group has a higher C.V. than the other
> (for most genes!)?
>
> I did not check this with DESeq yet, but I assume the results will be
> similar (given that the numbers of DE genes are similar and congruent).
>
> Any help would be much appreciated.
>
> Many thanks,
> Miguel Gallach
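
As a rough numerical illustration of the statement that the variance depends on the mean and the mean depends on the library size, the sketch below applies the formula V = mu * (1 + dispersion * mu) quoted in the question. It is plain Python rather than edgeR. The dispersion, the gene's read proportion and the library sizes are hypothetical values, and the helper names `nb_variance`, `bcv` and `total_cv` are made up for this example; they are not edgeR functions. The total CV uses the decomposition total CV^2 = 1/mu + dispersion described in McCarthy, Chen and Smyth (2012).

```python
import math


def nb_variance(mu, dispersion):
    """Negative binomial variance, V = mu * (1 + dispersion * mu)."""
    return mu * (1.0 + dispersion * mu)


def bcv(dispersion):
    """Biological coefficient of variation: sqrt(dispersion), free of the mean."""
    return math.sqrt(dispersion)


def total_cv(mu, dispersion):
    """Total CV of the counts, sqrt(V) / mu, so total CV^2 = 1/mu + dispersion."""
    return math.sqrt(nb_variance(mu, dispersion)) / mu


# Hypothetical gene: one dispersion and one relative abundance for both groups,
# but the groups were sequenced to different depths (assumed values).
dispersion = 0.04
proportion = 1e-5
library_sizes = {"group1": 5_000_000, "group2": 10_000_000}

results = {}
for group, lib_size in library_sizes.items():
    mu = proportion * lib_size  # expected count scales with library size
    results[group] = {
        "mean": mu,
        "variance": nb_variance(mu, dispersion),
        "bcv": bcv(dispersion),
        "total_cv": total_cv(mu, dispersion),
    }

for group, values in results.items():
    print(group, {name: round(value, 4) for name, value in values.items()})
```

With these assumed inputs the two groups share the same BCV (0.2) while their variances differ (about 150 versus 500), because the variance follows the mean. A variance computed from the dispersion is therefore not a quantity that needs to match between groups.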

Regression edgeR DESeq
