variance and coefficient of variation with edgeR
@miguel-gallach-5128
Dear list,

I am analyzing RNA-Seq data with edgeR for a typical two-factor design:

$samples
          group            lib.size  norm.factors
R4.Hot    HotAdaptedHot    17409289  0.9881635
R5.Hot    HotAdaptedHot    17642552  1.0818144
R9.Hot    ColdAdaptedHot   20010974  0.8621807
R10.Hot   ColdAdaptedHot   14064143  0.8932791
R4.Cold   HotAdaptedCold   11968317  1.0061084
R5.Cold   HotAdaptedCold   11072832  1.0523857
R9.Cold   ColdAdaptedCold  22386103  1.0520949
R10.Cold  ColdAdaptedCold  17408532  1.0903311

I found something quite interesting: non-native populations have a systematically higher coefficient of variation than native populations. That is, CV(R4.Hot-R5.Hot) < CV(R9.Hot-R10.Hot) and CV(R4.Cold-R5.Cold) > CV(R9.Cold-R10.Cold).

Here are the variables and calculations:

    C.V.R4.R5HC = sqrt(data$R4.R5.HC.disp)
    C.V.R9.R10HC = sqrt(data$R9.R10.HC.disp)

    var_R4.R5_HC = Conc.R4.R5.HC * (1 + R4.R5.HC.disp * Conc.R4.R5.HC)
    var_R9.R10_HC = Conc.R9.R10.HC * (1 + R9.R10.HC.disp * Conc.R9.R10.HC)

The attached plot compares variances (V = mu * (1 + dispersion * mu), according to http://seqanswers.com/forums/showthread.php?t=5591&highlight=edgeR+variance) and C.V. (C.V. = sqrt(dispersion)) between biological groups at Hot temperature (i.e., comparing R4.Hot-R5.Hot vs. R9.Hot-R10.Hot).

According to the left plot, the variance is equal for most genes, so the assumption of equal variances holds and we can perform the DE test. Am I right?

However, what I cannot understand is that sqrt(dispersion) for R9.R10 is greater than sqrt(dispersion) for R4.R5, i.e., the coefficient of variation of gene expression is systematically higher for all genes in R9.R10 than in R4.R5. For this to be true, since the variances are equal and C.V. = sqrt(var)/mean, the mean of R9.R10 (i.e., Conc.R9.R10) should be lower than that of R4.R5, which is obviously false. The reciprocal analysis of these samples at cold temperature produces the equivalent, but inverted, result.

What am I missing? How can this happen?

Any help would be appreciated.

Many thanks,
Miguel Gallach
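A note on the two formulas quoted in the post: if the variance really is V = mu * (1 + dispersion * mu), then dividing by mu^2 gives CV^2 = V / mu^2 = 1/mu + dispersion. Under that assumption, sqrt(dispersion) equals sqrt(var)/mean only when mu is large enough for the 1/mu term to vanish, so the two definitions of C.V. used above are not interchangeable. The Python sketch below checks this numerically. It is not edgeR code: the function names and the mean and dispersion values are hypothetical, chosen only for illustration.

```python
import math


def nb_variance(mu, dispersion):
    """Variance formula quoted in the post: V = mu * (1 + dispersion * mu)."""
    return mu * (1.0 + dispersion * mu)


def total_cv(mu, dispersion):
    """CV defined as sqrt(var) / mean, using the variance formula above."""
    return math.sqrt(nb_variance(mu, dispersion)) / mu


def bcv(dispersion):
    """CV defined as sqrt(dispersion), as computed in the post."""
    return math.sqrt(dispersion)


# Hypothetical means and dispersions (not taken from the poster's data).
for mu in (10, 100, 1000, 10000):
    for disp in (0.01, 0.04):
        print(
            f"mu={mu:>6}  disp={disp:.2f}  "
            f"var={nb_variance(mu, disp):>12.1f}  "
            f"sqrt(var)/mu={total_cv(mu, disp):.4f}  "
            f"sqrt(disp)={bcv(disp):.4f}"
        )
```

With these made-up values, a fourfold difference in dispersion changes the variance only modestly at a mean of 10 but almost fourfold at a mean of 10000. Whether something similar explains the attached plot cannot be judged from the numbers given in the post.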
@miguel-gallach-5128
It seems I could not paste the plot... I hope you can see it now.

Sorry,
Miguel

On Tue, Mar 27, 2012 at 4:22 PM, Miguel Gallach wrote:
> [...]

--
Miguel Gallach
Center for Integrative Bioinformatics Vienna (CIBIV)
Max F. Perutz Laboratories (MFPL)