Question

edgeR -- gene expression variability

Miguel Gallach ▴ 40

@miguel-gallach-5016

Last seen 10.6 years ago

Hi List,

I am analyzing my RNA-Seq data with edgeR. This is my experimental design:

    > d.GLM
    An object of class "DGEList"
    $samples
              group           lib.size norm.factors
    R4.Hot    HotAdaptedHot   17409289    0.9881635
    R5.Hot    HotAdaptedHot   17642552    1.0818144
    R9.Hot    ColdAdaptedHot  20010974    0.8621807
    R10.Hot   ColdAdaptedHot  14064143    0.8932791
    R4.Cold   HotAdaptedCold  11968317    1.0061084
    R5.Cold   HotAdaptedCold  11072832    1.0523857
    R9.Cold   ColdAdaptedCold 22386103    1.0520949
    R10.Cold  ColdAdaptedCold 17408532    1.0903311

As you can see, R4 and R5 are replicates of the same biological group (Hot adapted), and the same is true for R9 and R10 (Cold adapted).

I am interested in measuring, for each gene, its expression variability within a biological group (at each temperature), in order to identify genes that might be tightly regulated (or under stabilizing selection). My specific question is: how can I get tagwise dispersion values for the pairs (R4.Hot + R5.Hot), (R9.Hot + R10.Hot), (R4.Cold + R5.Cold) and (R9.Cold + R10.Cold)? I assume that the square root of each tagwise dispersion value can be interpreted as the expression variance of the corresponding gene (i.e., biological variation), as I understood from the edgeR manual. Am I correct?

I tried to calculate it like this:

    R4.R5.HC = edgeR_expressed_genes[,1:2]
    # I tell edgeR there is only one factor, two replicates
    group = factor(c("HC", "HC"))
    Hot.Hot = DGEList(counts = R4.R5.HC, group = group)
    Hot.Hot = calcNormFactors(Hot.Hot)
    Hot.Hot = estimateCommonDisp(Hot.Hot)
    Hot.Hot = estimateTagwiseDisp(Hot.Hot)

(and similarly for (R9.Hot + R10.Hot), (R4.Cold + R5.Cold) and (R9.Cold + R10.Cold)).

What I don't understand is why I got only 20 different dispersion values across all genes:

    > dim(table(Hot.Hot$tagwise.dispersion))
    [1] 20

However, when I use the d.GLM dataset (i.e., the 8 samples for the 2x2 factor design) I get a different dispersion value for each gene:

    > dim(table(d.GLM1$tagwise.dispersion))
    [1] 9418

Why is this? Can I get gene expression variability in a better way to fulfill my aim?

Thank you very much,
Miguel Gallach

edgeR edgeR • 1.6k views

ADD COMMENT • link updated 13.3 years ago by Gordon Smyth 52k • written 13.3 years ago by Miguel Gallach ▴ 40

score 0 · Answer 1 · 2012-01-04

Gordon Smyth 52k

@gordon-smyth

Last seen 8 hours ago

WEHI, Melbourne, Australia

Dear Miguel,

What you are doing seems correct, although expecting to get good estimates of genewise dispersions from just two libraries (one degree of freedom) is a bit optimistic. edgeR tries to do the best that can be done.

The edgeR manual tells you that sqrt(dispersion) is the biological coefficient of variation. Coefficient of variation means sd/mean rather than variance. It is a more appropriate measure of variability than the standard deviation for quantities that are strictly positive.

The reason why estimateTagwiseDisp() returns a limited number of distinct dispersions is that it maximizes the tagwise dispersions on a grid of 200 possible dispersion values. estimateGLMTagwiseDisp() does something similar, but adds an extra refinement step in which it interpolates a cubic spline through the grid values and maximizes the spline. Hence the dispersion values from estimateTagwiseDisp() are taken from a (largish) set of preset values, whereas those from estimateGLMTagwiseDisp() are always different.

I don't think this has a major impact on a practical analysis. Nevertheless, we have modified estimateTagwiseDisp() on Bioc devel to work like estimateGLMTagwiseDisp(), so in future they will behave in a directly comparable way.

Please give sessionInfo() output so that we can see what versions of the packages you are using.

Best wishes
Gordon

> Date: Mon, 2 Jan 2012 13:40:59 +0100
> From: Miguel Gallach <miguel.gallach at vetmeduni.ac.at>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR -- gene expression variability
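The contrast described above, between picking the best value on a fixed grid and refining that choice with a cubic spline, can be illustrated with a toy sketch in base R. This is not edgeR's code: the objective function, the number of genes and the grid range are all made-up assumptions, chosen only to show why grid-only maximization yields few distinct values while the spline refinement yields essentially one value per gene.

```r
# Toy illustration (base R only, NOT edgeR's implementation).
set.seed(1)
n_genes  <- 2000                               # hypothetical number of genes
true_opt <- runif(n_genes, min = -5, max = 5)  # hypothetical per-gene optimum

# Fixed grid of candidate values shared by all genes (size is an assumption)
grid <- seq(-6, 6, length.out = 200)

# Toy concave objective for one gene, peaked at `opt`
objective <- function(x, opt) -(x - opt)^2

# 1) Grid-only maximization: the answer is always one of the grid points
grid_max <- vapply(true_opt, function(opt) {
  grid[which.max(objective(grid, opt))]
}, numeric(1))

# 2) Interpolate a cubic spline through the grid values, then maximize it
spline_max <- vapply(true_opt, function(opt) {
  f <- splinefun(grid, objective(grid, opt), method = "natural")
  optimize(f, interval = range(grid), maximum = TRUE)$maximum
}, numeric(1))

length(unique(grid_max))    # bounded by the grid size
length(unique(spline_max))  # close to the number of genes
```

Both approaches land near the same optimum for each gene; only the number of distinct values differs, which matches the remark that the effect on a practical analysis is minor.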

ADD COMMENT • link 13.3 years ago Gordon Smyth 52k

Dear Gordon,

thanks so much for your answer. Here is the version info:

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] splines   stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] limma_3.10.0 edgeR_2.4.1

    loaded via a namespace (and not attached):
    [1] tools_2.14.0

I understand the problem of having only two replicates, but it is the best I can have. However, let me ask you another question: I found a negative correlation between expression level and sqrt(dispersion). I think this is kind of logical, so I just "normalized" the data by computing sqrt(dispersion)/expression. However, I did this thinking that sqrt(dispersion) was a kind of s.d. Now that you tell me that sqrt(dispersion) is equivalent to sd/mean, I am not sure my normalization is appropriate (I mean, I am dividing by the mean expression twice). Is my interpretation correct?

Thanks again,
Miguel

ADD REPLY • link 13.3 years ago Miguel Gallach ▴ 40

Sorry again Gordon,

In addition to the previous question, what is the unit of dispersion? I mean, is the dispersion calculated for the logCon, Con or counts? This should be important if I want to calculate confidence intervals, right? In addition, why is logCon != log2(Conc)? When I compute log2(Conc) myself, the result is not exactly equal to the logCon provided by edgeR. Sorry for being so picky, but I really want to understand where the data come from.

Many thanks again and all the best,
Miguel

--
Miguel Gallach
Institut für Populationsgenetik
Veterinärmedizinische Universität Wien
Josef Baumann Gasse 1
1210 Wien
Austria

ADD REPLY • link 13.3 years ago Miguel Gallach ▴ 40

Dear Miguel,

I'm afraid that I don't understand your questions. There is no quantity in edgeR called "Con", there is no sensible way that I know of to normalize counts using the dispersion, nor any need to do so, and I do not follow for what quantity you are trying to obtain a confidence interval.

I would prefer that you did a little more background reading before sending more questions. The three papers by Mark Robinson and myself about edgeR might help, and there's plenty of public documentation on the coefficient of variation:

http://en.wikipedia.org/wiki/Coefficient_of_variation

The dispersion is a squared coefficient of variation, and a coefficient of variation is always dimensionless, because CV = sd/mean and the dimensions of the sd and the mean cancel out.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
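The point that a coefficient of variation has no units can be checked numerically. The sketch below uses base R only and simulated negative binomial counts; the dispersion `phi`, the mean `mu` and the rescaling factor are arbitrary assumptions, not values from this thread. It relies on the standard negative binomial relation var = mu + phi * mu^2, under which CV^2 = 1/mu + phi, so that for large counts the CV approaches sqrt(phi).

```r
# Toy check in base R (no edgeR needed); all parameter values are assumptions.
set.seed(42)
phi <- 0.04   # hypothetical NB dispersion, so sqrt(phi) = 0.2
mu  <- 1000   # hypothetical mean count
x   <- rnbinom(1e5, size = 1 / phi, mu = mu)

# Coefficient of variation: sd divided by mean
cv <- function(v) sd(v) / mean(v)

cv_counts <- cv(x)
cv_scaled <- cv(x / 1e3)  # same data expressed in different units

# Negative binomial: var = mu + phi * mu^2, hence CV^2 = 1/mu + phi
cv_theory <- sqrt(1 / mu + phi)

c(observed = cv_counts, rescaled = cv_scaled, theory = cv_theory)

# The sd changes with the units of measurement; the CV does not
c(sd_counts = sd(x), sd_scaled = sd(x / 1e3))
```

Because the CV is already relative to the mean, dividing sqrt(dispersion) by the expression level a second time would reintroduce a dependence on the units of measurement.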

ADD REPLY • link 13.3 years ago Gordon Smyth 52k