Hi List,
I am analyzing my RNA-Seq data with edgeR. My experimental design is as
follows:
d.GLM
An object of class "DGEList"
$samples
                   group lib.size norm.factors
R4.Hot     HotAdaptedHot 17409289    0.9881635
R5.Hot     HotAdaptedHot 17642552    1.0818144
R9.Hot    ColdAdaptedHot 20010974    0.8621807
R10.Hot   ColdAdaptedHot 14064143    0.8932791
R4.Cold   HotAdaptedCold 11968317    1.0061084
R5.Cold   HotAdaptedCold 11072832    1.0523857
R9.Cold  ColdAdaptedCold 22386103    1.0520949
R10.Cold ColdAdaptedCold 17408532    1.0903311
As you can see, R4 and R5 are replicates of the same biological group
(Hot adapted), and the same is true for R9 and R10 (Cold adapted).
I am interested in measuring, for each gene, its expression variability
within a biological group (at each temperature), to identify genes that
might be tightly regulated (or under stabilizing selection). My specific
question is: how can I get tagwise dispersion values for the pairs
(R4.Hot + R5.Hot), (R9.Hot + R10.Hot), (R4.Cold + R5.Cold), and
(R9.Cold + R10.Cold)? I assume that the square root of each tagwise
dispersion value can be interpreted as the expression variance of the
corresponding gene (i.e., biological variation), as I understood from
the edgeR manual. Am I correct?
I tried to calculate it like this:
R4.R5.HC = edgeR_expressed_genes[,1:2]
#I tell edgeR there is only one factor, two replicates
group = factor(c("HC", "HC"))
Hot.Hot = DGEList(counts = R4.R5.HC, group = group)
Hot.Hot = calcNormFactors(Hot.Hot)
Hot.Hot = estimateCommonDisp(Hot.Hot)
Hot.Hot = estimateTagwiseDisp(Hot.Hot)
(and similarly for (R9.Hot + R10.Hot), (R4.Cold + R5.Cold), and
(R9.Cold + R10.Cold)).
What I don't understand is why I get only 20 distinct dispersion values
across all genes:
dim(table(Hot.Hot$tagwise.dispersion))
[1] 20
However, when I use the d.GLM dataset (i.e., the 8 samples for the 2x2
factor design) I get a different dispersion value for each gene:
> dim(table(d.GLM1$tagwise.dispersion))
[1] 9418
Why is this?
Is there a better way to measure gene expression variability for my
purpose?
Thank you very much,
Miguel Gallach
Dear Miguel,
What you are doing seems correct, although expecting to get good
estimates of genewise dispersions from just two libraries (one degree
of freedom) is a bit optimistic. edgeR tries to do the best that can be
done.
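The per-pair procedure from the question can be written once and applied to all four pairs. The following is only a sketch: the counts are simulated stand-ins for edgeR_expressed_genes, the column order is assumed to match the $samples table, and pairs and pair_bcv are hypothetical names introduced here.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in for the real count matrix (assumption: 8 columns,
# ordered as in the $samples table of the original post).
counts <- matrix(rnbinom(2000 * 8, mu = 100, size = 10), ncol = 8)
colnames(counts) <- c("R4.Hot", "R5.Hot", "R9.Hot", "R10.Hot",
                      "R4.Cold", "R5.Cold", "R9.Cold", "R10.Cold")
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# Column indices of each replicate pair (hypothetical helper list).
pairs <- list(HotAdaptedHot   = 1:2,
              ColdAdaptedHot  = 3:4,
              HotAdaptedCold  = 5:6,
              ColdAdaptedCold = 7:8)

# Run the classic edgeR dispersion pipeline on one pair and return
# sqrt(tagwise dispersion) for every gene.
pair_bcv <- function(cols) {
  y <- DGEList(counts = counts[, cols],
               group = factor(rep("A", length(cols))))
  y <- calcNormFactors(y)
  y <- estimateCommonDisp(y)
  y <- estimateTagwiseDisp(y)
  sqrt(y$tagwise.dispersion)
}

# One column of per-gene values for each pair.
bcv <- sapply(pairs, pair_bcv)
```

With only two libraries per pair, each per-gene value rests on one degree of freedom, so the columns of bcv are better read as rough indications than as precise estimates.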
The edgeR manual tells you that sqrt(dispersion) is the biological
coefficient of variation. Coefficient of variation means sd/mean, not
variance. It is a more appropriate measure of variability than the
standard deviation for quantities that are strictly positive.
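A small simulation may make the distinction concrete. It relies on the standard negative binomial relation var = mu + phi * mu^2, so that the squared CV of the counts is 1/mu + phi and sqrt(phi) is the part of the CV that remains at high counts. The values of mu and phi below are arbitrary assumptions, not taken from the data in this thread.

```r
set.seed(42)
mu  <- 200     # assumed mean count
phi <- 0.04    # assumed dispersion, i.e. a biological CV of 0.2

# rnbinom() with size = 1/phi has variance mu + phi * mu^2
y <- rnbinom(1e5, mu = mu, size = 1 / phi)

cv2_observed <- var(y) / mean(y)^2   # squared CV of the simulated counts
cv2_expected <- 1 / mu + phi         # Poisson part plus dispersion
bcv          <- sqrt(phi)            # biological coefficient of variation

print(c(observed = cv2_observed, expected = cv2_expected, bcv = bcv))
```

The 1/mu term is the Poisson (technical) contribution, which shrinks as the mean count grows, while phi does not depend on the mean.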
The reason why estimateTagwiseDisp() returns a limited number of
distinct dispersions is that it maximizes over a grid of 200 possible
dispersion values, so each tagwise estimate is one of those grid
points. estimateGLMTagwiseDisp() does something similar, but adds an
extra refinement step in which it interpolates a cubic spline through
the grid values and maximizes the spline. Hence the dispersion values
from estimateTagwiseDisp() are taken from a (largish) set of preset
values, whereas those from estimateGLMTagwiseDisp() are essentially
always distinct.
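The difference between picking the best grid point and refining it with a spline can be illustrated in base R. This is a toy sketch, not edgeR's implementation: it uses a plain per-gene negative binomial log-likelihood with the mean fixed at the sample mean, simulated counts, and an arbitrary grid range, all of which are assumptions made here.

```r
set.seed(7)
n_genes <- 300
counts  <- matrix(rnbinom(n_genes * 4, mu = 100, size = 5), ncol = 4)

# 200 candidate dispersions, evenly spaced on the log2 scale (assumed range)
grid <- 2^seq(-8, 2, length.out = 200)

# Log-likelihood of one gene's counts at dispersion phi, with the mean
# fixed at the sample mean (a simplification).
loglik <- function(y, phi) {
  sum(dnbinom(y, mu = mean(y), size = 1 / phi, log = TRUE))
}

estimate_one <- function(y) {
  ll <- vapply(grid, function(phi) loglik(y, phi), numeric(1))
  i  <- which.max(ll)                  # best grid point
  lo <- max(i - 1, 1)
  hi <- min(i + 1, length(grid))
  # Cubic spline through the grid values, maximized between the two
  # neighbours of the best grid point.
  f <- splinefun(log2(grid), ll, method = "natural")
  refined <- optimize(f, interval = log2(grid[c(lo, hi)]),
                      maximum = TRUE)$maximum
  c(grid = grid[i], spline = 2^refined)
}

est <- t(apply(counts, 1, estimate_one))

# Number of distinct estimates under each approach
print(length(unique(est[, "grid"])))
print(length(unique(est[, "spline"])))
```

The grid-only column can never contain more than 200 distinct values, however many genes there are, whereas the spline-refined column is free to take values between grid points.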
I think this has no major impact on a practical analysis. Nevertheless,
we have modified estimateTagwiseDisp() on Bioc devel to work like
estimateGLMTagwiseDisp(), so in future they will behave in a directly
comparable way.
Please give sessionInfo() output so that we can see what versions of
the package you are using.
Best wishes
Gordon
> Date: Mon, 2 Jan 2012 13:40:59 +0100
> From: Miguel Gallach <miguel.gallach at vetmeduni.ac.at>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR -- gene expression variability
Dear Gordon,
Thanks so much for your answer.
Here is the version info:
sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base
other attached packages:
[1] limma_3.10.0 edgeR_2.4.1
loaded via a namespace (and not attached):
[1] tools_2.14.0
I understand the problem of having only two replicates, but it is the
best I can have. However, let me ask you another question: I found a
negative correlation between expression level and sqrt(dispersion). I
think this is kind of logical, so I just "normalized" the data by
dividing sqrt(dispersion) by expression. However, I did this thinking
that sqrt(dispersion) was a kind of s.d. But now, since you tell me
that sqrt(dispersion) is equivalent to sd/mean, I am not sure my
normalization is appropriate (I mean, I am dividing by mean expression
twice). Is my interpretation correct?
Thanks again,
Miguel
Sorry again Gordon,
In addition to the previous question, what is the unit of dispersion?
I mean, is the dispersion calculated for the logCon, the Con or the
counts? This should be important if I want to calculate confidence
intervals, right?
In addition, why is logCon != log2(Conc)? When I compute log2(Conc)
myself, the result is not exactly equal to the logCon provided by
edgeR. Sorry for being so picky, but I really want to understand where
the data come from.
Many thanks again and all the best,
Miguel
--
Miguel Gallach
Institut für Populationsgenetik
Veterinärmedizinische Universität Wien
Josef Baumann Gasse 1
1210 Wien
Austria
Dear Miguel,
I'm afraid that I don't understand your questions. There is no quantity
in edgeR called "Con", there is no sensible way that I know of to
normalize counts using the dispersion, nor any need to do so, and I do
not follow for what quantity you are trying to obtain a confidence
interval.
I would prefer that you did a little more background reading before
sending more questions. The three papers by Mark Robinson and myself
about edgeR might help, and there's plenty of public documentation on
the coefficient of variation:
http://en.wikipedia.org/wiki/Coefficient_of_variation
The dispersion is a squared coefficient of variation, so it is always
dimensionless, because CV=sd/mean and the dimensions of the sd and the
mean cancel out.
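The cancellation of units can be checked numerically: rescaling a positive quantity changes its sd and its mean by the same factor, so their ratio is unchanged. The helper cv() and the simulated values below are illustrative assumptions only.

```r
# Coefficient of variation: sd and mean carry the same units, so the
# ratio carries none.
cv <- function(x) sd(x) / mean(x)

set.seed(3)
x <- rgamma(1000, shape = 25, rate = 0.25)  # arbitrary positive values

cv_original <- cv(x)          # CV in the original units
cv_rescaled <- cv(1000 * x)   # CV after changing units by a factor of 1000

print(c(original = cv_original, rescaled = cv_rescaled))
print(isTRUE(all.equal(cv_original, cv_rescaled)))
```

The standard deviation itself does not have this property: sd(1000 * x) is 1000 times sd(x).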
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth