Hi,
This is a conceptual question about microarrays rather than about the usage of any Bioconductor package. I apologize if this bothers anyone.
I am struggling to understand differential expression in terms of its source (whether it is technical or biological). Suppose I have an experiment with two groups (healthy vs. disease) and I am trying to find differentially expressed genes. Take two genes as an example, both of which are differentially expressed (DE) between healthy and disease.
Gene A has a present detection call for all the samples under study, but the detection call p-value in the healthy group is on the order of 1e-2 to 1e-3, while the detection call p-value in the disease group is much more significant (say, 1e-10).
Gene B has 50% present calls in healthy and 100% present calls in cancer.
My question is: what is the correct interpretation of whether the differential expression is technical or biological in origin? Are both genes DE for technical reasons, is A DE for biological reasons and B for technical reasons, or are both DE for biological reasons?
Thanks a bunch,
-Jack
On Thu, Sep 23, 2010 at 4:45 PM, Jack Luo <jluo.rhelp@gmail.com>
wrote:
> Gene A has present detection call for all the samples under study (but the detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3, the detection call p-value in the disease group is much more significant (say, 1e-10)).
> Gene B has 50% present call in healthy while 100% present call in cancer.
This sounds like a good candidate. The gene appears to be differentially expressed.
> My question is what's the correct interpretation in terms of whether the differential expression is due to technical or biological? Are they both DE due to technical, or A is DE due to biological and B is due to technical, or they are both DE due to biological?
This is not a question that can be answered with certainty. However, if you used biological replicates, the usual interpretation is that the DE could be due to true biological differences (though we cannot prove that without further experiments).
Sean
Sean,
Thanks for your email. Do you mean that Gene B sounds like a good candidate? Could you be more specific about the use of biological replicates? I don't see the connection to the question I asked.
Thanks,
-Jack
Hello Jack,
I am not sure what you mean by:
"... p-value in the disease group is much more significant (say, 1e-10)."
If you mean to use the *magnitude* of p-values as an indicator of the *strength* of evidence against (or for) the null, then it can lead to trouble. Theoretically speaking, it is incorrect to compare two p-values and draw conclusions about the strength of evidence against (or for) the null (in your case, *more* differential expression of Gene A in Disease compared to Control).
Thanks and Best Regards,
S.
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
Sunny,
What I meant by "p-value in the disease group is much more significant" is that the detection call p-values in the disease group are more significant than those in the healthy group, which means the signal in the disease group is more significantly above background noise than in the healthy group.
I am not sure I understand what you mean by "the null". Do you mean the background (or the mismatch probes)? When you say "strength of evidence", do you mean the strength of the signal? Could you be more specific about what "trouble" means?
Sorry, I am a bit lost here,
-Jack
Hello Jack,
I am sorry for writing a statistical statement without knowing your background.
When I say "null", I mean the null hypothesis (the statement that you reject, or fail to reject, using p-values) for the expression of a gene (A or B in your case). A null hypothesis is a status quo statement about the expression of a gene. In your case it is equivalent to saying that the gene (A or B) is not differentially expressed when you compare a person with the disease to a healthy person. You reject the null hypothesis if the p-value falls below your chosen threshold (here, < 0.01).
But most importantly, the magnitude of a p-value should not be used as an indicator of the strength of the signal, or of the strength of differential expression of a gene in diseased compared to healthy individuals. Doing so can lead to wrong conclusions, such as gene A being "more" differentially expressed than gene B (as you say: gene A is more significant than gene B if the p-value for A < the p-value for B when comparing diseased and healthy individuals), which may not be true. This is what I called 'trouble'.
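To make the point concrete, here is a minimal sketch in Python. The expression values are invented for illustration, and the pooled-variance t statistic is just one convenient choice of test, not something taken from any analysis in this thread. It compares two hypothetical genes measured on the same number of samples:

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Two-sample t statistic with pooled variance (equal group sizes assumed)."""
    n = len(group1)
    assert n == len(group2), "this sketch assumes equal group sizes"
    pooled_var = (variance(group1) + variance(group2)) / 2
    return (mean(group2) - mean(group1)) / sqrt(pooled_var * 2 / n)


# Hypothetical log2 expression values, invented for illustration only.
gene_x = {"healthy": [6.0, 8.0, 7.0, 9.0, 5.0],
          "disease": [8.0, 10.0, 9.0, 11.0, 7.0]}
gene_y = {"healthy": [7.0, 7.1, 6.9, 7.2, 6.8],
          "disease": [7.5, 7.6, 7.4, 7.7, 7.3]}

for name, gene in (("X", gene_x), ("Y", gene_y)):
    diff = mean(gene["disease"]) - mean(gene["healthy"])
    t = pooled_t(gene["healthy"], gene["disease"])
    print(name, "log2 difference:", round(diff, 3), "t:", round(t, 3))
```

Gene X has the larger difference between groups (2.0 vs. 0.5 on the log2 scale), yet gene Y has the larger t statistic, and therefore the smaller p-value at the same degrees of freedom, simply because its measurements are less variable. The smaller p-value does not mean gene Y is "more" differentially expressed.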
Sorry for the statistical jargon.
Thanks,
S.
Hi Sunny,
Thanks for your email and clarification. I am familiar with the term "null hypothesis" from a statistical perspective (although I am not trained as a statistician); what I meant to clarify is exactly what the null is. I agree with the caution about using p-values as a measure of the degree of differential expression, but that's a bit off topic. What I really meant to ask is whether the two genes A and B are differentially expressed for a true biological reason or for a technical reason.
Anyway, thanks for your effort on this, much appreciated.
-Jack
Hi Jack,
On 9/23/2010 4:45 PM, Jack Luo wrote:
> Gene A has present detection call for all the samples under study (but the detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3, the detection call p-value in the disease group is much more significant (say, 1e-10)).
> Gene B has 50% present call in healthy while 100% present call in cancer.
First let's backtrack and talk about P/M/A calls and what they mean. The statistics underlying these calls test whether the PM probes, in aggregate, appear to differ from the corresponding MM probes in a given probeset. Others will disagree, but I think it is incorrect to assume that an absent call means that the transcript being measured is absent. What it really means is that we cannot say that the PM probes are binding more transcript than the MM probes.
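As a rough illustration of what such a call computes, here is a minimal Python sketch of a MAS5-style detection call. It rests on assumptions not stated in this thread: that each probe pair gets a discrimination score (PM - MM) / (PM + MM), that the scores are compared to a small threshold tau with a one-sided Wilcoxon signed-rank test, and that the defaults are tau = 0.015 with P/M/A cutoffs of 0.04 and 0.06. The function names are hypothetical, and details such as the handling of saturated probes are left out.

```python
def signed_rank_pvalue(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value, P(W+ >= observed W+)."""
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # Doubled midranks keep the arithmetic in integers even when values tie.
    doubled = [0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j):
            doubled[order[k]] = i + 1 + j  # 2 * average of ranks i+1 .. j
        i = j
    observed = sum(r for r, d in zip(doubled, diffs) if d > 0)
    # counts[s] = number of sign assignments whose positive doubled ranks sum to s
    counts = {0: 1}
    for r in doubled:
        updated = dict(counts)
        for s, c in counts.items():
            updated[s + r] = updated.get(s + r, 0) + c
        counts = updated
    tail = sum(c for s, c in counts.items() if s >= observed)
    return tail / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (p-value, call) for one probeset; defaults are assumed values."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    pval = signed_rank_pvalue([r - tau for r in scores])
    call = "P" if pval < alpha1 else ("M" if pval < alpha2 else "A")
    return pval, call
```

Note that the test only asks whether PM exceeds MM across the probe pairs. A very small p-value says the PM/MM separation is consistent, not how much transcript is there.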
If you assume that the MM probes do a good job of measuring background, then an absent call really does mean the transcript is absent. However, a large percentage of MM probes have higher fluorescence readings than the corresponding PM probe (it varies by chip, but is usually > 30%; you can check with your own data to verify). In addition, the MM probe intensity will increase with increasing amounts of transcript. These are two of the reasons that Affy has abandoned the use of MM probes (more real estate on the chip being a third), and why very few people use MAS5 for computing expression values any more.
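The check mentioned above is easy to sketch. The Python below assumes you have already extracted matched PM and MM intensities for one chip into two equal-length lists; the numbers shown are invented for illustration.

```python
def fraction_mm_above_pm(pm, mm):
    """Fraction of probe pairs whose MM intensity exceeds the PM intensity."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and the same length")
    return sum(m > p for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for six probe pairs, invented for illustration.
pm = [820.0, 150.0, 430.0, 95.0, 610.0, 210.0]
mm = [300.0, 180.0, 120.0, 140.0, 250.0, 90.0]
print(fraction_mm_above_pm(pm, mm))
```

Run over all probe pairs on a chip, this gives the percentage referred to above.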
So I would personally caution you against interpreting these p-values as indicating presence or absence of the transcript.
As to your question, technical and biological variability are completely confounded here, so you have to set up your experiments in such a way that the contribution from technical variability is minimized. For instance, if you process all controls one day and all diseased samples the next, you cannot possibly tell whether any differences are due to biology or to technical differences. However, if you randomize sample types over processing days, then the technical variability (which still exists, and is confounded with biological variability) will tend to appear as noise and be captured by the residual term.
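One simple way to do that randomization can be sketched as follows, in Python. The function name, the sample labels and the choice to shuffle within each group and then deal samples out across days are all assumptions made for illustration; other designs are equally valid.

```python
import random


def randomize_over_days(samples_by_group, n_days, seed=0):
    """Shuffle each group, then deal its samples across days in turn,
    so every processing day gets a similar mix of sample types."""
    rng = random.Random(seed)
    schedule = {day: [] for day in range(1, n_days + 1)}
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        for i, sample in enumerate(shuffled):
            schedule[1 + i % n_days].append((group, sample))
    return schedule


# Hypothetical sample labels: six healthy and six disease samples over three days.
samples = {
    "healthy": ["H%d" % i for i in range(1, 7)],
    "disease": ["D%d" % i for i in range(1, 7)],
}
for day, batch in randomize_over_days(samples, n_days=3).items():
    print(day, batch)
```

With this layout no processing day is all healthy or all disease, so a day effect cannot masquerade as a disease effect.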
Also, in my opinion there isn't any difference between the two situations (assuming I understand situation B correctly). What I think you are asking is this: are there any substantive differences between a situation where a gene is apparently unexpressed in sample A but expressed to a certain degree in sample B, and a situation where a gene is expressed in both samples, but at a two-fold (or greater) level in B vs. A?
In my opinion, there is no difference between those scenarios. In each situation, the gene is expressed at a much lower level in one sample versus the other. The relative levels are unimportant, as the absolute accuracy of our measuring device is not good.
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues
Jim,
Thanks for your detailed explanation; it's really helpful. I agree with you that the term "present/absent" might be problematic; perhaps a more accurate term is reliable/unreliable. I am not sure I agree that the technical and biological variability are completely confounded: it's a well-randomized experiment with respect to disease/healthy status, not something like all disease samples in one day/batch and all healthy samples in another. The last two paragraphs of your email answered my question very accurately (that's exactly what I was asking).
Sorry to bother you with another question: do you think the difference is technical or biological? In our data, we have the same set of samples (say, 100 healthy vs. 100 disease) run in two different batches (batch differences could be due to lots of things, like reagent, hybwash...). Comparing the differential expression from one batch to the other, I found many genes that are differentially expressed in the 1st batch and that are like gene B: a higher present% call in one group than in the other. However, in the 2nd batch, many of them lose the present% difference between the two groups and also go from differentially expressed to non-differentially expressed (I found this for both RMA and MAS5). This makes me wonder whether the differential expression in the 1st batch is due to technical rather than biological reasons (since the biology of the two batches is identical, because they come from the same 200 samples).
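The comparison described above can be sketched roughly as follows, in Python. The calls are invented for illustration (four samples per group rather than 100), and the function names are hypothetical, not part of any package.

```python
def present_fraction(calls):
    """Fraction of samples with a 'P' (present) detection call."""
    return sum(c == "P" for c in calls) / len(calls)


def present_gap(calls_by_group):
    """Present-call fraction in disease minus that in healthy, for one gene."""
    return (present_fraction(calls_by_group["disease"])
            - present_fraction(calls_by_group["healthy"]))


# Hypothetical calls for one gene on the same samples run in two batches.
batch1 = {"healthy": ["P", "A", "P", "A"], "disease": ["P", "P", "P", "P"]}
batch2 = {"healthy": ["P", "P", "P", "P"], "disease": ["P", "P", "P", "P"]}

gap1, gap2 = present_gap(batch1), present_gap(batch2)
print("batch 1 gap:", gap1, "batch 2 gap:", gap2, "change:", gap1 - gap2)
```

Because the same samples are behind both batches, a gap that is present in one batch and gone in the other cannot be explained by the biology of the samples alone.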
Thanks again,
-Jun
On Mon, Sep 27, 2010 at 2:47 PM, Jack Luo <jluo.rhelp@gmail.com>
wrote:
> Jim,
>
> Thanks for your detailed explanation on this, it's really helpful. I agree with you that the term "present/absent" might be problematic, perhaps a more accurate term is reliable/unreliable. I am not sure I agree that the technical and biological variability are completely confounded, it's well randomized experiment with disease/healthy status, not something like all disease in one day/batch/..., all healthy in another day/batch/ .... The last two paragraphs of your email answered my question very accurately (that's exactly what I am asking). Sorry to bother you with another question: do you think the difference is technical or biological? In our data, we have the same set of samples (say, 100 healthy vs. 100 disease) run using two different batches (batch difference could be due to lots of things like reagent, hybwash...), comparing the differential expression from one batch to another, I found many genes that are differentially expressed in the 1st batch that are like gene B: higher present% call in one group than the other group. However, in the 2nd batch, I found lots of them lose the present% difference between the two groups and also goes from differentially expressed to non-differentially expressed (I found this for both RMA and MAS5), which makes me wonder the differential expression in the 1st batch is due to technical reasons, not biological reasons (since the biology of the two batches are identical because they are from the same 200 samples).
>
Hi, Jun.
It is not unexpected for a differentially expressed gene to show a different number of P/A calls in one group than in the other. In the ideal case, a gene is highly expressed in one group and not expressed at all in the other. Jim made the point that P/A calls only roughly measure actual presence or absence, so take them with a grain of salt. Also, just because a gene has such a difference in P/A calls between groups does not imply that the gene is differentially expressed, or that it is not. The hypothesis test that uses the measured signal is what is usually considered when looking at differential expression.
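As one example of a test on the measured signal rather than on the calls, here is a minimal Python sketch of an exact two-sided permutation test on the difference in group means. This is an illustrative choice of test only, with invented log2 signal values; for real arrays a moderated test from an established package would normally be preferred.

```python
from itertools import combinations


def permutation_pvalue(healthy, disease):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(healthy) + list(disease)
    n, k = len(pooled), len(disease)

    def mean_diff(disease_idx):
        chosen = set(disease_idx)
        d = [pooled[i] for i in range(n) if i in chosen]
        h = [pooled[i] for i in range(n) if i not in chosen]
        return sum(d) / len(d) - sum(h) / len(h)

    observed = abs(mean_diff(range(len(healthy), n)))
    # Enumerate every way of relabelling k of the n samples as "disease".
    diffs = [abs(mean_diff(idx)) for idx in combinations(range(n), k)]
    return sum(d >= observed - 1e-12 for d in diffs) / len(diffs)


# Hypothetical log2 signal for one gene, invented for illustration.
healthy = [7.0, 7.2, 6.9, 7.1]
disease = [8.0, 8.3, 7.9, 8.1]
print(permutation_pvalue(healthy, disease))
```

The test uses only the signal values. Whether the individual samples were called present or absent plays no part in it.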
Sean
> On Fri, Sep 24, 2010 at 9:40 AM, James W. MacDonald
> <jmacdon@med.umich.edu>wrote:
>
> > Hi Jack,
> >
> >
> > On 9/23/2010 4:45 PM, Jack Luo wrote:
> >
> >> Hi,
> >>
> >> This is a conceptual question related to microarray, instead of the usage
> >> of any Bioconductor package. I apologize if this bothers anyone.
> >>
> >> I am struggling to understand the concept of differential expression in
> >> terms of its resources (whether it is technical or biological). Suppose I
> >> have an experiment with two groups (healthy vs. disease) and try to find
> >> some differentially expressed genes, take two genes for example, both of
> >> them are differentially expressed (DE) between healthy and disease.
> >>
> >> Gene A has present detection call for all the samples under study (but the
> >> detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3,
> >> the detection call p-value in the disease group is much more significant
> >> (say, 1e-10)).
> >> Gene B has 50% present call in healthy while 100% present call in cancer.
> >>
> >
> > First let's backtrack and talk about P/M/A calls, and what they mean. The
> > statistics underlying these calls are testing whether or not the PM probes
> > in aggregate appear to be different than the corresponding MM probes in a
> > given probeset. Others will disagree, but I think it is incorrect to assume
> > that an absent call means that the transcript being measured is absent. What
> > it really means is that we cannot say that the PM probes are binding more
> > transcript than the MM probes.
> >
> > If you make the assumption that the MM probes do a good job of measuring
> > background, then the absent call really means it is absent. However, a large
> > percentage of MM probes have higher fluorescence readings than the
> > corresponding PM probe (it varies by chip, but is usually > 30%. You can
> > check with your data to verify). In addition, the MM probe intensity will
> > increase with increasing amounts of transcript. These are two of the reasons
> > that Affy has abandoned the use of MM probes (more real estate on the chip
> > being a third), and why very few people use MAS5 for computing expression
> > values any more.
> >
> > So I would personally caution you against interpreting these p-values as
> > indicating presence or absence of the transcript.
> >
> > As to your question, technical and biological variability are completely
> > confounded here, so you have to set up your experiments in such a way that
> > the contribution from technical variability is minimized. For instance, if
> > you do all controls one day and diseased the next, you cannot possibly tell
> > if any differences were due to biology or to technical differences. However,
> > if you randomize sample types over days processed, then the technical
> > variability (which still exists, and is confounded with biological
> > variability), will tend to appear as noise, and be captured by the residual
> > term.
> >
> > Also, in my opinion there isn't any difference between the two situations
> > (assuming I understand situation B correctly). What I think you are asking
> > is this; are there any substantive differences between a situation where a
> > gene is apparently unexpressed in sample A but expressed to a certain degree
> > in sample B and a situation where a gene is expressed in both samples, but
> > at a two fold (or greater) level in B vs A.
> >
> > In my opinion, there is no difference between those scenarios. In each
> > situation, the gene is expressed at a much lower level in one sample versus
> > the other. The relative levels are unimportant, as the absolute accuracy of
> > our measuring device is not good.
> >
> > Best,
> >
> > Jim
> >
> >
> >
> >> My question is what's the correct interpretation in terms of whether the
> >> differential expression is due to technical or biological? Are they both DE
> >> due to technical, or A is DE due to biological and B is due to technical, or
> >> they are both DE due to biological?
> >>
> >> Thanks a bunch,
> >>
> >> -Jack
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor@stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > Douglas Lab
> > University of Michigan
> > Department of Human Genetics
> > 5912 Buhl
> > 1241 E. Catherine St.
> > Ann Arbor MI 48109-5618
> > 734-615-7826
> > **********************************************************
> > Electronic Mail is not secure, may not be read every day, and should not be
> > used for urgent or sensitive issues
> >
>
Hi,
in a couple of months we will begin data collection on a major
epigenomics project using the Illumina Infinium 450K platform. In the
past I have used beadarray and methylumi to import the methylation
data into R and perform some basic QC and analyses. My main concern
now is the switch from the 27K to the 450K chip. When will the new
chip be fully supported by revised packages (roughly)?
Thanks, Ina
----- Original Message -----
From: "Sean Davis" <sdavis2@mail.nih.gov>
To: "Jack Luo" <jluo.rhelp at gmail.com>
Cc: "James W. MacDonald" <jmacdon at med.umich.edu>, bioconductor at stat.math.ethz.ch
Sent: Monday, September 27, 2010 3:06:56 PM
Subject: Re: [BioC] question regarding differential expression