question regarding differential expression


Jack Luo ▴ 450

@jack-luo-4241

Last seen 11.3 years ago

Hi,

This is a conceptual question about microarrays rather than about the usage of any Bioconductor package. I apologize if this bothers anyone.

I am struggling to understand the concept of differential expression in terms of its source (whether it is technical or biological). Suppose I have an experiment with two groups (healthy vs. disease) and I am looking for differentially expressed (DE) genes. Take two genes as an example, both of which are DE between healthy and disease:

Gene A has a present detection call for all the samples under study, but the detection call p-value in the healthy group is on the order of 1e-2 to 1e-3, while the detection call p-value in the disease group is much more significant (say, 1e-10).

Gene B has 50% present calls in healthy and 100% present calls in the disease (cancer) group.

My question is: what is the correct interpretation of whether the differential expression is technical or biological? Are both genes DE for technical reasons, is A DE for biological reasons and B for technical reasons, or are both DE for biological reasons?

Thanks a bunch,

-Jack

Microarray Microarray • 2.3k views

ADD COMMENT • link updated 15.3 years ago by James W. MacDonald 68k • written 15.3 years ago by Jack Luo ▴ 450


Sean Davis 21k

@sean-davis-490

Last seen 6 hours ago

United States

On Thu, Sep 23, 2010 at 4:45 PM, Jack Luo <jluo.rhelp@gmail.com> wrote:

> Gene A has present detection call for all the samples under study (but the detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3, the detection call p-value in the disease group is much more significant (say, 1e-10)).
> Gene B has 50% present call in healthy while 100% present call in cancer.

This sounds like a good candidate. The gene appears to be differentially expressed.

> My question is what's the correct interpretation in terms of whether the differential expression is due to technical or biological? Are they both DE due to technical, or A is DE due to biological and B is due to technical, or they are both DE due to biological?

This is not a question that can be answered with certainty. However, if you used biological replicates, the usual interpretation is that the DE could be due to true biological differences (though we cannot prove that without further experiments).

Sean

ADD COMMENT • link 15.3 years ago Sean Davis 21k
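One way to see why biological replicates matter is that a test on the measured signal asks whether the between-group difference is large compared with the sample-to-sample variation among replicates. The following is a minimal sketch, not something posted in the thread: the log2 expression values are made up, and `exact_permutation_pvalue` is a hypothetical helper that enumerates every way of relabeling the samples.

```python
from itertools import combinations

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    count = 0
    # Enumerate every relabeling of the pooled samples into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical log2 signal for one gene, five biological replicates per group.
healthy = [6.1, 6.4, 5.9, 6.3, 6.0]
disease = [7.8, 8.1, 7.5, 8.4, 7.9]

p_value = exact_permutation_pvalue(healthy, disease)
print(f"exact permutation p-value: {p_value:.4f}")
```

A small p-value here only says the group difference is unlikely under random relabeling of these samples. Whether that difference is biological still depends on how the samples were collected and processed.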


Sean,

Thanks for your email. Do you mean that Gene B sounds like a good candidate? Could you be more specific about the use of biological replicates? I don't see the connection to the question I asked.

Thanks,

-Jack

ADD REPLY • link 15.3 years ago Jack Luo ▴ 450


Sunny Srivastava ▴ 350

@sunny-srivastava-3793

Last seen 11.3 years ago

Hello Jack,

I am not sure what you mean by:

"....p-value in the disease group is much more significant (say, 1e-10))."

If you mean to use the *magnitude* of p-values as an indicator of the *strength* of evidence against (or for) the null, then it can lead to trouble. Theoretically speaking, it is incorrect to compare two p-values and draw conclusions about the strength of evidence against (or for) the null (in your case, *more* differential expression of Gene A in disease compared to control).

Thanks and Best Regards,

S.

ADD COMMENT • link 15.3 years ago Sunny Srivastava ▴ 350


Sunny,

What I meant by "p-value in the disease group is much more significant" is that the detection call p-values in the disease group are more significant than those in the healthy group, which means the signal in the disease group is more significantly above background noise than in the healthy group.

I am not sure I understand what you mean by "the null". Do you mean the background (or the mismatch probes)? When you say "strength of evidence", do you mean the strength of the signal? Could you be more specific about what "trouble" means?

Sorry that I got lost here,

-Jack

ADD REPLY • link 15.3 years ago Jack Luo ▴ 450


Hello Jack,

I am sorry for writing a statistical statement without knowing your background.

When I say "null", I mean the null hypothesis (the statement that you reject using p-values) for the expression of a gene (A or B in your case). A null hypothesis is a status quo statement about the expression of a gene. In your case it is equivalent to saying that the gene (A or B) is not differentially expressed when you compare a person with disease to a healthy person. You reject the null hypothesis if the p-value falls below your chosen threshold (for example, < 0.01).

But most importantly, the magnitude of a p-value should not be used as an indicator of the strength of signal or of the strength of differential expression of a gene in diseased compared to healthy persons. Doing so can lead to wrong conclusions about gene A being "more" differentially expressed than gene B (as you say: gene A is more significant than gene B if the p-value for A < the p-value for B when comparing diseased and healthy individuals), which may not be true. This is what I called 'trouble'.

Sorry for the statistical jargon.

Thanks,

S.

ADD REPLY • link 15.3 years ago Sunny Srivastava ▴ 350
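The caution about reading effect size off a p-value can be illustrated with a small sketch. This is not code from the thread: the summary statistics for "gene X" and "gene Y" are invented, and `welch_t` is a hypothetical helper computing the Welch t statistic from group means, standard deviations, and sample sizes.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic for the difference of two group means (b minus a)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_b - mean_a) / standard_error

# Hypothetical log2 expression summaries (group a = healthy, group b = disease).
# Gene X: small shift, many samples, low variability.
gene_x = dict(mean_a=8.0, sd_a=0.3, n_a=100, mean_b=8.2, sd_b=0.3, n_b=100)
# Gene Y: large shift, few samples, high variability.
gene_y = dict(mean_a=8.0, sd_a=1.5, n_a=3, mean_b=10.0, sd_b=1.5, n_b=3)

t_x = welch_t(**gene_x)
t_y = welch_t(**gene_y)
lfc_x = gene_x["mean_b"] - gene_x["mean_a"]
lfc_y = gene_y["mean_b"] - gene_y["mean_a"]

print(f"gene X: log2 fold change = {lfc_x:.2f}, t = {t_x:.2f}")
print(f"gene Y: log2 fold change = {lfc_y:.2f}, t = {t_y:.2f}")
```

In this made-up case gene X has the larger test statistic (and so the smaller p-value) even though gene Y has a tenfold larger log fold change, because the statistic also reflects sample size and variability.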


Hi Sunny,

Thanks for your email and clarification. I am familiar with the term "null hypothesis" from a statistical perspective (although I am not trained as a statistician); what I meant to clarify is exactly what the null is. I agree with the caution about using p-values as a measure of the degree of differential expression, but that is somewhat off topic. What I really meant to ask is whether genes A and B are differentially expressed for true biological reasons or for technical reasons.

Anyway, thanks for your effort on this, much appreciated.

-Jack

ADD REPLY • link 15.3 years ago Jack Luo ▴ 450


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 4 hours ago

United States

Hi Jack,

On 9/23/2010 4:45 PM, Jack Luo wrote:

> Gene A has present detection call for all the samples under study (but the detection call p-value in the healthy group is in the order of 1e-2 ~ 1e-3, the detection call p-value in the disease group is much more significant (say, 1e-10)).
> Gene B has 50% present call in healthy while 100% present call in cancer.

First let's backtrack and talk about P/M/A calls, and what they mean. The statistics underlying these calls are testing whether or not the PM probes in aggregate appear to be different than the corresponding MM probes in a given probeset. Others will disagree, but I think it is incorrect to assume that an absent call means that the transcript being measured is absent. What it really means is that we cannot say that the PM probes are binding more transcript than the MM probes.

If you make the assumption that the MM probes do a good job of measuring background, then the absent call really means it is absent. However, a large percentage of MM probes have higher fluorescence readings than the corresponding PM probe (it varies by chip, but is usually > 30%. You can check with your data to verify). In addition, the MM probe intensity will increase with increasing amounts of transcript. These are two of the reasons that Affy has abandoned the use of MM probes (more real estate on the chip being a third), and why very few people use MAS5 for computing expression values any more.

So I would personally caution you against interpreting these p-values as indicating presence or absence of the transcript.

As to your question, technical and biological variability are completely confounded here, so you have to set up your experiments in such a way that the contribution from technical variability is minimized. For instance, if you do all controls one day and diseased the next, you cannot possibly tell if any differences were due to biology or to technical differences. However, if you randomize sample types over days processed, then the technical variability (which still exists, and is confounded with biological variability) will tend to appear as noise, and be captured by the residual term.

Also, in my opinion there isn't any difference between the two situations (assuming I understand situation B correctly). What I think you are asking is this: are there any substantive differences between a situation where a gene is apparently unexpressed in sample A but expressed to a certain degree in sample B, and a situation where a gene is expressed in both samples, but at a two-fold (or greater) level in B vs A?

In my opinion, there is no difference between those scenarios. In each situation, the gene is expressed at a much lower level in one sample versus the other. The relative levels are unimportant, as the absolute accuracy of our measuring device is not good.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics

ADD COMMENT • link 15.3 years ago James W. MacDonald 68k
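The suggestion to check how often MM probes are brighter than their PM partners can be sketched as follows. This is an illustration under stated assumptions rather than anything posted in the thread: the intensities are invented for a single 11-pair probeset, both function names are hypothetical, and the per-pair score (PM - MM) / (PM + MM) is, as far as I understand it, the discrimination score on which the MAS5 detection call is based.

```python
# Hypothetical PM/MM intensities for one probeset (11 probe pairs); not real data.
pm = [210, 180, 450, 95, 300, 120, 510, 88, 240, 160, 130]
mm = [150, 200, 210, 110, 140, 125, 230, 90, 100, 170, 60]

def fraction_mm_above_pm(pm_values, mm_values):
    """Share of probe pairs whose mismatch probe is brighter than its perfect match."""
    pairs = list(zip(pm_values, mm_values))
    return sum(1 for p, m in pairs if m > p) / len(pairs)

def discrimination_scores(pm_values, mm_values):
    """Per-pair score (PM - MM) / (PM + MM); values near 0 mean PM and MM are hard to tell apart."""
    return [(p - m) / (p + m) for p, m in zip(pm_values, mm_values)]

fraction = fraction_mm_above_pm(pm, mm)
scores = discrimination_scores(pm, mm)

print(f"fraction of pairs with MM > PM: {fraction:.2f}")
print("discrimination scores:", [round(s, 3) for s in scores])
```

With real data the same fraction would be computed over all probe pairs on a chip. A detection call built from scores like these reflects how clearly PM exceeds MM, which is why an absent call is not the same thing as an absent transcript.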


Jim,

Thanks for your detailed explanation; it's really helpful. I agree with you that the term "present/absent" might be problematic; perhaps a more accurate term is reliable/unreliable. I am not sure I agree that the technical and biological variability are completely confounded: it is a well-randomized experiment with respect to disease/healthy status, not something like all disease samples in one day/batch and all healthy samples in another. The last two paragraphs of your email answered my question very accurately (that's exactly what I was asking).

Sorry to bother you with another question: do you think the difference is technical or biological? In our data, we have the same set of samples (say, 100 healthy vs. 100 disease) run in two different batches (batch differences could be due to lots of things, like reagent, hybwash...). Comparing the differential expression from one batch to the other, I found many genes that are differentially expressed in the 1st batch and behave like gene B: a higher percentage of present calls in one group than in the other. However, in the 2nd batch, many of them lose the present-call difference between the two groups and also go from differentially expressed to non-differentially expressed (I found this for both RMA and MAS5). This makes me wonder whether the differential expression in the 1st batch is due to technical rather than biological reasons (since the biology of the two batches is identical, as they come from the same 200 samples).

Thanks again,

-Jun

ADD REPLY • link 15.3 years ago Jack Luo ▴ 450
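The design point under discussion, spreading healthy and disease samples across processing days rather than running one group per day, can be sketched as below. This is a hypothetical illustration: the sample names, the number of days, and the helper `assign_processing_days` are all assumptions, and the scheme shown (shuffle within each group, then deal samples out round-robin) is just one simple way to balance groups over days.

```python
import random
from collections import Counter

def assign_processing_days(sample_groups, n_days, seed=0):
    """Shuffle samples within each group, then deal them across days round-robin,
    so every day receives a near-equal mix of groups."""
    rng = random.Random(seed)
    by_group = {}
    for sample, group in sample_groups.items():
        by_group.setdefault(group, []).append(sample)
    assignment = {}
    for group, members in by_group.items():
        rng.shuffle(members)
        for position, sample in enumerate(members):
            assignment[sample] = position % n_days
    return assignment

# Hypothetical study: 8 healthy and 8 disease samples processed over 4 days.
samples = {f"H{i}": "healthy" for i in range(8)}
samples.update({f"D{i}": "disease" for i in range(8)})

days = assign_processing_days(samples, n_days=4)
per_day = Counter((day, samples[name]) for name, day in days.items())
for day in range(4):
    print(f"day {day}: healthy={per_day[(day, 'healthy')]}, disease={per_day[(day, 'disease')]}")
```

With this layout a day effect hits both groups equally, so it shows up as extra noise instead of masquerading as a healthy vs. disease difference. The same reasoning applies to batches, reagent lots, or any other processing unit.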


On Mon, Sep 27, 2010 at 2:47 PM, Jack Luo <jluo.rhelp@gmail.com> wrote:

> Jim,
>
> Thanks for your detailed explanation on this, it's really helpful. I agree with you that the term "present/absent" might be problematic; perhaps a more accurate term is reliable/unreliable. I am not sure I agree that the technical and biological variability are completely confounded: it's a well-randomized experiment with disease/healthy status, not something like all disease in one day/batch/..., all healthy in another day/batch/.... The last two paragraphs of your email answered my question very accurately (that's exactly what I am asking).
>
> Sorry to bother you with another question: do you think the difference is technical or biological? In our data, we have the same set of samples (say, 100 healthy vs. 100 disease) run using two different batches (batch differences could be due to lots of things like reagent, hybwash...). Comparing the differential expression from one batch to another, I found many genes that are differentially expressed in the 1st batch that are like gene B: higher present% call in one group than the other group. However, in the 2nd batch, I found lots of them lose the present% difference between the two groups and also go from differentially expressed to non-differentially expressed (I found this for both RMA and MAS5), which makes me wonder whether the differential expression in the 1st batch is due to technical reasons, not biological reasons (since the biology of the two batches is identical because they are from the same 200 samples).

Hi, Jun.

It is not unexpected for a differentially expressed gene to show a different number of P/A calls in one group than in the other. In the ideal case, a gene is highly expressed in one group and not expressed at all in the other. Jim made the point that P/A calls only roughly measure actual presence or absence, so take them with a grain of salt. Also, a difference in P/A calls between groups does not by itself imply that a gene is differentially expressed, or that it is not. The hypothesis test that uses the measured signal is what is usually considered when looking at differential expression.

Sean

> On Fri, Sep 24, 2010 at 9:40 AM, James W. MacDonald <jmacdon@med.umich.edu> wrote:
>
> > Hi Jack,
> >
> > First let's backtrack and talk about P/M/A calls, and what they mean. The statistics underlying these calls are testing whether or not the PM probes in aggregate appear to be different than the corresponding MM probes in a given probeset. Others will disagree, but I think it is incorrect to assume that an absent call means that the transcript being measured is absent. What it really means is that we cannot say that the PM probes are binding more transcript than the MM probes.
> >
> > If you make the assumption that the MM probes do a good job of measuring background, then the absent call really means it is absent. However, a large percentage of MM probes have higher fluorescence readings than the corresponding PM probe (it varies by chip, but is usually > 30%. You can check with your data to verify). In addition, the MM probe intensity will increase with increasing amounts of transcript. These are two of the reasons that Affy has abandoned the use of MM probes (more real estate on the chip being a third), and why very few people use MAS5 for computing expression values any more.
> >
> > So I would personally caution you against interpreting these p-values as indicating presence or absence of the transcript.
> >
> > As to your question, technical and biological variability are completely confounded here, so you have to set up your experiments in such a way that the contribution from technical variability is minimized. For instance, if you do all controls one day and diseased the next, you cannot possibly tell if any differences were due to biology or to technical differences. However, if you randomize sample types over days processed, then the technical variability (which still exists, and is confounded with biological variability), will tend to appear as noise, and be captured by the residual term.
> >
> > Also, in my opinion there isn't any difference between the two situations (assuming I understand situation B correctly). What I think you are asking is this: are there any substantive differences between a situation where a gene is apparently unexpressed in sample A but expressed to a certain degree in sample B and a situation where a gene is expressed in both samples, but at a two fold (or greater) level in B vs A.
> >
> > In my opinion, there is no difference between those scenarios. In each situation, the gene is expressed at a much lower level in one sample versus the other. The relative levels are unimportant, as the absolute accuracy of our measuring device is not good.
> >
> > Best,
> >
> > Jim
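Sean's point, that differential expression is judged by a hypothesis test on the measured signal rather than by counting P/A calls, can be illustrated with a small sketch. This is not what any Bioconductor package does internally; it is just a plain exact permutation test on the difference in group means, written in Python with the standard library only. The function name `exact_permutation_pvalue` and the log2 signal values are made up for illustration and are not from the thread.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Hypothetical helper for illustration: enumerates every way of
    relabelling the pooled samples into groups of the original sizes.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0  # relabellings at least as extreme as the observed split
    total = 0    # all relabellings considered
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [v for i, v in enumerate(pooled) if i in chosen]
        b = [v for i, v in enumerate(pooled) if i not in chosen]
        # Small tolerance guards against floating-point round-off.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Assumed log2 signals for one probeset (invented numbers, not real data).
healthy = [6.1, 6.3, 5.9, 6.0, 6.2]
disease = [8.0, 8.4, 7.9, 8.2, 8.1]

p_value = exact_permutation_pvalue(healthy, disease)
print(f"two-sided permutation p-value: {p_value:.4f}")
```

With five samples per group there are only 252 possible relabellings, so the smallest two-sided p-value this test can return is 2/252. For real arrays with thousands of probesets one would normally use a moderated test and multiple-testing correction instead; the sketch only shows that the test operates on the signal values, whatever the P/A calls happen to be.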

ADD REPLY • link 15.3 years ago Sean Davis 21k


Hi, in a couple of months we will begin data collection on a major epigenomics project using the Illumina Infinium 450K platform. In the past I have used beadarray and methylumi to import the methylation data into R and perform some basic QC and analyses. My main concern now is the switch from the 27K to the 450K chip. When will the new chip be fully supported by revised packages (roughly)? Thanks, Ina

ADD REPLY • link 15.3 years ago Ina Hoeschele ▴ 620
