pm summarization method

0

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 3 months ago

Germany

Hi everybody, I have a question about the behavior of the expresso command when extracting the raw data from an affyBatch. I wanted to evaluate the raw intensities values of a specific gene from my data set and tried to extract it like that: rawdata <- expresso(totalExpressionData, bg.correct=FALSE, normalize=FALSE, pmcorrect.method="pmonly", summary.method="avgdiff", verbose=TRUE) I've got the result I wanted: wt1 wt2 wt3 treat1 treat2 treat3 gene_at 125.5 101 123.5 52.5 63.5 58 The problem was that i expected them to be the other way around. So I decided to look into the specific probe values of the probes for this probe-set. This are the values I've got from the PM and MM respectively: wt1 wt2 wt3 treat1 treat2 treat3 probe1 403 379 220 420 530 316 probe2 117 84 104 52 57 54 probe3 49 49 73 38 58 52 probe4 87 67 110 55 43 49 probe5 66 61 51 46 72 62 probe6 118 100 104 69 87 74 probe7 180 142 170 45 46 45 probe8 133 102 137 95 132 81 probe9 80 71 65 52 54 46 probe10 63 45 56 53 53 54 probe11 293 321 260 444 618 408 probe12 171 167 169 49 75 72 probe13 198 197 307 40 67 50 probe14 247 265 348 53 60 62 probe1 533 519 294 507 739 404 probe2 1789 1271 1468 1430 1666 1552 probe3 56 66 59 51 45 48 probe4 49 52 64 47 45 47 probe5 54 47 33 49 65 55 probe6 84 72 90 53 92 71 probe7 73 72 65 40 53 54 probe8 83 108 115 81 111 94 probe9 49 56 43 52 41 53 probe10 56 46 62 68 77 57 probe11 54 83 55 47 64 46 probe12 106 98 76 52 66 53 probe13 43 48 37 36 52 39 probe14 94 92 99 43 43 49 When I calculate the average of these two tables for each array I don't get the same values as presented in the top table. I would like to understand how from the values on the last two tables I come to a summarized value I get. Even if I ignore the MM values completely, which I think it does, I still don't see how it comes to these values. The two probes (Nr. 1 and 11 of the PM values) are strongly differ from the rest of the probes for this probe-set. Are they being ignored in the summarization? Thanks in advance Assa [[alternative HTML version deleted]]

probe probe • 905 views

ADD COMMENT • link 12.0 years ago Assa Yeroslaviz ★ 1.5k

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

Hi Assa, On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote: > Hi everybody, > > I have a question about the behavior of the expresso command when > extracting the raw data from an affyBatch. > > I wanted to evaluate the raw intensities values of a specific gene from my > data set and tried to extract it like that: > rawdata<- expresso(totalExpressionData, bg.correct=FALSE, > normalize=FALSE, > pmcorrect.method="pmonly", summary.method="avgdiff", > verbose=TRUE) > > I've got the result I wanted: > wt1 wt2 wt3 treat1 treat2 treat3 > gene_at 125.5 101 123.5 52.5 63.5 58 > > The problem was that i expected them to be the other way around. Why did you expect it to be the other way around? What exactly did you expect this call to expresso() to do? Since affy is primarily based on S4 methods, it can be a bit difficult to figure out what a given function is going to do, so I can understand you not knowing what this call to expresso() is going to end up doing. However, what you are doing is pretty weird, no? The avgdiff method implies pm-mm, so what do you expect to happen if you then specify pmonly? Given that the pmcorrect.method controls how we correct the PM probes, and there is a subtractmm option, one would normally assume that the 'difference' part of avgdiff might happen in that step. But you said not to compute that, so all you are left with is the 'avg' part of avgdiff. But let's set the logical assumptions aside and look at the actual code. Going through the code to expresso() is a bit like following Alice down the rabbit hole, so I will cut to the chase. In the end, you will be calling two pieces of code that will handle the pm adjustment and the summary statistic calculation. In your call to expresso() these will be (respectively): > pmcorrect.pmonly function (object) { return(pm(object)) } and > generateExprVal.method.avgdiff function (probes, ...) { list(exprs = apply(probes, 2, median), se.exprs = apply(probes, 2, sd)/sqrt(nrow(probes))) } So the pmonly will just give you the pm probes, and then avgdiff will give you the column medians of the pm data. Therefore you have basically told expresso() that you want the median value for the non-background corrected, unnormalized pm probes. Was that your intention? Best, Jim > So I decided to look into the specific probe values of the probes for this > probe-set. > This are the values I've got from the PM and MM respectively: > wt1 wt2 wt3 treat1 treat2 treat3 > probe1 403 379 220 420 530 316 > probe2 117 84 104 52 57 54 > probe3 49 49 73 38 58 52 > probe4 87 67 110 55 43 49 > probe5 66 61 51 46 72 62 > probe6 118 100 104 69 87 74 > probe7 180 142 170 45 46 45 > probe8 133 102 137 95 132 81 > probe9 80 71 65 52 54 46 > probe10 63 45 56 53 53 54 > probe11 293 321 260 444 618 408 > probe12 171 167 169 49 75 72 > probe13 198 197 307 40 67 50 > probe14 247 265 348 53 60 62 > > probe1 533 519 294 507 739 404 > probe2 1789 1271 1468 1430 1666 1552 > probe3 56 66 59 51 45 48 > probe4 49 52 64 47 45 47 > probe5 54 47 33 49 65 55 > probe6 84 72 90 53 92 71 > probe7 73 72 65 40 53 54 > probe8 83 108 115 81 111 94 > probe9 49 56 43 52 41 53 > probe10 56 46 62 68 77 57 > probe11 54 83 55 47 64 46 > probe12 106 98 76 52 66 53 > probe13 43 48 37 36 52 39 > probe14 94 92 99 43 43 49 > > When I calculate the average of these two tables for each array I don't get > the same values as presented in the top table. > I would like to understand how from the values on the last two tables I > come to a summarized value I get. Even if I ignore the MM values > completely, which I think it does, I still don't see how it comes to these > values. The two probes (Nr. 1 and 11 of the PM values) are strongly differ > from the rest of the probes for this probe-set. Are they being ignored in > the summarization? > > Thanks in advance > > Assa > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives:http://news.gmane.org/gmane.science.biology.info rmatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 12.0 years ago James W. MacDonald 65k

0

Entering edit mode

Hi Jim and bioC-Users, On Mon, Apr 16, 2012 at 17:54, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Assa, > > > On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote: > >> Hi everybody, >> >> I have a question about the behavior of the expresso command when >> extracting the raw data from an affyBatch. >> >> I wanted to evaluate the raw intensities values of a specific gene from my >> data set and tried to extract it like that: >> rawdata<- expresso(totalExpressionData, bg.correct=FALSE, >> normalize=FALSE, >> pmcorrect.method="pmonly", >> summary.method="avgdiff", >> verbose=TRUE) >> >> I've got the result I wanted: >> wt1 wt2 wt3 treat1 treat2 treat3 >> gene_at 125.5 101 123.5 52.5 63.5 58 >> >> The problem was that i expected them to be the other way around. >> > > Why did you expect it to be the other way around? What exactly did you > expect this call to expresso() to do? > > Sorry I didn't explain a bit more. The gene I was investigating is the 'white' gene in Drosophila, which sets the eye color of the flies. It was used in our experiment to test the flies if they have a certain knockout/mutant/treatment. By 'expect it to be the other way around' I meant that only the treated flies should'veh ad a high level of the white gene. In this data it is the other way around. We are not sure, if the error is in the data or maybe it was a error made in the lab who analyzed the arrays and made an erroneous identification symbols for the arrays. That was a little background information about the experiment. I hope it clarify the problem better. > Since affy is primarily based on S4 methods, it can be a bit difficult to > figure out what a given function is going to do, so I can understand you > not knowing what this call to expresso() is going to end up doing. However, > what you are doing is pretty weird, no? The avgdiff method implies pm-mm, > so what do you expect to happen if you then specify pmonly? > > Given that the pmcorrect.method controls how we correct the PM probes, and > there is a subtractmm option, one would normally assume that the > 'difference' part of avgdiff might happen in that step. But you said not to > compute that, so all you are left with is the 'avg' part of avgdiff. > Now for the point of my command. To see if I did wrongly classify the experiments or it was given to me as such, I wanted to extract the raw data from the arrays. As I said before, I know one can extract the PM and MM values for each of the probes of a probe-set. What I really would like to have is *one value for each probe-set and array each*. For that reason I used the summarization of the values with the expresso command. I will try to answer your questions as I understand them. I used the pmonly correction method, because if I understood it the right way, this is what the RMA algorithm is using when running the normalization procedure on the data. Am I wrong about that? Which way is the best way to extract the 'real' raw values from the arrays? I know that asking such question get you never a straightforward answer, but a near one will be good :-) If my logical thinking is wrong, will this cmmand will present me with a more accurate results? rawdatasubtractmm <- expresso(totalExpressionData, bg.correct=FALSE, normalize=FALSE, pmcorrect.method="subtractmm", summary.method="avgdiff", verbose=TRUE) I will run it now and look for the differences. > > But let's set the logical assumptions aside and look at the actual code. > Going through the code to expresso() is a bit like following Alice down the > rabbit hole, so I will cut to the chase. In the end, you will be calling > two pieces of code that will handle the pm adjustment and the summary > statistic calculation. In your call to expresso() these will be > (respectively): > > > pmcorrect.pmonly > function (object) > { > return(pm(object)) > } > > and > > > generateExprVal.method.avgdiff > function (probes, ...) > { > list(exprs = apply(probes, 2, median), se.exprs = apply(probes, > 2, sd)/sqrt(nrow(probes))) > } > > So the pmonly will just give you the pm probes, and then avgdiff will give > you the column medians of the pm data. Therefore you have basically told > expresso() that you want the median value for the non-background corrected, > unnormalized pm probes. > > In the vignette from affy it said to give back the average of the values, so I expected it to give me back the mean, not the median. May be this is why I didn't really understood the results. Now it make more sense. > Was that your intention? > As I said earlier, I wanted to extract the 'real' values for the probe-sets and I thought that extracting the PM values with no correction or normalization will be the best method to achieve it. I ran now several other possibilities: pm.correction summary.method gene wt1 wt2 wt3 treat1 treat2 treat3 mas mas gene_at 52.01982773 34.16503645 46.87094917 4.081420059 9.505471139 3.538333392 pmonly avgdiff gene_at 125.5 101 123.5 52.5 63.5 58 pmonly medianpolish gene_at 6.862247387 6.564551864 6.926547933 6.093298307 6.355810999 6.105348101 subtractmm avgdiff gene_at 36 15 22 2 8 2.5 This are the results. The mas and the subtractmm gives me the differences between PM and MM. Though the results are quite different, both of them gives me the same behavior. the expression in higher in the wild-type. The medianpolish gives me the log2 values of the PM values. This results I don't understand, a I can clearly see a difference between the wt and the treatemnt, but the log-scale values show almost no difference at all. I hope it make some more sense and I would be happy to hear some more suggestions/comments on that topic thanks a lot, Assa > > Best, > > Jim > > > > So I decided to look into the specific probe values of the probes for this >> probe-set. >> This are the values I've got from the PM and MM respectively: >> wt1 wt2 wt3 treat1 treat2 treat3 >> probe1 403 379 220 420 530 316 >> probe2 117 84 104 52 57 54 >> probe3 49 49 73 38 58 52 >> probe4 87 67 110 55 43 49 >> probe5 66 61 51 46 72 62 >> probe6 118 100 104 69 87 74 >> probe7 180 142 170 45 46 45 >> probe8 133 102 137 95 132 81 >> probe9 80 71 65 52 54 46 >> probe10 63 45 56 53 53 54 >> probe11 293 321 260 444 618 408 >> probe12 171 167 169 49 75 72 >> probe13 198 197 307 40 67 50 >> probe14 247 265 348 53 60 62 >> >> probe1 533 519 294 507 739 404 >> probe2 1789 1271 1468 1430 1666 1552 >> probe3 56 66 59 51 45 48 >> probe4 49 52 64 47 45 47 >> probe5 54 47 33 49 65 55 >> probe6 84 72 90 53 92 71 >> probe7 73 72 65 40 53 54 >> probe8 83 108 115 81 111 94 >> probe9 49 56 43 52 41 53 >> probe10 56 46 62 68 77 57 >> probe11 54 83 55 47 64 46 >> probe12 106 98 76 52 66 53 >> probe13 43 48 37 36 52 39 >> probe14 94 92 99 43 43 49 >> >> When I calculate the average of these two tables for each array I don't >> get >> the same values as presented in the top table. >> I would like to understand how from the values on the last two tables I >> come to a summarized value I get. Even if I ignore the MM values >> completely, which I think it does, I still don't see how it comes to these >> values. The two probes (Nr. 1 and 11 of the PM values) are strongly differ >> from the rest of the probes for this probe-set. Are they being ignored in >> the summarization? >> >> Thanks in advance >> >> Assa >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives:http://news.gmane.**org/gmane.science.biology.** >> informatics.conductor<http: news.gmane.org="" gmane.science.biology.i="" nformatics.conductor=""> >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD REPLY • link 12.0 years ago Assa Yeroslaviz ★ 1.5k

0

Entering edit mode

Hi Assa, On 4/17/2012 3:09 AM, Assa Yeroslaviz wrote: > Hi Jim and bioC-Users, > > > On Mon, Apr 16, 2012 at 17:54, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Assa, > > > On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote: > > Hi everybody, > > I have a question about the behavior of the expresso command when > extracting the raw data from an affyBatch. > > I wanted to evaluate the raw intensities values of a specific > gene from my > data set and tried to extract it like that: > rawdata<- expresso(totalExpressionData, bg.correct=FALSE, > normalize=FALSE, > pmcorrect.method="pmonly", > summary.method="avgdiff", > verbose=TRUE) > > I've got the result I wanted: > wt1 wt2 wt3 treat1 treat2 treat3 > gene_at 125.5 101 123.5 52.5 63.5 58 > > The problem was that i expected them to be the other way around. > > > Why did you expect it to be the other way around? What exactly did > you expect this call to expresso() to do? > > > Sorry I didn't explain a bit more. The gene I was investigating is the > 'white' gene in Drosophila, which sets the eye color of the flies. It > was used in our experiment to test the flies if they have a certain > knockout/mutant/treatment. By 'expect it to be the other way around' I > meant that only the treated flies should'veh ad a high level of the > white gene. In this data it is the other way around. > We are not sure, if the error is in the data or maybe it was a error > made in the lab who analyzed the arrays and made an erroneous > identification symbols for the arrays. > > That was a little background information about the experiment. I hope > it clarify the problem better. > > Since affy is primarily based on S4 methods, it can be a bit > difficult to figure out what a given function is going to do, so I > can understand you not knowing what this call to expresso() is > going to end up doing. However, what you are doing is pretty > weird, no? The avgdiff method implies pm-mm, so what do you expect > to happen if you then specify pmonly? > > Given that the pmcorrect.method controls how we correct the PM > probes, and there is a subtractmm option, one would normally > assume that the 'difference' part of avgdiff might happen in that > step. But you said not to compute that, so all you are left with > is the 'avg' part of avgdiff. > > > Now for the point of my command. To see if I did wrongly classify the > experiments or it was given to me as such, I wanted to extract the raw > data from the arrays. As I said before, I know one can extract the PM > and MM values for each of the probes of a probe-set. What I really > would like to have is _one value for each probe-set and array each_. > > For that reason I used the summarization of the values with the > expresso command. I will try to answer your questions as I understand > them. > I used the pmonly correction method, because if I understood it the > right way, this is what the RMA algorithm is using when running the > normalization procedure on the data. Am I wrong about that? > > Which way is the best way to extract the 'real' raw values from the > arrays? > I know that asking such question get you never a straightforward > answer, but a near one will be good :-) Why do you need a raw value from the array to see if you mixed samples? To put the question a different way, if normalized, background corrected expression values are good enough for doing statistics, why are they not good enough to determine if you mixed samples up? > > If my logical thinking is wrong, will this cmmand will present me with > a more accurate results? > rawdatasubtractmm <- expresso(totalExpressionData, bg.correct=FALSE, > normalize=FALSE, > pmcorrect.method="subtractmm", > summary.method="avgdiff", > verbose=TRUE) > > I will run it now and look for the differences. > > > But let's set the logical assumptions aside and look at the actual > code. Going through the code to expresso() is a bit like following > Alice down the rabbit hole, so I will cut to the chase. In the > end, you will be calling two pieces of code that will handle the > pm adjustment and the summary statistic calculation. In your call > to expresso() these will be (respectively): > > > pmcorrect.pmonly > function (object) > { > return(pm(object)) > } > > and > > > generateExprVal.method.avgdiff > function (probes, ...) > { > list(exprs = apply(probes, 2, median), se.exprs = apply(probes, > 2, sd)/sqrt(nrow(probes))) > } > > So the pmonly will just give you the pm probes, and then avgdiff > will give you the column medians of the pm data. Therefore you > have basically told expresso() that you want the median value for > the non-background corrected, unnormalized pm probes. > > > In the vignette from affy it said to give back the average of the > values, so I expected it to give me back the mean, not the median. May > be this is why I didn't really understood the results. Now it make > more sense. > > Was that your intention? > > > As I said earlier, I wanted to extract the 'real' values for the > probe-sets and I thought that extracting the PM values with no > correction or normalization will be the best method to achieve it. > I ran now several other possibilities: > pm.correction summary.method gene wt1 wt2 wt3 > treat1 treat2 treat3 > mas mas gene_at 52.01982773 34.16503645 46.87094917 > 4.081420059 9.505471139 3.538333392 > pmonly avgdiff gene_at 125.5 101 123.5 52.5 > 63.5 58 > pmonly medianpolish gene_at 6.862247387 6.564551864 > 6.926547933 6.093298307 6.355810999 6.105348101 > subtractmm avgdiff gene_at 36 15 22 2 8 2.5 > > This are the results. > The mas and the subtractmm gives me the differences between PM and MM. > Though the results are quite different, both of them gives me the same > behavior. the expression in higher in the wild-type. > The medianpolish gives me the log2 values of the PM values. This > results I don't understand, a I can clearly see a difference between > the wt and the treatemnt, but the log-scale values show almost no > difference at all. Here is where our interpretations of the raw data diverge. I only see a few probes in your raw data that have very different expression levels between the sample types. The only probes that look to be binding at any appreciable level are the first two, and there really isn't any difference between them. In addition, if you believe that MM probes measure background, it doesn't really look like you have much if any of this transcript being expressed in either sample. Best, Jim > > I hope it make some more sense and I would be happy to hear some more > suggestions/comments on that topic > > thanks a lot, > Assa > > > Best, > > Jim > > > > So I decided to look into the specific probe values of the > probes for this > probe-set. > This are the values I've got from the PM and MM respectively: > wt1 wt2 wt3 treat1 treat2 treat3 > probe1 403 379 220 420 530 316 > probe2 117 84 104 52 57 54 > probe3 49 49 73 38 58 52 > probe4 87 67 110 55 43 49 > probe5 66 61 51 46 72 62 > probe6 118 100 104 69 87 74 > probe7 180 142 170 45 46 45 > probe8 133 102 137 95 132 81 > probe9 80 71 65 52 54 46 > probe10 63 45 56 53 53 54 > probe11 293 321 260 444 618 408 > probe12 171 167 169 49 75 72 > probe13 198 197 307 40 67 50 > probe14 247 265 348 53 60 62 > > probe1 533 519 294 507 739 404 > probe2 1789 1271 1468 1430 1666 1552 > probe3 56 66 59 51 45 48 > probe4 49 52 64 47 45 47 > probe5 54 47 33 49 65 55 > probe6 84 72 90 53 92 71 > probe7 73 72 65 40 53 54 > probe8 83 108 115 81 111 94 > probe9 49 56 43 52 41 53 > probe10 56 46 62 68 77 57 > probe11 54 83 55 47 64 46 > probe12 106 98 76 52 66 53 > probe13 43 48 37 36 52 39 > probe14 94 92 99 43 43 49 > > When I calculate the average of these two tables for each > array I don't get > the same values as presented in the top table. > I would like to understand how from the values on the last two > tables I > come to a summarized value I get. Even if I ignore the MM values > completely, which I think it does, I still don't see how it > comes to these > values. The two probes (Nr. 1 and 11 of the PM values) are > strongly differ > from the rest of the probes for this probe-set. Are they being > ignored in > the summarization? > > Thanks in advance > > Assa > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the > archives:http://news.gmane.org/gmane.science.biology.informa tics.conductor > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 12.0 years ago James W. MacDonald 65k

0

Entering edit mode

Hi Jim, On Tue, Apr 17, 2012 at 20:04, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Assa, > > > On 4/17/2012 3:09 AM, Assa Yeroslaviz wrote: > >> Hi Jim and bioC-Users, >> >> >> >> On Mon, Apr 16, 2012 at 17:54, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Assa, >> >> >> On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote: >> >> Hi everybody, >> >> I have a question about the behavior of the expresso command when >> extracting the raw data from an affyBatch. >> >> I wanted to evaluate the raw intensities values of a specific >> gene from my >> data set and tried to extract it like that: >> rawdata<- expresso(totalExpressionData, bg.correct=FALSE, >> normalize=FALSE, >> pmcorrect.method="pmonly", >> summary.method="avgdiff", >> verbose=TRUE) >> >> I've got the result I wanted: >> wt1 wt2 wt3 treat1 treat2 treat3 >> gene_at 125.5 101 123.5 52.5 63.5 58 >> >> The problem was that i expected them to be the other way around. >> >> >> Why did you expect it to be the other way around? What exactly did >> you expect this call to expresso() to do? >> >> >> Sorry I didn't explain a bit more. The gene I was investigating is the >> 'white' gene in Drosophila, which sets the eye color of the flies. It was >> used in our experiment to test the flies if they have a certain >> knockout/mutant/treatment. By 'expect it to be the other way around' I >> meant that only the treated flies should'veh ad a high level of the white >> gene. In this data it is the other way around. >> We are not sure, if the error is in the data or maybe it was a error made >> in the lab who analyzed the arrays and made an erroneous identification >> symbols for the arrays. >> >> That was a little background information about the experiment. I hope it >> clarify the problem better. >> >> Since affy is primarily based on S4 methods, it can be a bit >> difficult to figure out what a given function is going to do, so I >> can understand you not knowing what this call to expresso() is >> going to end up doing. However, what you are doing is pretty >> weird, no? The avgdiff method implies pm-mm, so what do you expect >> to happen if you then specify pmonly? >> >> Given that the pmcorrect.method controls how we correct the PM >> probes, and there is a subtractmm option, one would normally >> assume that the 'difference' part of avgdiff might happen in that >> step. But you said not to compute that, so all you are left with >> is the 'avg' part of avgdiff. >> >> >> Now for the point of my command. To see if I did wrongly classify the >> experiments or it was given to me as such, I wanted to extract the raw data >> from the arrays. As I said before, I know one can extract the PM and MM >> values for each of the probes of a probe-set. What I really would like to >> have is _one value for each probe-set and array each_. >> >> >> For that reason I used the summarization of the values with the expresso >> command. I will try to answer your questions as I understand them. >> I used the pmonly correction method, because if I understood it the right >> way, this is what the RMA algorithm is using when running the normalization >> procedure on the data. Am I wrong about that? >> >> Which way is the best way to extract the 'real' raw values from the >> arrays? >> I know that asking such question get you never a straightforward answer, >> but a near one will be good :-) >> > > Why do you need a raw value from the array to see if you mixed samples? To > put the question a different way, if normalized, background corrected > expression values are good enough for doing statistics, why are they not > good enough to determine if you mixed samples up? > > I basically wanted to show the raw values to the biologists in the lab. The problem was, that they suggested that I somehow might have switched the way of analyzing the data. I mean they thought, that instead of doing Treatment- Control I did Control-Treatment, which would have easily explained the results of the higher expression of this gene in the control. By showing them the raw data, I can give them a proof, that the switch wasn't made by me, but the arrays were switched, or that the gene is somehow truly down-regulated in the treatment. > >> If my logical thinking is wrong, will this cmmand will present me with a >> more accurate results? >> rawdatasubtractmm <- expresso(totalExpressionData, bg.correct=FALSE, >> normalize=FALSE, >> pmcorrect.method="subtractmm", >> summary.method="avgdiff", >> verbose=TRUE) >> >> I will run it now and look for the differences. >> >> >> But let's set the logical assumptions aside and look at the actual >> code. Going through the code to expresso() is a bit like following >> Alice down the rabbit hole, so I will cut to the chase. In the >> end, you will be calling two pieces of code that will handle the >> pm adjustment and the summary statistic calculation. In your call >> to expresso() these will be (respectively): >> >> > pmcorrect.pmonly >> function (object) >> { >> return(pm(object)) >> } >> >> and >> >> > generateExprVal.method.avgdiff >> function (probes, ...) >> { >> list(exprs = apply(probes, 2, median), se.exprs = apply(probes, >> 2, sd)/sqrt(nrow(probes))) >> } >> >> So the pmonly will just give you the pm probes, and then avgdiff >> will give you the column medians of the pm data. Therefore you >> have basically told expresso() that you want the median value for >> the non-background corrected, unnormalized pm probes. >> >> >> In the vignette from affy it said to give back the average of the values, >> so I expected it to give me back the mean, not the median. May be this is >> why I didn't really understood the results. Now it make more sense. >> >> Was that your intention? >> >> >> As I said earlier, I wanted to extract the 'real' values for the >> probe-sets and I thought that extracting the PM values with no correction >> or normalization will be the best method to achieve it. >> I ran now several other possibilities: >> pm.correction summary.method gene wt1 wt2 wt3 treat1 >> treat2 treat3 >> mas mas gene_at 52.01982773 34.16503645 46.87094917 >> 4.081420059 9.505471139 3.538333392 >> pmonly avgdiff gene_at 125.5 101 123.5 52.5 63.5 >> 58 >> pmonly medianpolish gene_at 6.862247387 6.564551864 >> 6.926547933 6.093298307 6.355810999 6.105348101 >> subtractmm avgdiff gene_at 36 15 22 2 8 2.5 >> >> This are the results. >> The mas and the subtractmm gives me the differences between PM and MM. >> Though the results are quite different, both of them gives me the same >> behavior. the expression in higher in the wild-type. >> The medianpolish gives me the log2 values of the PM values. This results >> I don't understand, a I can clearly see a difference between the wt and the >> treatemnt, but the log-scale values show almost no difference at all. >> > > Here is where our interpretations of the raw data diverge. I only see a > few probes in your raw data that have very different expression levels > between the sample types. The only probes that look to be binding at any > appreciable level are the first two, and there really isn't any difference > between them. In addition, if you believe that MM probes measure > background, it doesn't really look like you have much if any of this > transcript being expressed in either sample. > Sorry,but this here I don't understand. I didn't present any probes from my data, but just the results of *one gene*after running the expresso command with *different options*. How can you say that: "I only see a few probes in your raw data that have very different expression levels between the sample types"? I mean after analyzing the data , I think you are also right, it just wonder me how could you have known that. I was just wondering here, why the medianpolish option shows basically no differences at all, while the others three with the mas or subtractmm options shows such a huge difference. Can it be explained? Thanks, Assa > Best, > > Jim > > > >> I hope it make some more sense and I would be happy to hear some more >> suggestions/comments on that topic >> >> thanks a lot, >> Assa >> >> >> Best, >> >> Jim >> >> >> >> So I decided to look into the specific probe values of the >> probes for this >> probe-set. >> This are the values I've got from the PM and MM respectively: >> wt1 wt2 wt3 treat1 treat2 treat3 >> probe1 403 379 220 420 530 316 >> probe2 117 84 104 52 57 54 >> probe3 49 49 73 38 58 52 >> probe4 87 67 110 55 43 49 >> probe5 66 61 51 46 72 62 >> probe6 118 100 104 69 87 74 >> probe7 180 142 170 45 46 45 >> probe8 133 102 137 95 132 81 >> probe9 80 71 65 52 54 46 >> probe10 63 45 56 53 53 54 >> probe11 293 321 260 444 618 408 >> probe12 171 167 169 49 75 72 >> probe13 198 197 307 40 67 50 >> probe14 247 265 348 53 60 62 >> >> probe1 533 519 294 507 739 404 >> probe2 1789 1271 1468 1430 1666 1552 >> probe3 56 66 59 51 45 48 >> probe4 49 52 64 47 45 47 >> probe5 54 47 33 49 65 55 >> probe6 84 72 90 53 92 71 >> probe7 73 72 65 40 53 54 >> probe8 83 108 115 81 111 94 >> probe9 49 56 43 52 41 53 >> probe10 56 46 62 68 77 57 >> probe11 54 83 55 47 64 46 >> probe12 106 98 76 52 66 53 >> probe13 43 48 37 36 52 39 >> probe14 94 92 99 43 43 49 >> >> When I calculate the average of these two tables for each >> array I don't get >> the same values as presented in the top table. >> I would like to understand how from the values on the last two >> tables I >> come to a summarized value I get. Even if I ignore the MM values >> completely, which I think it does, I still don't see how it >> comes to these >> values. The two probes (Nr. 1 and 11 of the PM values) are >> strongly differ >> from the rest of the probes for this probe-set. Are they being >> ignored in >> the summarization? >> >> Thanks in advance >> >> Assa >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the >> archives:http://news.gmane.**org/gmane.science.biology.** >> informatics.conductor<http: news.gmane.org="" gmane.science.biology.i="" nformatics.conductor=""> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD REPLY • link 12.0 years ago Assa Yeroslaviz ★ 1.5k

0

Entering edit mode

Hi Assa, On 4/18/2012 2:33 AM, Assa Yeroslaviz wrote: > Hi Jim, > > On Tue, Apr 17, 2012 at 20:04, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Assa, > > > On 4/17/2012 3:09 AM, Assa Yeroslaviz wrote: > > Hi Jim and bioC-Users, > > > > On Mon, Apr 16, 2012 at 17:54, James W. MacDonald > <jmacdon at="" uw.edu="" <mailto:jmacdon="" at="" uw.edu=""> <mailto:jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">>> wrote: > > Hi Assa, > > > On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote: > > Hi everybody, > > I have a question about the behavior of the expresso > command when > extracting the raw data from an affyBatch. > > I wanted to evaluate the raw intensities values of a > specific > gene from my > data set and tried to extract it like that: > rawdata<- expresso(totalExpressionData, bg.correct=FALSE, > normalize=FALSE, > pmcorrect.method="pmonly", > summary.method="avgdiff", > verbose=TRUE) > > I've got the result I wanted: > wt1 wt2 wt3 treat1 treat2 > treat3 > gene_at 125.5 101 123.5 52.5 63.5 58 > > The problem was that i expected them to be the other > way around. > > > Why did you expect it to be the other way around? What > exactly did > you expect this call to expresso() to do? > > > Sorry I didn't explain a bit more. The gene I was > investigating is the 'white' gene in Drosophila, which sets > the eye color of the flies. It was used in our experiment to > test the flies if they have a certain > knockout/mutant/treatment. By 'expect it to be the other way > around' I meant that only the treated flies should'veh ad a > high level of the white gene. In this data it is the other way > around. > We are not sure, if the error is in the data or maybe it was a > error made in the lab who analyzed the arrays and made an > erroneous identification symbols for the arrays. > > That was a little background information about the experiment. > I hope it clarify the problem better. > > Since affy is primarily based on S4 methods, it can be a bit > difficult to figure out what a given function is going to > do, so I > can understand you not knowing what this call to expresso() is > going to end up doing. However, what you are doing is pretty > weird, no? The avgdiff method implies pm-mm, so what do you > expect > to happen if you then specify pmonly? > > Given that the pmcorrect.method controls how we correct the PM > probes, and there is a subtractmm option, one would normally > assume that the 'difference' part of avgdiff might happen > in that > step. But you said not to compute that, so all you are left > with > is the 'avg' part of avgdiff. > > > Now for the point of my command. To see if I did wrongly > classify the experiments or it was given to me as such, I > wanted to extract the raw data from the arrays. As I said > before, I know one can extract the PM and MM values for each > of the probes of a probe-set. What I really would like to have > is _one value for each probe-set and array each_. > > > For that reason I used the summarization of the values with > the expresso command. I will try to answer your questions as I > understand them. > I used the pmonly correction method, because if I understood > it the right way, this is what the RMA algorithm is using when > running the normalization procedure on the data. Am I wrong > about that? > > Which way is the best way to extract the 'real' raw values > from the arrays? > I know that asking such question get you never a > straightforward answer, but a near one will be good :-) > > > Why do you need a raw value from the array to see if you mixed > samples? To put the question a different way, if normalized, > background corrected expression values are good enough for doing > statistics, why are they not good enough to determine if you mixed > samples up? > > > I basically wanted to show the raw values to the biologists in the > lab. The problem was, that they suggested that I somehow might have > switched the way of analyzing the data. I mean they thought, that > instead of doing Treatment- Control I did Control-Treatment, which > would have easily explained the results of the higher expression of > this gene in the control. > > By showing them the raw data, I can give them a proof, that the switch > wasn't made by me, but the arrays were switched, or that the gene is > somehow truly down-regulated in the treatment. > > > > > If my logical thinking is wrong, will this cmmand will present > me with a more accurate results? > rawdatasubtractmm <- expresso(totalExpressionData, > bg.correct=FALSE, > normalize=FALSE, > pmcorrect.method="subtractmm", > summary.method="avgdiff", > verbose=TRUE) > > I will run it now and look for the differences. > > > But let's set the logical assumptions aside and look at the > actual > code. Going through the code to expresso() is a bit like > following > Alice down the rabbit hole, so I will cut to the chase. In the > end, you will be calling two pieces of code that will > handle the > pm adjustment and the summary statistic calculation. In > your call > to expresso() these will be (respectively): > > > pmcorrect.pmonly > function (object) > { > return(pm(object)) > } > > and > > > generateExprVal.method.avgdiff > function (probes, ...) > { > list(exprs = apply(probes, 2, median), se.exprs = > apply(probes, > 2, sd)/sqrt(nrow(probes))) > } > > So the pmonly will just give you the pm probes, and then > avgdiff > will give you the column medians of the pm data. Therefore you > have basically told expresso() that you want the median > value for > the non-background corrected, unnormalized pm probes. > > > In the vignette from affy it said to give back the average of > the values, so I expected it to give me back the mean, not the > median. May be this is why I didn't really understood the > results. Now it make more sense. > > Was that your intention? > > > As I said earlier, I wanted to extract the 'real' values for > the probe-sets and I thought that extracting the PM values > with no correction or normalization will be the best method to > achieve it. > I ran now several other possibilities: > pm.correction summary.method gene wt1 wt2 wt3 > treat1 treat2 treat3 > mas mas gene_at 52.01982773 34.16503645 > 46.87094917 4.081420059 9.505471139 3.538333392 > pmonly avgdiff gene_at 125.5 101 123.5 52.5 > 63.5 58 > pmonly medianpolish gene_at 6.862247387 > 6.564551864 6.926547933 6.093298307 6.355810999 > 6.105348101 > subtractmm avgdiff gene_at 36 15 22 2 8 2.5 > <tel:36%20%20%20%2015%20%20%20%2022%20%20%20%202%20%20%20%20 8%20%20%20%202.5=""> > > This are the results. > The mas and the subtractmm gives me the differences between PM > and MM. Though the results are quite different, both of them > gives me the same behavior. the expression in higher in the > wild-type. > The medianpolish gives me the log2 values of the PM values. > This results I don't understand, a I can clearly see a > difference between the wt and the treatemnt, but the log- scale > values show almost no difference at all. > > > Here is where our interpretations of the raw data diverge. I only > see a few probes in your raw data that have very different > expression levels between the sample types. The only probes that > look to be binding at any appreciable level are the first two, and > there really isn't any difference between them. In addition, if > you believe that MM probes measure background, it doesn't really > look like you have much if any of this transcript being expressed > in either sample. > > > Sorry,but this here I don't understand. > I didn't present any probes from my data, but just the results of _one > gene_ after running the expresso command with _different options_. > How can you say that: "I only see a few probes in your raw data that > have very different expression levels between the sample types"? > I mean after analyzing the data , I think you are also right, it just > wonder me how could you have known that. What are the data you show just below? You said they were the PM and MM probe data for the gene in question. > > I was just wondering here, why the medianpolish option shows basically > no differences at all, while the others three with the mas or > subtractmm options shows such a huge difference. > Can it be explained? First, these aren't huge differences. Sure, if you do fold changes they are, but remember that these data range from 0 to 2^14 or so (hypothetically 2^16, but it is rare to get much over 2^14). A difference of 40 at the very low end of that range is indistinguishable from noise. Second, both mas5 and avgdiff suffer from the same problems; they assume MM probes accurately measure just background (not true), and they ignore probe-specific binding, which adds nuisance data to the result. Once you account for probe-specific binding, the differences largely disappear. Best, Jim > > Thanks, > Assa > > > > Best, > > Jim > > > > I hope it make some more sense and I would be happy to hear > some more suggestions/comments on that topic > > thanks a lot, > Assa > > > Best, > > Jim > > > > So I decided to look into the specific probe values of the > probes for this > probe-set. > This are the values I've got from the PM and MM > respectively: > wt1 wt2 wt3 treat1 treat2 treat3 > probe1 403 379 220 420 530 316 > probe2 117 84 104 52 57 54 > probe3 49 49 73 38 58 52 > probe4 87 67 110 55 43 49 > probe5 66 61 51 46 72 62 > probe6 118 100 104 69 87 74 > probe7 180 142 170 45 46 45 > probe8 133 102 137 95 132 81 > probe9 80 71 65 52 54 46 > probe10 63 45 56 53 53 54 > probe11 293 321 260 444 618 408 > probe12 171 167 169 49 75 72 > probe13 198 197 307 40 67 50 > probe14 247 265 348 53 60 62 > > probe1 533 519 294 507 739 404 > probe2 1789 1271 1468 1430 1666 1552 > probe3 56 66 59 51 45 48 > probe4 49 52 64 47 45 47 > probe5 54 47 33 49 65 55 > probe6 84 72 90 53 92 71 > probe7 73 72 65 40 53 54 > probe8 83 108 115 81 111 94 > probe9 49 56 43 52 41 53 > probe10 56 46 62 68 77 57 > probe11 54 83 55 47 64 46 > probe12 106 98 76 52 66 53 > probe13 43 48 37 36 52 39 > probe14 94 92 99 43 43 49 > > When I calculate the average of these two tables for each > array I don't get > the same values as presented in the top table. > I would like to understand how from the values on the > last two > tables I > come to a summarized value I get. Even if I ignore the > MM values > completely, which I think it does, I still don't see how it > comes to these > values. The two probes (Nr. 1 and 11 of the PM values) are > strongly differ > from the rest of the probes for this probe-set. Are > they being > ignored in > the summarization? > > Thanks in advance > > Assa > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-project.org=""> <mailto:bioconductor at="" r-project.org="">> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the > > archives:http://news.gmane.org/gmane.science.biology.inform atics.conductor > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 12.0 years ago James W. MacDonald 65k

0

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 3 months ago

Germany

Hi Jim, I would like to thank you a lot for the deailed explanation. I think I understand now better, not completely, but better, the differences between the PM-MM calculation and the reason to use the RMA algorithm. cu Assa On Wed, Apr 18, 2012 at 16:10, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Assa, > > > On 4/18/2012 9:37 AM, Assa Yeroslaviz wrote: > >> Hi Jim, >> >> >> On Wed, Apr 18, 2012 at 15:10, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Assa, >> >> >> On 4/18/2012 2:33 AM, Assa Yeroslaviz wrote: >> >> Hi Jim, >> >> >> On Tue, Apr 17, 2012 at 20:04, James W. MacDonald >> <jmacdon@uw.edu <mailto:jmacdon@uw.edu=""> <mailto:jmacdon@uw.edu>> <mailto:jmacdon@uw.edu>>> wrote: >> >> Hi Assa, >> >> >> On 4/17/2012 3:09 AM, Assa Yeroslaviz wrote: >> >> Hi Jim and bioC-Users, >> >> >> >> On Mon, Apr 16, 2012 at 17:54, James W. MacDonald >> <jmacdon@uw.edu <mailto:jmacdon@uw.edu=""> <mailto:jmacdon@uw.edu>> <mailto:jmacdon@uw.edu>> <mailto:jmacdon@uw.edu>> <mailto:jmacdon@uw.edu> >> >> >> <mailto:jmacdon@uw.edu <mailto:jmacdon@uw.edu="">>>> wrote: >> >> Hi Assa, >> >> >> On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote: >> >> Hi everybody, >> >> I have a question about the behavior of the expresso >> command when >> extracting the raw data from an affyBatch. >> >> I wanted to evaluate the raw intensities values of a >> specific >> gene from my >> data set and tried to extract it like that: >> rawdata<- expresso(totalExpressionData, >> bg.correct=FALSE, >> normalize=FALSE, >> pmcorrect.method="pmonly", >> summary.method="avgdiff", >> verbose=TRUE) >> >> I've got the result I wanted: >> wt1 wt2 wt3 treat1 >> treat2 treat3 >> gene_at 125.5 101 123.5 52.5 63.5 >> 58 >> >> The problem was that i expected them to be the other >> way around. >> >> >> Why did you expect it to be the other way around? What >> exactly did >> you expect this call to expresso() to do? >> >> >> Sorry I didn't explain a bit more. The gene I was >> investigating is the 'white' gene in Drosophila, which sets >> the eye color of the flies. It was used in our >> experiment to >> test the flies if they have a certain >> knockout/mutant/treatment. By 'expect it to be the >> other way >> around' I meant that only the treated flies should'veh ad a >> high level of the white gene. In this data it is the >> other way >> around. >> We are not sure, if the error is in the data or maybe >> it was a >> error made in the lab who analyzed the arrays and made an >> erroneous identification symbols for the arrays. >> >> That was a little background information about the >> experiment. >> I hope it clarify the problem better. >> >> Since affy is primarily based on S4 methods, it can >> be a bit >> difficult to figure out what a given function is >> going to >> do, so I >> can understand you not knowing what this call to >> expresso() is >> going to end up doing. However, what you are doing >> is pretty >> weird, no? The avgdiff method implies pm-mm, so what >> do you >> expect >> to happen if you then specify pmonly? >> >> Given that the pmcorrect.method controls how we >> correct the PM >> probes, and there is a subtractmm option, one would >> normally >> assume that the 'difference' part of avgdiff might >> happen >> in that >> step. But you said not to compute that, so all you >> are left >> with >> is the 'avg' part of avgdiff. >> >> >> Now for the point of my command. To see if I did wrongly >> classify the experiments or it was given to me as such, I >> wanted to extract the raw data from the arrays. As I said >> before, I know one can extract the PM and MM values for >> each >> of the probes of a probe-set. What I really would like >> to have >> is _one value for each probe-set and array each_. >> >> >> For that reason I used the summarization of the values with >> the expresso command. I will try to answer your >> questions as I >> understand them. >> I used the pmonly correction method, because if I >> understood >> it the right way, this is what the RMA algorithm is >> using when >> running the normalization procedure on the data. Am I wrong >> about that? >> >> Which way is the best way to extract the 'real' raw values >> from the arrays? >> I know that asking such question get you never a >> straightforward answer, but a near one will be good :-) >> >> >> Why do you need a raw value from the array to see if you mixed >> samples? To put the question a different way, if normalized, >> background corrected expression values are good enough for >> doing >> statistics, why are they not good enough to determine if >> you mixed >> samples up? >> >> >> I basically wanted to show the raw values to the biologists in >> the lab. The problem was, that they suggested that I somehow >> might have switched the way of analyzing the data. I mean they >> thought, that instead of doing Treatment- Control I did >> Control-Treatment, which would have easily explained the >> results of the higher expression of this gene in the control. >> >> By showing them the raw data, I can give them a proof, that >> the switch wasn't made by me, but the arrays were switched, or >> that the gene is somehow truly down-regulated in the treatment. >> >> >> >> >> If my logical thinking is wrong, will this cmmand will >> present >> me with a more accurate results? >> rawdatasubtractmm <- expresso(totalExpressionData, >> bg.correct=FALSE, >> normalize=FALSE, >> >> pmcorrect.method="subtractmm", >> summary.method="avgdiff", >> verbose=TRUE) >> >> I will run it now and look for the differences. >> >> >> But let's set the logical assumptions aside and look >> at the >> actual >> code. Going through the code to expresso() is a bit like >> following >> Alice down the rabbit hole, so I will cut to the >> chase. In the >> end, you will be calling two pieces of code that will >> handle the >> pm adjustment and the summary statistic calculation. In >> your call >> to expresso() these will be (respectively): >> >> > pmcorrect.pmonly >> function (object) >> { >> return(pm(object)) >> } >> >> and >> >> > generateExprVal.method.avgdiff >> function (probes, ...) >> { >> list(exprs = apply(probes, 2, median), se.exprs = >> apply(probes, >> 2, sd)/sqrt(nrow(probes))) >> } >> >> So the pmonly will just give you the pm probes, and then >> avgdiff >> will give you the column medians of the pm data. >> Therefore you >> have basically told expresso() that you want the median >> value for >> the non-background corrected, unnormalized pm probes. >> >> >> In the vignette from affy it said to give back the >> average of >> the values, so I expected it to give me back the mean, >> not the >> median. May be this is why I didn't really understood the >> results. Now it make more sense. >> >> Was that your intention? >> >> >> As I said earlier, I wanted to extract the 'real' >> values for >> the probe-sets and I thought that extracting the PM values >> with no correction or normalization will be the best >> method to >> achieve it. >> I ran now several other possibilities: >> pm.correction summary.method gene wt1 wt2 >> wt3 treat1 treat2 treat3 >> mas mas gene_at 52.01982773 34.16503645 >> 46.87094917 4.081420059 9.505471139 3.538333392 >> pmonly avgdiff gene_at 125.5 101 123.5 >> 52.5 >> 63.5 58 >> pmonly medianpolish gene_at 6.862247387 >> 6.564551864 6.926547933 6.093298307 6.355810999 >> 6.105348101 <tel:6.105348101> >> >> subtractmm avgdiff gene_at 36 15 22 2 8 2.5 >> <tel:36%2015%2022%202%208%202.**5> >> >> <tel:36%20%20%20%2015%20%20%**20%2022%20%20%20%202%20%20%20%**>> 208%20%20%20%202.5> >> >> >> >> This are the results. >> The mas and the subtractmm gives me the differences >> between PM >> and MM. Though the results are quite different, both of >> them >> gives me the same behavior. the expression in higher in the >> wild-type. >> The medianpolish gives me the log2 values of the PM values. >> This results I don't understand, a I can clearly see a >> difference between the wt and the treatemnt, but the >> log-scale >> values show almost no difference at all. >> >> >> Here is where our interpretations of the raw data diverge. >> I only >> see a few probes in your raw data that have very different >> expression levels between the sample types. The only probes >> that >> look to be binding at any appreciable level are the first >> two, and >> there really isn't any difference between them. In addition, if >> you believe that MM probes measure background, it doesn't >> really >> look like you have much if any of this transcript being >> expressed >> in either sample. >> >> >> Sorry,but this here I don't understand. >> I didn't present any probes from my data, but just the results >> of _one gene_ after running the expresso command with >> _different options_. >> >> How can you say that: "I only see a few probes in your raw >> data that have very different expression levels between the >> sample types"? >> I mean after analyzing the data , I think you are also right, >> it just wonder me how could you have known that. >> >> >> What are the data you show just below? You said they were the PM >> and MM probe data for the gene in question. >> >> Oh sorry my mistake. >> But isn't it very strange to have such high expression intensities in the >> MM fraction (2nd group) and a lower intensities in the PM (1st group). >> Doesn't it suppose to be the other way around? >> > > It is supposed to be the other way around. But that is based on the > assumption that MM probes only measure background, and that there is no > noise in the system. Rafa Irizarry has shown in several papers that MM > probes actually do measure transcript abundance (just not as efficiently as > PM probes). And there is _always_ noise in the system, and at the low end > of the scale it becomes difficult to distinguish noise from signal, as you > see here. > > > >> If the MM fraction is so high, does it mean that the non-specific binding >> is so high. Is that a very good reason to ignore the MM completely or it is >> exactly the reason why one should consider running a MAS5 (and not RMA, as >> I am using) normalization on the data? >> > > There are two good reasons to ignore the MM probes. As noted above, they > do actually measure transcript abundance, so you are subtracting signal > when you use PM-MM. In addition, there is variance involved in any > measurement, and variances are additive. So the variance of PM-MM is > _higher_ than the variance of PM alone. If we estimate background based on > all probes (as in RMA), the variance of the background measure will be much > lower than each individual MM probe, so when we do PM-bg, the resulting > variance is much lower than the PM-MM. > > An overall background measure will be biased however, so there is an > argument that the background should be estimated in smaller bins of probes, > one such way being gcrma(). Another way is to estimate variance using > different estimators, depending on the intensity, which is what vsnrma() > does. > > > >> >> I was just wondering here, why the medianpolish option shows >> basically no differences at all, while the others three with >> the mas or subtractmm options shows such a huge difference. >> Can it be explained? >> >> >> First, these aren't huge differences. Sure, if you do fold changes >> they are, but remember that these data range from 0 to 2^14 or so >> (hypothetically 2^16, but it is rare to get much over 2^14). A >> difference of 40 at the very low end of that range is >> indistinguishable from noise. Second, both mas5 and avgdiff suffer >> from the same problems; they assume MM probes accurately measure >> just background (not true), and they ignore probe-specific >> binding, which adds nuisance data to the result. Once you account >> for probe-specific binding, the differences largely disappear. >> >> >> Is there a certain threshold, were you can say that the gene is present >> or absent? You said that only the top tow rows bind by an appreciable >> level. But what about probes 11-14? They are higher than average and also >> show a higher intensity in the PM (top) than the MM (bottom). >> > > There is a difference between presence of a gene transcript (the genes are > always present; we are attempting to measure the transcript abundance) and > the intensity of the signal. You cannot look at the intensity of a signal > and say definitively that the transcript was or was not there. One could > imagine a scenario where the transcript was actually expressed, but the > probes designed to interrogate the transcript were so poorly designed that > you hardly get any signal. Alternatively one could imagine a scenario where > the transcript of interest was not being expressed, but another similar > transcript was being expressed at a high level. If the similar transcript > could bind at a low (but non-zero) rate, then you could get some signal > even though the transcript of interest wasn't there. > > Also, remember that a 25-mer is really bad at binding to unique sequences, > and the binding affinity is highly correlated to the GC content. That is > why a probeset has 11 or so of these 25-mers. The idea is that in > aggregate, the signal from all 11 probes will be more accurate than the > individual probes themselves. By only looking at four probes in particular, > you are cherry picking your data to fulfill a preconceived notion of what > you think is happening, rather than looking at the data in aggregate > (summarized in a reasonable manner). > > To me, summarizing using rma() is a reasonable thing to do, and when you > do that, there is no evidence that the gene is expressed at different > levels in the two sample types. End of story. > > > wt1 wt2 wt3 treat1 treat2 treat3 >> PM >> *probe11 293 321 260 444 618 408 >> >> probe12 171 167 169 49 75 72 >> probe13 198 197 307 40 67 50 >> probe14 247 265 348 53 60 62* >> >> MM >> probe11 54 83 55 47 64 46 >> probe12 106 98 76 52 66 53 >> probe13 43 48 37 36 52 39 >> probe14 94 92 99 43 43 49 >> >> I ran a present-absent call (mas5calls) on these arrays to see whether or >> not these gene is considered to be present. It is not! >> > > PA calls are not measuring the expression of the gene. They tell you if > the PM probes are significantly different from the MM probes. There can be > any number of reasons that PM and MM probes have the same expression, only > one of which is no expression of the gene. > > Best, > > Jim > > > Can the reason for that be the fact, that it show a reasonable level of >> expression in only two from 14 probes in the probe-set? >> Is it because the MM fraction shows a much higher intensity than the PM? >> >> Thanks >> >> Assa >> >> >> Best, >> >> Jim >> >> >> >> Thanks, >> Assa >> >> >> >> Best, >> >> Jim >> >> >> >> I hope it make some more sense and I would be happy to hear >> some more suggestions/comments on that topic >> >> thanks a lot, >> Assa >> >> >> Best, >> >> Jim >> >> >> >> So I decided to look into the specific probe >> values of the >> probes for this >> probe-set. >> This are the values I've got from the PM and MM >> respectively: >> wt1 wt2 wt3 treat1 treat2 treat3 >> probe1 403 379 220 420 530 316 >> probe2 117 84 104 52 57 54 >> probe3 49 49 73 38 58 52 >> probe4 87 67 110 55 43 49 >> probe5 66 61 51 46 72 62 >> probe6 118 100 104 69 87 74 >> probe7 180 142 170 45 46 45 >> probe8 133 102 137 95 132 81 >> probe9 80 71 65 52 54 46 >> probe10 63 45 56 53 53 54 >> probe11 293 321 260 444 618 408 >> probe12 171 167 169 49 75 72 >> probe13 198 197 307 40 67 50 >> probe14 247 265 348 53 60 62 >> >> probe1 533 519 294 507 739 404 >> probe2 1789 1271 1468 1430 1666 >> 1552 >> probe3 56 66 59 51 45 48 >> probe4 49 52 64 47 45 47 >> probe5 54 47 33 49 65 55 >> probe6 84 72 90 53 92 71 >> probe7 73 72 65 40 53 54 >> probe8 83 108 115 81 111 94 >> probe9 49 56 43 52 41 53 >> probe10 56 46 62 68 77 57 >> probe11 54 83 55 47 64 46 >> probe12 106 98 76 52 66 53 >> probe13 43 48 37 36 52 39 >> probe14 94 92 99 43 43 49 >> >> When I calculate the average of these two tables >> for each >> array I don't get >> the same values as presented in the top table. >> I would like to understand how from the values >> on the >> last two >> tables I >> come to a summarized value I get. Even if I >> ignore the >> MM values >> completely, which I think it does, I still don't >> see how it >> comes to these >> values. The two probes (Nr. 1 and 11 of the PM >> values) are >> strongly differ >> from the rest of the probes for this probe- set. Are >> they being >> ignored in >> the summarization? >> >> Thanks in advance >> >> Assa >> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> >> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org="">> >> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> <mailto:bioconductor@r-**project.org <bioconductor@r-project.org=""> >> >>> >> >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the >> archives:http://news.gmane.** >> org/gmane.science.biology.**informatics.conductor<http: news.gmane="" .org="" gmane.science.biology.informatics.conductor=""> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > [[alternative HTML version deleted]]

ADD COMMENT • link 12.0 years ago Assa Yeroslaviz ★ 1.5k

Login before adding your answer.