lumi and plotHousekeepingGene
Janet Young ▴ 740
@janet-young-2360
Last seen 2.9 years ago
Fred Hutchinson Cancer Research Center,…
Hi,

I have a set of Illumina arrays and have been playing with lumi a little. It seems really useful - thank you very much.

I'm pretty new to array analysis, so I'm not sure if this is a bug or intended behavior, but here goes: I've found that plotHousekeepingGene behaves a little oddly with our data (for now, processed using lumiExpresso with default settings). I think the problem is caused by the following portion of the function, that subtracts the minimum control value from all the control datapoints before taking the log and plotting. What's the rationale for that subtraction? I might be missing something.

    if (logMode) {
        if (max(selControlData) > 50) {
            selControlData <- selControlData - min(selControlData) + 1
            selControlData <- log2(selControlData)
        }
        ylab <- "Expression Amplitude (log2)"
    }

In the plotHousekeepingGene of our data, one of the housekeeping genes looks quite bad, so initially I was concerned: that gene has a lot lower expression than the rest of them, and expression appears to vary a lot across the arrays. Expression is indeed low, but it doesn't really vary much across the arrays when I look at the normalized data myself, so in reality I don't think I need to worry too much (although I will be checking in with the biologists about whether that gene should be high or low in the cells they've assayed).

Here's the control data that got plotted by plotHousekeepingGene, i.e. the control data after that subtraction of the minimum value (probe 101 is low, and varies widely across arrays). If there is a good rationale for the subtraction step, maybe I should actually be worried about this gene?

          array_1   array_2   array_3  array_4  array_5   array_6
    101  4.307429  7.742815  8.274728  7.74685  0.00000  7.499049
    102 14.052568 14.134073 14.201274 14.12779 14.14360 14.181657
    103 14.667866 14.725973 14.759108 14.59437 14.53636 14.606914
    104 13.095512 12.862831 12.729939 13.30600 13.08711 13.063597
    105 13.768515 13.506642 14.023174 13.58535 13.93239 14.050444
    106 13.313818 12.773840 12.792241 13.15706 13.29373 12.848232
    107 13.916514 13.466714 13.714310 13.49404 13.82293 13.475049

But here's how the control data looks when I just take the log2 myself (probe 101 is somewhat low, but fairly constant across arrays, and not as low as the negative controls from the same arrays - their values tend to be around 6-7).

         array_1   array_2   array_3   array_4   array_5   array_6
    101  8.21820  8.943101  9.198936  8.944858  8.124121  8.842036
    102 14.07598 14.156210 14.222410 14.150025 14.165590 14.203080
    103 14.68319 14.740698 14.773500 14.610494 14.553143 14.622898
    104 13.14062 12.915693 12.787801 13.345073 13.132484 13.109700
    105 13.79697 13.540697 14.047064 13.617617 13.957818 14.073891
    106 13.35268 12.830000 12.847703 13.200316 13.333127 12.901621
    107 13.94222 13.501713 13.743846 13.528393 13.850343 13.509849

Hope this is helpful...

Thanks very much,

Janet Young
Fred Hutchinson Cancer Research Center
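The arithmetic behind the two tables can be checked in base R without lumi. The sketch below is only an illustration: it assumes the second table is the plain log2 of the raw controlData values, copies just the rows for probes 101 and 102, back-transforms them, and applies the shift quoted from plotHousekeepingGene. The variable names (log2_plain, raw, shifted) are made up for this example. Because probe 101 on array_5 is the smallest value in the whole table, using two rows gives the same global minimum as using all seven.

```r
# Plain log2 values for two housekeeping probes, copied from the second table.
log2_plain <- rbind(
  "101" = c(8.21820, 8.943101, 9.198936, 8.944858, 8.124121, 8.842036),
  "102" = c(14.07598, 14.156210, 14.222410, 14.150025, 14.165590, 14.203080)
)
colnames(log2_plain) <- paste0("array_", 1:6)

# Back-transform to the raw intensity scale.
raw <- 2^log2_plain

# The transformation quoted from plotHousekeepingGene: subtract the global
# minimum (taken over all probes and arrays), add 1, then take log2.
shifted <- log2(raw - min(raw) + 1)

print(round(shifted, 3))
```

The probe 101 row of the result should come out close to the corresponding row of the first table. The single smallest intensity is mapped to log2(1) = 0 and values just above it are stretched over several log2 units, while the high-intensity probes barely move.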
Cancer lumi • 694 views
Pan Du ▴ 440
@pan-du-4636
Last seen 8.1 years ago
Hi Janet,

The plotHousekeepingGene function plots the housekeeping gene data in the controlData slot, which holds the raw data output by BeadStudio/GenomeStudio. The current implementation of the lumi package does not update controlData during preprocessing, so even after normalization the controlData stays the same. plotHousekeepingGene and the other controlData-related functions are therefore meant for QC of the raw data, although some of the variation they show can be corrected by preprocessing.

Hope this clarifies your question.

Pan

On Wed, Jun 1, 2011 at 6:29 PM, Janet Young wrote:
> [...]
Hi Pan,

Thanks for the reply.

I think we're not talking about the same adjustments to the control data as one another. It does clarify things a little that these are raw numbers, but I'm still confused about this one section of the code for the plotHousekeepingGene function, where it does a subtraction on the control data:

    if (max(selControlData) > 50) {
        selControlData <- selControlData - min(selControlData) + 1
        selControlData <- log2(selControlData)
    }

That subtraction of the minimum value hasn't made much difference for most of the housekeeping genes in our data, but for a single gene it makes the probe look very variable across arrays, when in reality it doesn't change nearly as much as the plot implies. I've attached the two plots from our data to show what I mean (plot as given by the function, and plot if I just extract the controlData slot and plot log2s manually). The numbers in my first email are extracted from the controlData slot and processed in the same way as plotHousekeepingGene processes the data for plotting.

Hope my question makes more sense now!

Janet

On Jun 2, 2011, at 9:16 AM, Pan Du wrote:
> [...]

Attachments:
  HousekeepingGeneProfiles.pdf (application/pdf, 22298 bytes)
  https://stat.ethz.ch/pipermail/bioconductor/attachments/20110603/04abc085/attachment.pdf
  HousekeepingGeneProfilesWithoutSubtraction.pdf (application/pdf, 19660 bytes)
  https://stat.ethz.ch/pipermail/bioconductor/attachments/20110603/04abc085/attachment-0001.pdf
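Since the attached plots are not reproduced here, the difference between them can be summarized numerically from the tables in the original post. The following base R sketch is an illustration only: it copies the rows for probes 101 and 102 from both tables and compares each probe's spread across arrays (maximum minus minimum) on the two scales. The object names (plotted, plain, spread) are invented for this example.

```r
arrays <- paste0("array_", 1:6)

# Values as plotted by plotHousekeepingGene (first table in the original post).
plotted <- rbind(
  "101" = c(4.307429, 7.742815, 8.274728, 7.74685, 0.00000, 7.499049),
  "102" = c(14.052568, 14.134073, 14.201274, 14.12779, 14.14360, 14.181657)
)

# Plain log2 of the same control data (second table in the original post).
plain <- rbind(
  "101" = c(8.21820, 8.943101, 9.198936, 8.944858, 8.124121, 8.842036),
  "102" = c(14.07598, 14.156210, 14.222410, 14.150025, 14.165590, 14.203080)
)
colnames(plotted) <- colnames(plain) <- arrays

# Spread of each probe across arrays (max minus min) on each scale.
spread <- cbind(
  plotted = apply(plotted, 1, function(x) diff(range(x))),
  plain   = apply(plain,   1, function(x) diff(range(x)))
)

print(round(spread, 2))
```

For probe 101 the spread is roughly 8.3 log2 units on the plotted scale but only about 1.1 on the plain log2 scale, whereas probe 102 has a spread of about 0.15 on both.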
Pan Du ▴ 440
@pan-du-4636
Last seen 8.1 years ago
Hi Janet,

The original purpose of subtracting min(selControlData) and adding 1 was to avoid negative values in the log transformation. I need to add a check before doing this: if there are no negative values, the adjustment is not needed.

Thanks for reporting this!

Pan

On Fri, Jun 3, 2011 at 6:17 PM, Janet Young wrote:
> [...]
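The kind of check described in this reply might look something like the base R sketch below. This is not the code that went into lumi; guardedLog2 is a hypothetical helper written for illustration. It also treats zero as needing the shift, which is an assumption beyond what the reply states (log2 of 0 is -Inf), and it leaves out the max(selControlData) > 50 test that the original function uses to decide whether the data still need a log transform.

```r
# Hypothetical helper, not part of lumi: shift the data only when taking
# log2 directly would hit a zero or negative value.
guardedLog2 <- function(x) {
  lowest <- min(x, na.rm = TRUE)
  if (lowest <= 0) {
    # Shift so that the smallest value becomes 1 (and log2 of 1 is 0).
    x <- x - lowest + 1
  }
  log2(x)
}

# All-positive intensities are transformed without any shift.
print(guardedLog2(c(256, 512, 16384)))

# Data containing non-positive values are shifted before the log.
print(guardedLog2(c(-3, 0, 4)))
```

With a guard of this kind, all-positive control data such as the housekeeping values in this thread would be plotted as plain log2, so a low but stable probe would no longer be stretched out near zero.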