Natasha ▴ 440
@natasha-4640
Last seen 6.8 years ago
Dear List,

Normally for Illumina arrays, instead of the functions described in the limma user guide (e.g. neqc, read.ilmn etc.), I use:

* read.delim - to load the probe profile data and the sample table control data respectively
* perform bg correction using the negative control probes from the sample table control
* filter data based on "detection scores"
* normalise data using the "vsn2" function

However, as I have just realised that the limma functions can be used, I have some queries:

1. Will there be much difference between the quantile normalisation in the neqc function and vsn2?
2. How does one interpret the boxplots for the various controls (apart from x$genes$Status=="regular")?
   * the median/mean vary a lot
   * much more for my samples than in the example shown in the user guide
3. When filtering, based on the help of read.ilmn:
   * The "Detection" column appears to be a detection p-value by default
   * What does one do if the output from GenomeStudio is different and gives a "Detection Score" instead?
     o Would: expressed <- apply(y$other$Detection < 0.05,1,any)
       + change to: expressed <- apply(y$other$Detection > 0.95,1,any)
4. Also, I do not fully understand the estimation of probes expressed using the propexpr function
   * one of my samples, A7, shows 0.0 (I see that the housekeeping gene intensity for this is ~200 whereas for the others it's 1000+); it's a similar case for samples A11 and A12

    > propexpr(x)
           A1        A2        A7        A8        A3        A4       A11       A12
    0.3380243 0.4066500 0.0000000 0.4232871 0.3131936 0.3819055 0.1934197 0.2036340
           A5        A6        A9       A10
    0.3363844 0.3476216 0.3445201 0.3834617

    > sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] gdata_2.8.2 limma_3.8.2

    loaded via a namespace (and not attached):
    [1] gtools_2.6.2 tools_2.13.0

Many Thanks,
Natasha
probe limma a4 • 836 views
@gordon-smyth
Last seen 15 minutes ago
WEHI, Melbourne, Australia
Hi Natasha,

> Date: Fri, 15 Jul 2011 18:03:59 +0100
> From: Natasha Sahgal <nsahgal at well.ox.ac.uk>
> To: bioconductor at r-project.org
> Subject: [BioC] read.ilmn function query
>
> 1. Will there be much difference between the quantile normalisation
> in the neqc function (as compared to vsn2 ?)

The neqc() strategy is different from that of vsn, not only in terms of normalization, but also in terms of background correction and variance stabilization. There are, however, some parallels in the mathematical theory between normexp background correction and the vsn transformation. How different the practical results will be, though, I don't know. We compared neqc() to vst and other strategies that have been proposed for Illumina BeadChip data in the literature, but vsn wasn't one of those.

> 2. How does one interpret the boxplots for the various controls
> (apart from x$genes$Status=="regular")?
> * as the median/mean vary a lot
> * much more for my samples (than the example shown in the user
> guide)

This is a property of your data. If the boxplots vary a lot, then there must be a lot of variability in your data.

> 3. When filtering: based on the help of read.ilmn
> * The "Detection" column appears to be detection p-value by
> default
> * What does one do if the output is different from the
> GenomeStudio and it gives a "Detection Score" instead??
> o Would: expressed <- apply(y$other$Detection < 0.05,1,any)
> + change to: expressed <- apply(y$other$Detection > 0.95,1,any)

Yes.

> 4. Also, I do not fully understand the estimation of probes expressed
> using the propexpr function
> * one of my samples A7 shows 0.0 (I see that the housekeeping
> gene intensity for this is ~ 200 whereas for others its
> 1000+), its a similar case for samples A11 and A12
> o propexpr(x)
>        A1        A2        A7        A8        A3        A4       A11       A12
> 0.3380243 0.4066500 0.0000000 0.4232871 0.3131936 0.3819055 0.1934197 0.2036340
>        A5        A6        A9       A10
> 0.3363844 0.3476216 0.3445201 0.3834617

This seems to flag a possible problem with your sample A7. The regular probes (the majority of them, anyway) are no brighter than the background probes. This could suggest a problem with the RNA extraction, for example. The proportion of expressed probes might not be truly zero, but the spread of intensities must be different from that usually seen for a good quality array.

Best wishes
Gordon
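To illustrate the point about A7, here is a rough sketch in base R of what "regular probes no brighter than background probes" looks like. It is not the estimator that propexpr() uses; it is a crude stand-in that counts, per array, the fraction of regular probes above the 95th percentile of that array's negative controls. All intensities are simulated, and the array names `good` and `failed` are made up for the example.

```r
set.seed(1)

# Simulated negative-control intensities (background only) for two arrays.
neg <- matrix(rnorm(500 * 2, mean = 100, sd = 10), ncol = 2,
              dimnames = list(NULL, c("good", "failed")))

# Simulated regular probes: on the "good" array 2000 of 5000 probes are
# well above background; on the "failed" array none are.
reg <- cbind(
  good   = c(rnorm(3000, mean = 100, sd = 10), rnorm(2000, mean = 1000, sd = 200)),
  failed = rnorm(5000, mean = 100, sd = 10)
)

# Per-array cutoff: 95th percentile of the negative controls.
cutoff <- apply(neg, 2, quantile, probs = 0.95)

# Crude proxy (NOT propexpr): fraction of regular probes above the cutoff.
frac_above <- colMeans(sweep(reg, 2, cutoff, ">"))
print(round(frac_above, 3))
```

On the simulated "failed" array the fraction stays close to the 5% expected by chance alone, which is the kind of pattern that would lead an estimate of the expressed proportion towards zero. For real data, propexpr(x) on the object returned by read.ilmn is the appropriate tool.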
Hi Natasha,

Just adding to Gordon's reply: the "Detection" columns in the read.ilmn output are always the same as those in the GenomeStudio/BeadStudio output. The read.ilmn function does not change the original detection p-values or detection scores.

Cheers,
Wei

On Jul 17, 2011, at 10:39 AM, Gordon K Smyth wrote:

>> 3. When filtering: based on the help of read.ilmn
>> * The "Detection" column appears to be detection p-value by
>> default
>> * What does one do if the output is different from the
>> GenomeStudio and it gives a "Detection Score" instead??
>> o Would: expressed <- apply(y$other$Detection < 0.05,1,any)
>> + change to: expressed <- apply(y$other$Detection > 0.95,1,any)
>
> Yes.
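As a small illustration of the two filters discussed above, the sketch below uses a toy list in place of the object returned by read.ilmn, with only the `Detection` matrix mocked and made-up values. It assumes that a detection score is simply 1 minus the detection p-value, which is what the "< 0.05" versus "> 0.95" exchange implies; check this against your own GenomeStudio/BeadStudio export before relying on it.

```r
# Toy detection p-values: 3 probes x 2 arrays (values are invented).
pvals <- matrix(c(0.01, 0.20,
                  0.50, 0.60,
                  0.30, 0.04),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("probe1", "probe2", "probe3"), c("A1", "A2")))

# Mock objects holding the same information in the two conventions.
y_pval  <- list(other = list(Detection = pvals))
y_score <- list(other = list(Detection = 1 - pvals))  # assumption: score = 1 - p

# Keep probes detected on at least one array.
expressed_pval  <- apply(y_pval$other$Detection  < 0.05, 1, any)
expressed_score <- apply(y_score$other$Detection > 0.95, 1, any)

# Both conventions select the same probes.
print(expressed_pval)
print(identical(expressed_pval, expressed_score))
```

Because read.ilmn passes the column through unchanged, a quick way to tell which convention a file uses is to look at probes known to be strongly expressed: their values sit near 0 for p-values and near 1 for scores.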