Dear List,
Normally for Illumina arrays, instead of the functions given in the
limma user guide (e.g. neqc, read.ilmn etc.), I use:
  * read.delim - to load the probe profile data and the sample table
    control data respectively
  * perform bg correction using the negative control probes from the
    sample table control
  * filter data based on _"detection scores"_
  * normalise data using the _"vsn2"_ function
However, as I have just realised that the limma functions can be used,
I have some queries:
  1. Will there be much difference between the quantile normalisation
     in the neqc function and vsn2?
  2. How does one interpret the boxplots for the various controls
     (apart from x$genes$Status=="regular")?
       * the median/mean vary a lot
       * much more for my samples than in the example shown in the
         user guide
  3. When filtering, based on the help of read.ilmn:
       * The "Detection" column appears to be a detection p-value by
         default
       * What does one do if the output is different from the
         GenomeStudio and it gives a "Detection Score" instead?
           o Would: expressed <- apply(y$other$Detection < 0.05,1,any)
               + change to: expressed <- apply(y$other$Detection > 0.95,1,any)
  4. Also, I do not fully understand the estimation of the proportion
     of probes expressed using the propexpr function
       * one of my samples, A7, shows 0.0 (I see that the housekeeping
         gene intensity for this is ~200 whereas for others it's
         1000+); it's a similar case for samples A11 and A12
           o propexpr(x)
                    A1        A2        A7        A8        A3        A4       A11       A12
             0.3380243 0.4066500 0.0000000 0.4232871 0.3131936 0.3819055 0.1934197 0.2036340
                    A5        A6        A9       A10
             0.3363844 0.3476216 0.3445201 0.3834617
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gdata_2.8.2 limma_3.8.2
loaded via a namespace (and not attached):
[1] gtools_2.6.2 tools_2.13.0
Many Thanks,
Natasha
Hi Natasha,
> Date: Fri, 15 Jul 2011 18:03:59 +0100
> From: Natasha Sahgal <nsahgal at well.ox.ac.uk>
> To: bioconductor at r-project.org
> Subject: [BioC] read.ilmn function query
>
> Dear List,
>
> Normally for Illumina arrays, instead of the functions given in the
> limma user guide (e.g. neqc, read.ilmn etc.), I use:
>
>   * read.delim - to load the probe profile data and the sample table
>     control data respectively
>   * perform bg correction using the negative control probes from the
>     sample table control
>   * filter data based on _"detection scores"_
>   * normalise data using the _"vsn2"_ function
>
> However, as I have just realised that the limma functions can be used,
> I have some queries:
>
>   1. Will there be much difference between the quantile normalisation
>      in the neqc function and vsn2?
The neqc() strategy is different from that of vsn, not only in terms of
normalization, but also in terms of background correction and variance
stabilization. There are some parallels, however, in the mathematical
theory between normexp background correction and the vsn transformation.
How different the practical results will be, though, I don't know. We
compared neqc() to vst and other strategies that have been proposed for
Illumina BeadChip data in the literature, but vsn wasn't one of those.
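For readers who want to see the neqc() steps on something concrete, here is a minimal sketch on simulated data. It assumes the limma package is installed; the intensities, array names and probe counts are invented for illustration and are not Natasha's data. In a real analysis the EListRaw object would come from read.ilmn() rather than being built by hand.

```r
library(limma)

set.seed(1)
n_arrays   <- 4
n_regular  <- 1000
n_negative <- 100
n_probes   <- n_regular + n_negative

# Simulated raw intensities (hypothetical): background noise around 100
# for every probe, plus an exponential signal for the regular probes only.
background <- matrix(rnorm(n_probes * n_arrays, mean = 100, sd = 10),
                     ncol = n_arrays)
signal <- rbind(matrix(rexp(n_regular * n_arrays, rate = 1 / 500),
                       ncol = n_arrays),
                matrix(0, nrow = n_negative, ncol = n_arrays))
E <- background + signal
colnames(E) <- paste0("A", seq_len(n_arrays))

# neqc() looks for the probe type in x$genes$Status by default.
status <- rep(c("regular", "negative"), c(n_regular, n_negative))
x <- new("EListRaw", list(E = E, genes = data.frame(Status = status)))

# normexp background correction using the negative controls, then
# quantile normalisation and log2 transformation; the control probes
# are dropped from the returned object.
y <- neqc(x)
dim(y)
```

Running vsn2 on the same raw matrix and comparing the two sets of normalised values would be one way to judge the practical difference on a given data set.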
>   2. How does one interpret the boxplots for the various controls
>      (apart from x$genes$Status=="regular")?
>        * the median/mean vary a lot
>        * much more for my samples than in the example shown in the
>          user guide
This is a property of your data. If the boxplots vary a lot, then there
must be a lot of variability in your data.
>   3. When filtering, based on the help of read.ilmn:
>        * The "Detection" column appears to be a detection p-value by
>          default
>        * What does one do if the output is different from the
>          GenomeStudio and it gives a "Detection Score" instead?
>            o Would: expressed <- apply(y$other$Detection < 0.05,1,any)
>                + change to: expressed <- apply(y$other$Detection > 0.95,1,any)
Yes.
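A small sketch of why the two filters agree, using base R only. The matrix below is made up, and the relationship "detection score = 1 - detection p-value" is an assumption about the GenomeStudio export that should be checked against your own file.

```r
# Hypothetical detection p-values: rows are probes, columns are arrays.
detection_p <- matrix(c(0.01, 0.20,
                        0.50, 0.80,
                        0.30, 0.04),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("probe1", "probe2", "probe3"),
                                      c("A1", "A2")))

# Keep probes that are detected (p < 0.05) on at least one array.
expressed_p <- apply(detection_p < 0.05, 1, any)

# Assumption: a detection score is 1 minus the detection p-value,
# so the cutoff flips from "< 0.05" to "> 0.95".
detection_score <- 1 - detection_p
expressed_score <- apply(detection_score > 0.95, 1, any)

# Both filters select the same probes (probe1 and probe3 here).
stopifnot(identical(expressed_p, expressed_score))
expressed_p
```

With a real read.ilmn object, the matrix would be y$other$Detection, as in the question.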
>   4. Also, I do not fully understand the estimation of the proportion
>      of probes expressed using the propexpr function
>        * one of my samples, A7, shows 0.0 (I see that the housekeeping
>          gene intensity for this is ~200 whereas for others it's
>          1000+); it's a similar case for samples A11 and A12
>            o propexpr(x)
>                     A1        A2        A7        A8        A3        A4       A11       A12
>              0.3380243 0.4066500 0.0000000 0.4232871 0.3131936 0.3819055 0.1934197 0.2036340
>                     A5        A6        A9       A10
>              0.3363844 0.3476216 0.3445201 0.3834617
This seems to flag a possible problem with your sample A7. The regular
probes (the majority of them anyway) are no brighter than background
probes. This could suggest a problem with the RNA extraction, for
example, in this case. The proportion of expressed probes might not be
truly zero, but the spread of intensities must be different from that
usually seen for a good quality array.
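One rough way to check this by eye is to count, for each array, how many regular probes are brighter than most of the negative controls. The sketch below uses simulated intensities and base R only. It is a crude diagnostic and not the estimator that propexpr uses; the array names "good" and "poor" and all numbers are hypothetical.

```r
set.seed(42)
n_regular  <- 2000
n_negative <- 200
arrays     <- c("good", "poor")

# Simulated raw intensities (hypothetical): negative controls are pure
# background on both arrays; regular probes start as background too.
neg <- matrix(rnorm(n_negative * 2, mean = 100, sd = 10), ncol = 2,
              dimnames = list(NULL, arrays))
reg <- matrix(rnorm(n_regular * 2, mean = 100, sd = 10), ncol = 2,
              dimnames = list(NULL, arrays))

# On the "good" array only, about 40% of regular probes get real signal.
is_expressed <- rbinom(n_regular, size = 1, prob = 0.4)
reg[, "good"] <- reg[, "good"] + is_expressed * rexp(n_regular, rate = 1 / 300)

# Per-array threshold: 95th percentile of the negative controls.
threshold <- apply(neg, 2, quantile, probs = 0.95)

# Fraction of regular probes above the threshold on each array.
frac_above <- colMeans(sweep(reg, 2, threshold, ">"))
round(frac_above, 3)
```

An array where this fraction is close to the 5% expected by chance, like the simulated "poor" array, has regular probes that look like background, which is the pattern described above for A7.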
Best wishes
Gordon
Hi Natasha,
Just adding to Gordon's reply: the "Detection" columns in the read.ilmn
output are always the same as those in the GenomeStudio/BeadStudio
output. The read.ilmn function does not change the original detection
p-values or detection scores.
Cheers,
Wei
On Jul 17, 2011, at 10:39 AM, Gordon K Smyth wrote:
> [...]