Dear all,
I am working on an Affymetrix time series data set in which a high
percentage (30-40%) of genes are differentially expressed, most of them
downregulated.
In a previous discussion about a suitable normalization strategy for
such data sets, Wolfgang Huber strongly recommended to "narrow down the
probes from which you fit the parameters from all genes (incl. the
differential ones) to a subset which are enriched for non-changing."
In this context I have two questions:
1) What is the minimum number of genes/probes that should be used for
VSN parameter estimation? I could extract a list of a few hundred
'stable' or 'low variability' genes from previous microarray studies.
Would this number be sufficient, or do I need bigger probe subsets
(thousands of probes, 1/2 of all probes, etc.)?
2) Is there a straightforward way to implement this in the standard R
packages offering VSN? In other words, if I perform a VSN parameter
estimation on my gene/probe subset, how (in R terms) would I
subsequently apply it to the whole dataset? (My apologies if this is
trivial, my programming skills are still rather a disgrace :) )
Any comment on these questions would be highly appreciated.
Kind regards,
Stefan
--
Dr. Stefan Thomsen
Research Associate
Department of Zoology
University of Cambridge
Downing Street
Cambridge CB2 3EJ
Tel.: +44 1223 336623
Fax: +44 1223 336679
stt26 at cam.ac.uk
Hi Stefan,
0) vsn already has an algorithm that attempts to narrow down the probes
that are used. This is the so-called "robustification" of the ML
estimator by Least Trimmed Sum of Squares minimisation. But of course
this is automatic and sometimes not perfect, so if you have an
external way of identifying non-changing probes, that can be very
useful.
1) A few hundred should be OK in practice. More important than their
number is that they cover the whole dynamic range about equally!
2)
   ## x is an ExpressionSet
   fit = vsn2(x[yourSelectedProbes, ])
   nx = predict(fit, newdata=exprs(x))
see also the man page
   method?predict("vsn")
(Please use latest release version.)
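On point 1, one possible way to check whether a probe selection covers the dynamic range is to count how many selected probes fall into each decile of average log-intensity across all probes. This is only a sketch in base R: the simulated matrix `intens` stands in for `exprs(x)`, the random index vector stands in for `yourSelectedProbes`, and the choice of ten bins is arbitrary; none of these come from the vsn package.

```r
# Simulated stand-in for exprs(x): 5000 probes x 6 arrays (hypothetical data)
set.seed(1)
intens <- matrix(2^rnorm(5000 * 6, mean = 8, sd = 2), nrow = 5000)

# Hypothetical selection of 300 'stable' probes, given as row indices
yourSelectedProbes <- sample(nrow(intens), 300)

# Average log2 intensity per probe, used as its position in the dynamic range
avg <- rowMeans(log2(intens))

# Deciles of all probes: each bin holds about the same number of probes overall
breaks <- quantile(avg, probs = seq(0, 1, by = 0.1))
bins <- cut(avg, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Selected probes per decile; roughly even counts suggest even coverage
coverage <- table(factor(bins[yourSelectedProbes], levels = 1:10))
print(coverage)
```

If some deciles contain very few selected probes (typically the lowest or highest intensities), the fitted parameters are poorly constrained there, and it may be worth adding probes from those ranges.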
Hope this helps
Wolfgang