Hi Nhan,
GAGE actually handles missing values pretty well. Here is a brief
description from my previous answer to a similar question.
Before the gene set differential expression tests, we calculate
differential expression statistics (fold change, signal-to-noise
ratio, etc.) for each gene. Other methods perform group-on-group
comparison in this step (comparing the whole experimental sample group
against the whole control group), so a missing value in any sample may
affect the per-gene differential expression statistics.
GAGE instead compares one experimental sample to one control sample at
a time. Hence a missing expression value produces an NA fold change in
that particular pair-wise comparison only and does not affect the fold
changes in other pair-wise comparisons. Within that pair-wise
comparison, the NA fold change is omitted from the gene set test, so it
usually makes little difference as long as the gene set still contains
enough effective (non-missing) genes.
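To make the pair-wise idea concrete, here is a minimal Python sketch, not GAGE's actual implementation (GAGE is an R package with its own per-sample test and p-value combination). It assumes expression values are already on the log2 scale, and `pairwise_log2fc`, `set_score` and the toy data are hypothetical names chosen for illustration. The gene set score is a simplified stand-in (set mean minus overall mean), with NA genes dropped per pair.

```python
import math
import statistics

NAN = float("nan")

def pairwise_log2fc(ref, samp):
    """Per-gene log2 fold change for one matched (control, experimental) pair.
    A gene missing in either array gets NaN in this pair only."""
    out = {}
    for gene in ref:
        r, s = ref[gene], samp[gene]
        if math.isnan(r) or math.isnan(s):
            out[gene] = NAN
        else:
            out[gene] = s - r  # assumes values are already log2-transformed
    return out

def set_score(fc, gene_set):
    """Hypothetical simplified score: mean fold change of the set minus the
    mean over all genes, with NaN genes omitted from both."""
    valid = {g: v for g, v in fc.items() if not math.isnan(v)}
    in_set = [v for g, v in valid.items() if g in gene_set]
    if len(in_set) < 2:
        return NAN  # too few effective genes to score this set in this pair
    return statistics.mean(in_set) - statistics.mean(list(valid.values()))

# Toy data (hypothetical): two matched pairs; g3 is missing in the 2nd control.
controls = [
    {"g1": 5.0, "g2": 6.0, "g3": 7.0, "g4": 8.0, "g5": 9.0},
    {"g1": 5.0, "g2": 6.0, "g3": NAN, "g4": 8.0, "g5": 9.0},
]
samples = [
    {"g1": 6.0, "g2": 7.0, "g3": 8.0, "g4": 8.0, "g5": 9.0},
    {"g1": 6.0, "g2": 7.0, "g3": 8.0, "g4": 8.0, "g5": 9.0},
]
gene_set = {"g1", "g2", "g3"}

fcs = [pairwise_log2fc(c, s) for c, s in zip(controls, samples)]
scores = [set_score(fc, gene_set) for fc in fcs]
print(scores)
```

The missing g3 value yields NaN only in the second pair's fold changes; the first pair is untouched, and the second pair is still scored from its two remaining set genes.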
I have tested this before: with 50% of all genes in a microarray
dataset randomly removed (replaced by NA), GAGE results were largely
unaffected. You may try the same with your own data. Hope that helps.
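A rough way to try this kind of check is sketched below in Python on simulated data; it is not the experiment described above, and the data, gene sets and helper names (`mask_randomly`, `set_score`) are all hypothetical. It masks fold changes directly, which is equivalent here because an NA in either array of a pair already yields an NA fold change. The score is the same simplified stand-in as before, not GAGE's test.

```python
import math
import random
import statistics

def mask_randomly(values, fraction, rng):
    """Return a copy with `fraction` of entries replaced by NaN."""
    keys = list(values)
    drop = set(rng.sample(keys, int(len(keys) * fraction)))
    return {k: (float("nan") if k in drop else v) for k, v in values.items()}

def set_score(fc, gene_set):
    """Hypothetical simplified score: set mean minus overall mean, NaNs omitted."""
    valid = {g: v for g, v in fc.items() if not math.isnan(v)}
    in_set = [v for g, v in valid.items() if g in gene_set]
    if len(in_set) < 2:
        return float("nan")
    return statistics.mean(in_set) - statistics.mean(list(valid.values()))

rng = random.Random(1)
genes = [f"g{i}" for i in range(1000)]
up_set = set(genes[:50])        # hypothetical set shifted up by 1 log2 unit
null_set = set(genes[500:550])  # hypothetical set with no shift

# Simulated per-gene log2 fold changes for one pair (hypothetical data).
fc = {g: rng.gauss(1.0 if g in up_set else 0.0, 0.5) for g in genes}
masked = mask_randomly(fc, 0.5, rng)  # remove 50% of values at random

for name, gs in [("up", up_set), ("null", null_set)]:
    print(name, round(set_score(fc, gs), 3), round(set_score(masked, gs), 3))
```

With your own data, the analogous check would be to run GAGE on the original and on a randomly masked copy and compare the resulting gene set rankings.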
Weijun
--- On Wed, 2/2/11, Nhan Thi Ho <nho at epi.msu.edu> wrote:
> From: Nhan Thi Ho <nho at epi.msu.edu>
> Subject: How GAGE handles missing values?
> To: "Luo Weijun" <luo_weijun at yahoo.com>
> Date: Wednesday, February 2, 2011, 9:25 AM
> Dear Dr Luo,
> We are facing another issue when using GAGE. As some of our
> arrays have some artifacts, we treat the regions with
> artifacts as missing values (just removing the values in the
> regions with artifacts and treating them as NA). We also did
> some imputation for the regions with artifacts, but for the
> moment we have decided to treat them as NA.
> If we analyze cases vs. control as groups, missing values
> may be less problematic. But it is not what we want to do
> because our data are in matched pairs with a huge variation
> in sample storage time among pairs.
> However, if we analyze the data as pairs and only one of the
> two arrays in a pair has missing values, that pair gets
> affected. For a subset of our samples, many pairs are
> affected that way.
> I have been looking into the GAGE manual and I have not
> found how GAGE handles missing values.
> Could you please help me out with this?
> Thank you very much.
> Nhan
>
> Nhan Thi HO, MD
> PhD Student
> Dept of Epidemiology
> Michigan State University,
> B601 West Fee Hall,
> East Lansing, 48824 MI, USA
> Office Phone: 517- 363 8263 ext 111
> Hand Phone: 517- 599 8775
> Email: nho at epi.msu.edu