Albyn Jones
Dear Bioconductor folk,
The help file for contrasts.fit states:
"Warning. For efficiency reasons, this function does not
re-factorize the design matrix for each probe. A consequence is
that, if the design matrix is non-orthogonal and the original fit
included quality weights or missing values, then the unscaled
standard deviations produced by this function are approximate
rather than exact. The approximation is usually acceptable...."
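Here is the warning written out for a single probe $g$. This is a sketch in our own notation, based on our reading of the limma source rather than on documented behaviour. The symbols are:

- $c$: the contrast vector
- $X_g$: the probe's observed design rows
- $W_g$: the probe's quality weights
- $X$: the full design

$$
u_g^{\mathrm{exact}} = \sqrt{c^\top \left(X_g^\top W_g X_g\right)^{-1} c} = \sqrt{\sum_{j,k} c_j c_k\, s_{gj}\, s_{gk}\, r_{g,jk}},
\qquad
u_g^{\mathrm{approx}} = \sqrt{\sum_{j,k} c_j c_k\, s_{gj}\, s_{gk}\, \rho_{jk}}
$$

The remaining symbols are:

- $s_{gj} = \sqrt{[(X_g^\top W_g X_g)^{-1}]_{jj}}$: the probe's unscaled standard deviation for coefficient $j$ (its stdev.unscaled)
- $r_{g,jk}$: the probe's own correlation between coefficients $j$ and $k$
- $\rho_{jk}$: the corresponding correlation computed from $(X^\top X)^{-1}$ for the full, unweighted design

The two agree when $X_g = X$ and $W_g = I$. With weights or missing values, $r_{g,jk}$ and $\rho_{jk}$ can differ whenever a contrast combines correlated coefficients.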
This statement caught my attention when a biologist colleague asked me
why different sets of probes were identified as differentially
expressed depending on which individual or biological sample was
chosen as the reference in a balanced loop design.
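The following Python/NumPy sketch checks whether the approximation alone can change a probe's standard error when only the reference changes. Everything in it is hypothetical:

- a four-sample loop (A, B, C, D and back to A), with each link hybridized twice with the dyes swapped;
- one spot flagged as missing on the first array;
- an `approx_sd` function modelled on our reading of how contrasts.fit combines each probe's unscaled standard deviations with the correlation matrix of the full design.

The sketch compares the unscaled standard deviation of the same comparison, C - B, with A and then B as the reference.

```python
import numpy as np

SAMPLES = ["A", "B", "C", "D"]
# Hypothetical balanced loop A->B->C->D->A, each link hybridized twice with
# dyes swapped. Each array is (Cy3 sample, Cy5 sample); M = effect[Cy5] - effect[Cy3].
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
ARRAYS = [pair for a, b in LINKS for pair in ((a, b), (b, a))]

def loop_design(ref):
    """Design whose coefficients are (sample - ref) for each non-reference sample."""
    cols = [s for s in SAMPLES if s != ref]
    X = np.array([[(s == cy5) - (s == cy3) for s in cols] for cy3, cy5 in ARRAYS],
                 dtype=float)
    return X, cols

def contrast(cols, plus, minus):
    """Coefficient vector for (plus - minus); the reference sample has no column."""
    c = np.zeros(len(cols))
    for s, sign in ((plus, 1.0), (minus, -1.0)):
        if s in cols:
            c[cols.index(s)] += sign
    return c

def exact_sd(X, keep, c):
    """Unscaled SD of c'beta from this probe's own covariance matrix."""
    cov = np.linalg.inv(X[keep].T @ X[keep])
    return float(np.sqrt(c @ cov @ c))

def approx_sd(X, keep, c):
    """Per-probe SDs combined with the correlations of the full design (assumed
    to mirror contrasts.fit for non-orthogonal designs)."""
    cov_full = np.linalg.inv(X.T @ X)
    d = np.sqrt(np.diag(cov_full))
    cor = cov_full / np.outer(d, d)
    s = np.sqrt(np.diag(np.linalg.inv(X[keep].T @ X[keep])))
    return float(np.sqrt((s * c) @ cor @ (s * c)))

keep = np.ones(len(ARRAYS), dtype=bool)
keep[0] = False  # spot flagged (set to NA) on the first A-vs-B array
for ref in ("A", "B"):
    X, cols = loop_design(ref)
    c = contrast(cols, "C", "B")
    print(ref, round(exact_sd(X, keep, c), 4), round(approx_sd(X, keep, c), 4))
# prints:
# A 0.6325 0.7122
# B 0.6325 0.6325
```

With B as the reference, the comparison is a single coefficient, so no correlations are borrowed and the result is exact. With A as the reference, the comparison combines two coefficients, and the borrowed correlation inflates the unscaled SD by about 13% for this probe. A shift of that size can change whether a borderline probe makes the list. The exact value is the same under both references, as it must be for a reparametrization.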
My experience, admittedly limited, suggests that the computational
efficiency gain is not worth the loss of accuracy. Even if one has to
give up the efficiency of a single pass through the raw data, at
least one gets correct results. I have hacked together a version of
lmFit that evaluates contrasts with standard errors based on each
probe's exact covariance matrix. It runs essentially as quickly as
lmFit, so I find the efficiency argument unconvincing.
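For readers who want to try the idea, here is a minimal Python/NumPy sketch of a per-probe exact fit. It is not the modified lmFit described above, whose code is not shown, and the function name `exact_contrast_fit` and its arguments are hypothetical. Each probe is refitted by weighted least squares on its own observed arrays. The contrast standard errors then come from that probe's own $(X_g^\top W_g X_g)^{-1}$. The sketch assumes every probe's observed design rows have full column rank, and it returns ordinary rather than moderated t-statistics.

```python
import numpy as np

def exact_contrast_fit(Y, X, C, W=None):
    """Per-probe weighted least squares with exact contrast standard errors.

    Y: (probes x arrays) log-ratios; NaN marks a flagged or missing spot.
    X: (arrays x coefs) design matrix shared by all probes.
    C: (coefs x contrasts) contrast matrix.
    W: optional (probes x arrays) quality weights; defaults to all ones.
    Assumes the observed rows of X have full column rank for every probe.
    """
    n_probes, n_coef, n_con = Y.shape[0], X.shape[1], C.shape[1]
    if W is None:
        W = np.ones_like(Y)
    est = np.full((n_probes, n_con), np.nan)
    unscaled = np.full((n_probes, n_con), np.nan)
    sigma = np.full(n_probes, np.nan)
    df = np.zeros(n_probes, dtype=int)
    for g in range(n_probes):
        keep = np.isfinite(Y[g]) & (W[g] > 0)   # drop flagged spots and zero weights
        Xg, yg, wg = X[keep], Y[g, keep], W[g, keep]
        XtW = Xg.T * wg                          # X_g' W_g without building diag(W_g)
        cov = np.linalg.inv(XtW @ Xg)            # this probe's own unscaled covariance
        beta = cov @ (XtW @ yg)
        est[g] = C.T @ beta
        unscaled[g] = np.sqrt(np.diag(C.T @ cov @ C))
        df[g] = keep.sum() - n_coef
        if df[g] > 0:
            resid = yg - Xg @ beta
            sigma[g] = np.sqrt(np.sum(wg * resid ** 2) / df[g])
    t = est / (sigma[:, None] * unscaled)        # ordinary (not moderated) t-statistics
    return est, unscaled, sigma, df, t

# Example: two groups (2 + 3 arrays), treatment coding (intercept + group-2 effect).
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
C = np.array([[0.0], [1.0]])                     # group 2 minus group 1
rng = np.random.default_rng(1)
Y = rng.normal(size=(3, 5))
Y[1, 0] = np.nan                                 # a scratched spot on the first array
est, unscaled, sigma, df, t = exact_contrast_fit(Y, X, C)
```

Because each probe's covariance is exact, the same comparison gives identical estimates, standard errors and t-statistics under any equivalent parametrization of the design, missing values or not. The per-probe matrix inverse is only p x p, so for typical designs the cost is dominated by the loop over probes rather than by the linear algebra.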
A search of the archive turned up several discussions of missing values
in limma. The main argument I found is Gordon Smyth's (2008-03-08):
"The ideal solution is not to introduce missing values into your
data in the first place. In my experience, missing values are
almost always avoidable. I have never seen a situation where it
was necessary or desirable to introduce a large proportion of
missing values."
My colleagues in biology report that they inspect their arrays
visually and note probes which have been scratched, probes covered by
background blobs and the like. These categories seem to satisfy the
missing-at-random criterion: the probe is marked NA not because it is
saturated or below background, but because it was unreadable for
reasons unrelated to the response.
I'd appreciate feedback: has anyone else already done this? Would
others find this useful? Are there objections I have overlooked?
albyn