On 9/23/2010 4:45 PM, Jack Luo wrote:
> This is a conceptual question related to microarray, instead of the use of
> any Bioconductor package. I apologize if this bothers anyone.
> I am struggling to understand the concept of differential expression in
> terms of its sources (whether it is technical or biological).
> If I have an experiment with two groups (healthy vs. disease) and try to find
> some differentially expressed genes, take two genes for example; both of
> them are differentially expressed (DE) between healthy and disease.
> Gene A has a present detection call for all the samples under study (the
> detection call p-value in the healthy group is on the order of 1e-2, while
> the detection call p-value in the disease group is much more significant
> (say, 1e-10)).
> Gene B has a 50% present call in healthy while a 100% present call in disease.
First let's backtrack and talk about P/M/A calls, and what they mean.
The statistics underlying these calls test whether or not the PM
probes in aggregate appear to be different from the corresponding MM
probes in a given probeset. Others will disagree, but I think it is
incorrect to assume that an absent call means that the transcript being
measured is absent. What it really means is that we cannot say that the
PM probes are binding more transcript than the MM probes.
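To make that concrete, here is a minimal Python sketch of how a MAS5-style detection call can be computed for one probeset. It is a simplification, not Affymetrix's implementation. It assumes the commonly documented discrimination score R = (PM - MM) / (PM + MM), a one-sided signed-rank test of R against a threshold tau, and the usual defaults tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06, none of which are stated in this message. The function names and the intensities are made up for illustration.

```python
def exact_signed_rank_p(diffs):
    """Exact one-sided p-value P(W+ >= observed) for a signed-rank test.

    Zeros are dropped and tied |diffs| get average ranks. Ranks are
    stored doubled so that they stay integers.
    """
    d = sorted((x for x in diffs if x != 0), key=abs)
    n = len(d)
    if n == 0:
        return 1.0
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[j + 1]) == abs(d[i]):
            j += 1
        for k in range(i, j + 1):
            ranks2[k] = (i + 1) + (j + 1)  # twice the average rank
        i = j + 1
    w_plus2 = sum(r for r, x in zip(ranks2, d) if x > 0)
    total = sum(ranks2)
    # counts[s] = number of sign assignments whose positive ranks sum to s
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[w_plus2:]) / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (call, p_value) for one probeset; tau and alphas are assumed defaults."""
    scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    p_value = exact_signed_rank_p([r - tau for r in scores])
    if p_value < alpha1:
        return "P", p_value
    if p_value < alpha2:
        return "M", p_value
    return "A", p_value


# Made-up probeset where PM is clearly above MM for every probe pair.
pm_clear = [400, 520, 610, 450, 700, 380, 560, 490, 630, 540, 475]
mm_clear = [100, 130, 150, 120, 160, 110, 140, 125, 155, 135, 115]
call_clear, p_clear = detection_call(pm_clear, mm_clear)

# Made-up bright probeset where MM tracks PM closely.
pm_bright = [900, 950, 870, 1000, 920, 980, 910, 940, 960, 930, 890]
mm_bright = [910, 940, 880, 990, 930, 970, 920, 950, 950, 940, 880]
call_bright, p_bright = detection_call(pm_bright, mm_bright)
```

The second probeset is called absent even though every probe is bright, because the test only asks whether PM exceeds MM, not whether there is signal.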
If you make the assumption that the MM probes do a good job of measuring
background, then the absent call really means it is absent. However, a
large percentage of MM probes have higher fluorescence readings than the
corresponding PM probe (it varies by chip, but is usually > 30%. You can
check with your data to verify). In addition, the MM probe intensity
will increase with increasing amounts of transcript. These are two of
the reasons that Affy has abandoned the use of MM probes (more real
estate on the chip being a third), and why very few people use MAS5 for
computing expression values any more.
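The check on your own data can be as simple as the following Python sketch. It assumes you have already extracted paired PM and MM intensities for one chip into two equal-length lists; the function name and the numbers are hypothetical, for illustration only.

```python
def fraction_mm_exceeds_pm(pm, mm):
    """Fraction of probe pairs whose MM intensity is higher than the PM intensity."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    if not pm:
        raise ValueError("need at least one probe pair")
    return sum(m > p for p, m in zip(pm, mm)) / len(pm)


# Made-up intensities for five probe pairs on one chip.
pm = [100, 200, 150, 300, 120]
mm = [80, 250, 140, 310, 90]
fraction = fraction_mm_exceeds_pm(pm, mm)
```

Run over all probe pairs on a chip, this gives the percentage referred to above.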
So I would personally caution you against interpreting these p-values as
indicating presence or absence of the transcript.
As to your question, technical and biological variability are
confounded here, so you have to set up your experiments in such a way
that the contribution from technical variability is minimized. For
instance, if you do all controls one day and diseased the next, you
cannot possibly tell if any differences were due to biology or to
technical differences. However, if you randomize sample types over the days
they are processed, then the technical variability (which still exists, and is
confounded with biological variability) will tend to appear as noise,
and be captured by the residual term.
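One way to set up such a design is sketched below in Python. Note that it uses a stratified (blocked) randomization, which is slightly stricter than plain randomization: each group is shuffled and then dealt across the processing days, so no day ends up holding only one sample type. The function name, sample labels and number of days are hypothetical.

```python
import random


def randomize_over_days(samples, n_days, seed=None):
    """Assign samples to processing days, balancing groups across days.

    samples: dict mapping sample label -> group name.
    Returns a dict mapping day number (1..n_days) -> list of sample labels.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    rng = random.Random(seed)
    by_group = {}
    for label, group in samples.items():
        by_group.setdefault(group, []).append(label)
    schedule = {day: [] for day in range(1, n_days + 1)}
    for group in sorted(by_group):
        labels = by_group[group]
        rng.shuffle(labels)  # random order within each group
        for i, label in enumerate(labels):
            schedule[1 + i % n_days].append(label)  # deal across days
    return schedule


# Hypothetical experiment: 4 healthy and 4 disease samples over 2 days.
samples = {f"H{i}": "healthy" for i in range(1, 5)}
samples.update({f"D{i}": "disease" for i in range(1, 5)})
schedule = randomize_over_days(samples, n_days=2, seed=1)
```

With this layout a day effect hits both groups equally, so it inflates the residual rather than masquerading as a healthy vs. disease difference.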
Also, in my opinion there isn't any difference between the two
situations (assuming I understand situation B correctly). What I think
you are asking is this: are there any substantive differences between a
situation where a gene is apparently unexpressed in sample A but
expressed to a certain degree in sample B and a situation where a gene
is expressed in both samples, but at a two fold (or greater) level in
one of them?
In my opinion, there is no difference between those scenarios. In each
situation, the gene is expressed at a much lower level in one sample
versus the other. The relative levels are unimportant, as the absolute
accuracy of our measuring device is not good.
> My question is what's the correct interpretation in terms of whether the
> differential expression is due to technical or biological? Are they both
> due to technical, or A is DE due to biological and B is due to technical, or
> they are both DE due to biological?
> Thanks a bunch,
James W. MacDonald, M.S.
University of Michigan
Department of Human Genetics
1241 E. Catherine St.
Ann Arbor MI 48109-5618