Hi Paola,
First, note that you sent this to the wrong list. Bioc-devel is for
developers of BioC packages, not for questions about how to use them. The
correct list is Bioc-help, where I have re-routed the thread.
On 5/8/2012 10:02 AM, Paola Sgadò wrote:
> Hi all,
> I'm having some problems with a microarray analysis. I am a biologist
> and not very good with either R or statistics!
> I'm using Agilent 4x44 arrays and the Agi4x44PreProcess package.
> Basically, I have to compare WT vs KO data. The microarray experiment
> was done first with 3 true biological replicates and later with 4
> technical replicates of a pooled RNA sample.
> My design is the following:
>> targets
> FileName     Treat  GErep  Subject   Array  Repl.
> 549_1_4.txt  KO     2      genotype  1      KO1
> 550_1_4.txt  KO     2      genotype  2      KO2
> 551_1_4.txt  KO     2      genotype  3      KO3
> 549_1_3.txt  WT     1      genotype  1      WT1
> 550_1_3.txt  WT     1      genotype  2      WT2
> 551_1_3.txt  WT     1      genotype  3      WT3
> 385_1_1.txt  WT     3      genotype  4      WT4
> 385_1_2.txt  KO     4      genotype  4      KO4
> 385_1_3.txt  WT     3      genotype  4      WT4
> 385_1_4.txt  KO     4      genotype  4      KO4
> 386_1_2.txt  WT     3      genotype  5      WT4
> 386_1_3.txt  KO     4      genotype  5      KO4
> 386_1_4.txt  WT     3      genotype  5      WT4
> I performed normalization and filtering on the entire set of arrays,
> but when I started the statistical analysis with eBayes in limma I
> realized I could not treat the biological (WT1,2,3 vs KO1,2,3) and
> technical (WT4 vs KO4) replicates the same way.
> I tried to use the duplicateCorrelation() function, but it does not
> work with technical and biological replicates in the same analysis.
> Is there a way to work around the problem?
> Thanks for your help; I really cannot find a way out...
There is a famous quote by Sir Ronald Fisher that may well apply here:
"To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of."
You are correct that you cannot use biological and technical replicates
the same way. Nor can you treat single samples and pools the same way
(the pool itself is biologically 'smoothed', so the expected variance
will be lower for a pool than for a sample from a single subject).
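The point about pools can be checked with a small simulation. This is a minimal sketch, not part of the original analysis: the pool size (5 subjects), the between-subject SD (1.0) and the technical SD (0.3) are hypothetical values, and it assumes pooling averages subject levels on the scale being analysed (a simplification, since RNA actually mixes on the linear rather than the log scale).

```python
import random
import statistics


def simulate(n_draws=20000, pool_size=5, sd_subject=1.0, sd_tech=0.3, seed=1):
    """Return (variance of single-subject arrays, variance of pooled arrays)
    for one hypothetical gene, in arbitrary expression units.
    pool_size, sd_subject and sd_tech are made-up illustration values."""
    rng = random.Random(seed)
    single, pooled = [], []
    for _ in range(n_draws):
        # One subject: its own biological level plus array (technical) noise.
        single.append(rng.gauss(0.0, sd_subject) + rng.gauss(0.0, sd_tech))
        # One pool: the mixture averages pool_size subjects' levels,
        # and the pooled RNA is then measured once on an array.
        mix = statistics.fmean([rng.gauss(0.0, sd_subject) for _ in range(pool_size)])
        pooled.append(mix + rng.gauss(0.0, sd_tech))
    return statistics.variance(single), statistics.variance(pooled)


v_single, v_pool = simulate()
print(f"single subject: {v_single:.2f}, pool of 5: {v_pool:.2f}")
```

With these made-up settings the single-subject variance should come out near 1.0 + 0.3^2 = 1.09 and the pooled variance near 1.0/5 + 0.3^2 = 0.29, so the pool hides most of the subject-to-subject variability that a biological replicate would show.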
So you have bad choices and worse choices. In order from bad to
indefensible:
1.) Exclude all pooled samples. This is bad because you have just wasted
all those arrays, but it is the easiest option to defend if you try to
publish.
2.) Exclude all but one each of the pooled WT and KO arrays. This is bad
for the reason above, plus you are assuming that a pool is the same as a
single sample. It is also sort of hard to explain in a paper.
3.) Use all the data, pretending that they are all biological
replicates. This is really hard to defend, and the apparent gain in
power from increasing N will likely be offset by the fact that the
signal from all those pooled technical replicates won't actually be
signal, but noise (any differences between technical replicates cannot
possibly be due to biological differences, hence they are only noise).
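Why treating technical replicates as biological ones produces spurious results can be sketched with a null simulation in which WT and KO do not differ at all. This is a hypothetical illustration, not limma's moderated t-test (eBayes): it runs an ordinary equal-variance two-sample t-test per simulated gene and compares a design where every array is a different subject against one where a single WT pool and a single KO pool are each hybridised four times. The pool size and standard deviations are made-up values.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 6 df (4 + 4 arrays - 2).
T_CRIT_DF6 = 2.447


def pooled_t(a, b):
    """Classic equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def false_positive_rates(n_genes=5000, n_arrays=4, pool_size=5,
                         sd_subject=1.0, sd_tech=0.3, seed=2):
    """Fraction of null genes called significant under each design
    (all parameters are hypothetical illustration values)."""
    if n_arrays != 4:
        raise ValueError("T_CRIT_DF6 assumes 4 arrays per group")
    rng = random.Random(seed)

    def subject():
        return rng.gauss(0.0, sd_subject)

    def pool():
        return statistics.fmean([subject() for _ in range(pool_size)])

    hits_bio = hits_tech = 0
    for _ in range(n_genes):
        # Biological replicates: every array comes from a different subject.
        wt = [subject() + rng.gauss(0.0, sd_tech) for _ in range(n_arrays)]
        ko = [subject() + rng.gauss(0.0, sd_tech) for _ in range(n_arrays)]
        hits_bio += int(abs(pooled_t(wt, ko)) > T_CRIT_DF6)
        # Technical replicates: one WT pool and one KO pool, each measured n_arrays times.
        wt_pool, ko_pool = pool(), pool()
        wt = [wt_pool + rng.gauss(0.0, sd_tech) for _ in range(n_arrays)]
        ko = [ko_pool + rng.gauss(0.0, sd_tech) for _ in range(n_arrays)]
        hits_tech += int(abs(pooled_t(wt, ko)) > T_CRIT_DF6)
    return hits_bio / n_genes, hits_tech / n_genes


bio_rate, tech_rate = false_positive_rates()
print(f"biological design: {bio_rate:.3f}, technical design: {tech_rate:.3f}")
```

The biological design should call roughly 5% of the null genes significant, as a valid test should, while the technical design calls a much larger fraction: the replicate-to-replicate scatter only measures technical noise, so the ordinary biological difference between the two pools is read as a genotype effect.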
In the end you will have to validate any targets that arise from the
microarray experiment, so what you are really trying to do is minimize
spurious results that would have you wasting time running RT-PCR on
genes that aren't differentially expressed.
Best,
Jim
> Cheers
> Paola
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099