yet another question on technical replicates...
@giorgi-elena-1632
Dear Board,

I know this topic has been discussed several times, yet I'm still confused about how to come up with the right design matrix when technical replicates are present, especially when dealing with Affy arrays.

For one thing, I know that when we have the same number of tech reps per biological sample, we can proceed and use the duplicate correlation function, correct?

On the other hand, when this is not the case, what's the best strategy to use? Averaging is not recommended, yet if we have 2-3 arrays per sample, we don't have enough degrees of freedom to include the technical replication effect, isn't this so?

One example that came up in our lab was an Affy experiment with two cell lines: for group1 we had 5 arrays, one biological replicate each, and for group2 we had 4 arrays, 2 tech reps from one sample and 2 tech reps from a different sample.

We used the following design matrix:

1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1

And, in order to test the differences between the two groups, the following contrast: c(-1, 0.5, 0.5).

Does this sound like a reasonable approach? In general, should we include a different column in the design matrix for each tech rep group and average the contrast coefficients accordingly? Or is this just equivalent to averaging the tech reps?

Thanks so much,
Elena
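For readers following along, the design matrix and contrast described above can be rebuilt from per-array labels. This is a minimal sketch in plain Python rather than limma; the label names (`group1`, `g2_sampleA`, `g2_sampleB`) are hypothetical and were chosen only for illustration.

```python
# Hypothetical labels: five group-1 arrays (one biological sample each),
# then two technical replicates from each of two group-2 samples.
columns = ["group1", "g2_sampleA", "g2_sampleB"]
arrays = ["group1"] * 5 + ["g2_sampleA"] * 2 + ["g2_sampleB"] * 2

# One indicator column per label (a cell-means design with no intercept).
design = [[1 if label == col else 0 for col in columns] for label in arrays]

# Contrast from the post: average of the two group-2 sample
# coefficients minus the group-1 coefficient.
contrast = [-1.0, 0.5, 0.5]

for row in design:
    print(*row)
```

The printed rows should match the 9 x 3 matrix in the post, and the contrast weights sum to zero, as a comparison between groups requires.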
affy affy • 903 views
@james-w-macdonald-5106
Hi Elena,

Giorgi, Elena wrote:
> Does this sound like a reasonable approach? In general, should we
> include a different column in the design matrix for each tech rep group
> and average the contrast coefficients accordingly? Or is this just
> equivalent to averaging the tech reps?

It is pretty much equivalent to averaging the tech reps. The denominator of the t-statistic you will be computing may be slightly different, but overall I don't think there will be much difference.

With these data you are going to have to violate some assumptions in order to analyze them the way you want. When you fit a linear model to these data without using the block argument and calculating the intra-block correlation, you are assuming that all the samples are independent (among other things), which is obviously not true since some are technical replicates. This will likely result in a variance estimate that is smaller than it should be, which may result in more 'significant' genes than you should really see.

The other alternative is to average the technical replicates from the start and then fit the model. Again, the variance estimate will be off, because for the tech replicates you will be calculating based on means, which are much less variable than the underlying data. As above, you will likely have more significant genes than if you weren't violating assumptions.

In statistics we sometimes have to fit a model knowing that we are violating one or more of the underlying assumptions. The trick is to know that you are violating assumptions, and to understand what that means for your results.

HTH,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
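The claim that the three-column design is "pretty much equivalent to averaging the tech reps" can be checked on a toy example. The sketch below uses plain Python rather than limma, and the expression values are made up purely for illustration. It relies on the fact that in a cell-means design, with one indicator column per label, each least-squares coefficient is simply the mean of the arrays in that column.

```python
from statistics import mean

# Made-up log-expression values for a single probeset (illustrative only).
group1 = [7.1, 6.9, 7.4, 7.0, 7.2]   # five independent biological samples
sample_a = [8.0, 8.2]                # two technical replicates of sample A
sample_b = [8.6, 8.4]                # two technical replicates of sample B

# Cell-means design: each coefficient is the mean of its column's arrays.
coefs = [mean(group1), mean(sample_a), mean(sample_b)]

# Apply the contrast c(-1, 0.5, 0.5) from the question.
contrast = [-1.0, 0.5, 0.5]
estimate_design = sum(c * b for c, b in zip(contrast, coefs))

# Alternative: average technical replicates first, then compare groups.
group2_averaged = [mean(sample_a), mean(sample_b)]
estimate_averaged = mean(group2_averaged) - mean(group1)

print(estimate_design, estimate_averaged)
```

The two point estimates agree up to floating-point rounding. What differs is the residual variance and its degrees of freedom: the three-column fit has 9 arrays minus 3 coefficients, with the group-2 residuals reflecting only technical variation, while the averaged fit has 7 values minus 2 coefficients. That difference is the "slightly different" denominator mentioned above.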