2x3 factorial experiment

0

Entering edit mode

Ilario De Toma ▴ 10

@ilario-de-toma-5961

Last seen 7.0 years ago

Italy

I have performed a microarray experiment comparing primary monocyte from mouse fetal livers. I have three KO samples vs. three WT samples (biological replicates). Each sample have been splitted into three plates: - 1/3 (P) have been collected while proliferating (bacteria plate with GM-CSF) -1/3 (U) have been let untreated -1/3 (LPS) have been treated with LPS Here is a summary of my experiment: Status treat liver WT1.P WT U 1 KO1.P KO U A WT2.P WT U 2 KO2.P KO U B WT3.P WT U 3 KO3.P KO U C WT1.U WT U 1 KO1.U KO U A WT2.U WT U 2 KO2.U KO U B WT3.U WT U 3 KO3.U KO U C WT1.LPS WT LPS 1 KO1.LPS KO LPS A WT2.LPS WT LPS 2 KO2.LPS KO LPS B WT3.LPS WT LPS 3 KO3.LPS KO LPS C The main contrasts that i want to analyze are: -the difference in basal levels of proliferating cells (KO.P-WT.P) - the difference between genes activated by LPS in KO vs WT: (KO.LPS-KO.U) - (WT.LPS -WT.U) I did an MDSplot on my samples and they are perfectly separated: the first dimension separates LPS vs. U or P; while the second dimension separates Wt vs. KO. Moreover, between WT and KO groups also U and P are separated along that dimension. So, now I have some statistical question when fitting a linear model with limma: 1)Is it correct to fit the linear model considering all the samples? Even though for example LPS treated sample are completely different from untreated? 2)Should I consider the experiment with proliferating cells (P) a separate one and fit an independent linear model (considering that I am not particularly interested in the differences U-P or LPS - P. 3) Are the statistics influenced by the number of contrasts you investigate when calling the "contrasts.fit" and "eBayes" functions? 3) When calling lmFit should I "block" for fetal liver origin? (considering that I split each fetal liver in three plates forming P, U and LPS) even though cells do not cluster for fetal liver origin). To that purpose first I calculated the correlation between samples originating from the same fetal liver (with duplicateCorrelation) and it was "0.15", afterwards I used lmFit putting the block and correlation variables into the formula, but when I used "contrasts.fit" I had the following error: Error in contrasts.fit(fit2, contrasts2) : Number of rows of contrast matrix must match number of coefficients In addition: Warning message: In contrasts.fit(fit2, contrasts2) : row names of contrasts don't match col names of coefficients Sorry for all these questions, initially I treated my experiment as a 2x3 factorial, after I conducted two separates analysis one for proliferating cells and another one as a 2x2 factorial, but now I am not sure on my statistics. I hope to have been clear. Ilario, PhD Student [[alternative HTML version deleted]]

Microarray Microarray • 1.3k views

ADD COMMENT • link updated 10.9 years ago by James W. MacDonald 65k • written 10.9 years ago by Ilario De Toma ▴ 10

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

Hi Ilario, On 5/28/2013 12:49 PM, Ilario De Toma wrote: > I have performed a microarray experiment comparing primary monocyte from > mouse fetal livers. > > I have three KO samples vs. three WT samples (biological replicates). Each > sample have been splitted into three plates: > - 1/3 (P) have been collected while proliferating (bacteria plate with > GM-CSF) > -1/3 (U) have been let untreated > -1/3 (LPS) have been treated with LPS > > Here is a summary of my experiment: > > Status treat liver > WT1.P WT U 1 > KO1.P KO U A > WT2.P WT U 2 > KO2.P KO U B > WT3.P WT U 3 > KO3.P KO U C > WT1.U WT U 1 > KO1.U KO U A > WT2.U WT U 2 > KO2.U KO U B > WT3.U WT U 3 > KO3.U KO U C > WT1.LPS WT LPS 1 > KO1.LPS KO LPS A > WT2.LPS WT LPS 2 > KO2.LPS KO LPS B > WT3.LPS WT LPS 3 > KO3.LPS KO LPS C > > The main contrasts that i want to analyze are: > -the difference in basal levels of proliferating cells (KO.P-WT.P) > - the difference between genes activated by LPS in KO vs WT: > (KO.LPS-KO.U) - (WT.LPS -WT.U) > > I did an MDSplot on my samples and they are perfectly separated: the first > dimension separates LPS vs. U or P; while the second dimension separates Wt > vs. KO. Moreover, between WT and KO groups also U and P are separated along > that dimension. > > So, now I have some statistical question when fitting a linear model with > limma: > 1)Is it correct to fit the linear model considering all the samples? Even > though for example LPS treated sample are completely different from > untreated? They aren't completely different. They are the same monocytes that have been subjected to a different environmental stressor. To answer your question, yes it is correct. > 2)Should I consider the experiment with proliferating cells (P) a separate > one and fit an independent linear model (considering that I am not > particularly interested in the differences U-P or LPS - P. I would keep everything together. The difference is subtle, but important. The denominator of your contrast is based on an overall estimate of intra-group variability. If you have all three sample types in the model, that increases the number of samples with which you estimate this variability. Since the sample variance is a fairly inefficient statistic (meaning it takes relatively more data before it converges to the parameter it is intended to estimate), increasing the amount of data used to estimate intra-group variability will tend to improve power to detect differences. > 3) Are the statistics influenced by the number of contrasts you investigate > when calling the "contrasts.fit" and "eBayes" functions? Not sure what you mean here. What statistics? Influenced how? I'll take a stab at an answer, however. You might be worried about the increase in multiplicity with increasing numbers of contrasts. This is an issue to a certain extent, but can be ameliorated by adjusting for multiplicity with e.g., FDR, which isn't affected by the number of contrasts. In other words, if you have two contrasts that are FDR adjusted and you select genes with an adjusted p-value of 0.05, the entire set of genes you select are estimated to contain at most 5% false positives. If you increase the number of contrasts, the proportion of false positives doesn't increase. > 3) When calling lmFit should I "block" for fetal liver origin? (considering > that I split each fetal liver in three plates forming P, U and LPS) even > though cells do not cluster for fetal liver origin). It's hard to say. In conventional statistics one tends to fit various models and then chooses the one that explains the data 'the best' while trying to be as parsimonious as possible. In microarray analyses we don't have that luxury, as we are fitting thousands of models at the same time, so each cannot be tailored specifically to each gene. In other words, we fit a single model to all of the genes in a rather naive manner rather than different models based on how the model fits a particular gene. So in general we often fit what are arguably over-parameterized models because we don't want to take the chance that there are certain genes for which we need all the parameters in our model. In other words, in aggregate there doesn't appear to be a real need to block on mouse with your data. This doesn't tell us if there are certain genes that really do vary by source mouse, and would benefit from having the blocking variables in the model. This brings us to the topic of assumptions. You can assume that you don't really need to block on mouse, based on the results of duplicateCorrelations returning a low correlation. That isn't a horrible assumption, and could be defended, especially since you will 'save' some degrees of freedom by not blocking. Or you could think that maybe there are some genes that really need the blocking variable and you are willing to reduce your degrees of freedom. I think that is a reasonable assumption as well. But you wouldn't ever both block and add the correlation. You do one or the other. Blocking requires fewer assumptions than fitting a mixed model (which is what you do if you pass a correlation to lmFit). And if you fit a blocking variable, you have to increase the number of rows of your contrasts matrix to account for that (which is why you get the error below). There are examples in the limma User's Guide that show how to handle a blocking variable. Best, Jim > To that purpose first I calculated the correlation between samples > originating from the same fetal liver (with duplicateCorrelation) and it > was "0.15", afterwards I used lmFit putting the block and correlation > variables into the formula, but when I used "contrasts.fit" I had the > following error: > Error in contrasts.fit(fit2, contrasts2) : > Number of rows of contrast matrix must match number of coefficients > In addition: Warning message: > In contrasts.fit(fit2, contrasts2) : > row names of contrasts don't match col names of coefficients > > Sorry for all these questions, initially I treated my experiment as a 2x3 > factorial, after I conducted two separates analysis one for proliferating > cells and another one as a 2x2 factorial, but now I am not sure on my > statistics. > I hope to have been clear. > Ilario, PhD Student > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 10.9 years ago James W. MacDonald 65k

Login before adding your answer.