Hello,
I have been working on microarrays using R and limma for differential gene expression analysis. My current design is fairly simple: I am using just two classes, "control" and "treatment", and I am only interested in the DE genes between control and treatment.
design <- model.matrix(~cell_class, data)
But the data also contain several cell lines (around 7) and treatment methods (4), so I have been wondering whether it would be better and more precise to use another design, like so:
design <- model.matrix(~cell_class + cell_lines + treatment_methods, data)
Both designs lead to very similar output when calling topTable: almost all DE genes are the same, but with differences in fold change. I have searched the limma documentation (chapter 9); sections 9.5 and 9.7 are quite similar to my question but not exactly the same, as they seem to be interested in contrasts between subgroups, while I am more interested in the global control vs treatment comparison.
PS: I am not posting the design matrix output, as it is a little over 300 rows and would not be very useful.
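A toy version of the two kinds of design, small enough to print, might look like the sketch below. The data frame `toy`, its sample counts, and the factor levels are invented for illustration; only the column names `cell_class` and `cell_lines` come from the post, and `treatment_methods` is left out to keep the example short.

```r
# Toy sample sheet (invented): 3 cell lines, 2 controls + 2 treated in each.
toy <- data.frame(
  cell_class = factor(rep(c("control", "treatment"), times = 6),
                      levels = c("control", "treatment")),
  cell_lines = factor(rep(c("lineA", "lineB", "lineC"), each = 4))
)

# Design 1: control vs treatment only.
design_simple <- model.matrix(~ cell_class, data = toy)

# Design 2: same comparison, with cell line as an additive blocking factor.
design_block <- model.matrix(~ cell_class + cell_lines, data = toy)

print(colnames(design_simple))
print(colnames(design_block))

# The coefficient of interest carries the same name in both designs.
coef_name <- "cell_classtreatment"
```

With a limma fit, that name would be passed as `topTable(fit, coef = coef_name)` for either design. In the second design the coefficient is the treatment effect adjusted for cell line, so in an unbalanced data set the estimated fold changes can differ between the two fits even when the gene lists largely agree.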
If you have a control and 4 treatment methods, doesn't that make five classes, not 2?
Thanks, but I wouldn't ask if it were that simple. I am not interested in the slight differences between treatments; my interest is in the general picture of control cells vs treated cells (any treatment method).
Let's suppose I hadn't mentioned treatment_methods and had only added the cell_lines information to the second design. Would it be more accurate in this case than the first one, even if I am only using the binary cell_class factor (control/treated) as coef in topTable?
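One way to get a feel for what adding cell_lines does is to simulate a single gene with large baseline differences between cell lines. This is only a sketch under invented numbers (7 balanced cell lines, a true treatment effect of 1, arbitrary noise levels), and it uses base R `lm` rather than limma so that it runs on its own; limma fits the same kind of per-gene linear model but additionally moderates the variances.

```r
set.seed(1)

# Invented layout: 7 cell lines, 2 controls + 2 treated samples per line.
samples <- expand.grid(rep = 1:2,
                       cell_class = c("control", "treated"),
                       cell_lines = paste0("line", 1:7))

# One simulated gene: big line-to-line baseline shifts, treatment effect of 1.
line_shift <- rnorm(7, sd = 2)
samples$y <- line_shift[as.integer(samples$cell_lines)] +
  1 * (samples$cell_class == "treated") +
  rnorm(nrow(samples), sd = 0.3)

fit_simple <- lm(y ~ cell_class, data = samples)
fit_block  <- lm(y ~ cell_class + cell_lines, data = samples)

# Residual standard deviation and t-statistic for the treatment coefficient.
res <- rbind(
  simple  = c(sigma = sigma(fit_simple),
              t = coef(summary(fit_simple))["cell_classtreated", "t value"]),
  blocked = c(sigma = sigma(fit_block),
              t = coef(summary(fit_block))["cell_classtreated", "t value"])
)
print(res)
```

In this simulation the cell-line term absorbs the baseline differences, so the residual standard deviation shrinks and the t-statistic for the same control vs treated coefficient grows. Whether real data behave this way depends on how much the cell lines actually differ.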
This post would benefit from a small-scale example. Are all classes/treatments present in each cell line? Are the treatment types nested within the treatment class, or are there specific controls for each treatment method? Currently I am imagining your situation below, but I can't tell if this is actually the case, and it's difficult to give a precise answer without details:
Thank you for taking the time, Aaron. You summarized the situation pretty well in your code; I simplified it and changed it a little to reflect the fact that I have at least 2 controls per cell line, some cell lines are more represented than others, and a few cell lines have more than one treatment method.
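The code exchanged in the thread is not reproduced here. As a purely hypothetical stand-in, the sketch below builds a small sample sheet matching the description (at least 2 controls per cell line, unequal representation, one line with two treatment methods) and checks whether the full three-factor design is of full rank. All line names, method names, and counts are invented, and it assumes that controls carry a placeholder method level `none`, which may not match the real data.

```r
# Hypothetical sample sheet (all names and counts invented).
layout <- data.frame(
  cell_lines = c(rep("lineA", 6), rep("lineB", 4), rep("lineC", 4)),
  cell_class = c("control", "control", "treated", "treated", "treated", "treated",
                 "control", "control", "treated", "treated",
                 "control", "control", "treated", "treated"),
  # Assumption: control samples get the placeholder method "none".
  treatment_methods = c("none", "none", "m1", "m1", "m2", "m2",
                        "none", "none", "m1", "m1",
                        "none", "none", "m3", "m3")
)
layout$cell_lines <- factor(layout$cell_lines)
layout$cell_class <- factor(layout$cell_class, levels = c("control", "treated"))
layout$treatment_methods <- factor(layout$treatment_methods,
                                   levels = c("none", "m1", "m2", "m3"))

# Which cell line / method combinations are actually observed?
print(table(layout$cell_lines, layout$treatment_methods))

design_full <- model.matrix(~ cell_class + cell_lines + treatment_methods,
                            data = layout)

# Compare the number of columns with the rank of the design matrix.
n_cols <- ncol(design_full)
rank_full <- qr(design_full)$rank
cat("columns:", n_cols, " rank:", rank_full, "\n")
```

Under this coding, every treated sample has exactly one method, so the treated column equals the sum of the method columns and the design has fewer independent columns than coefficients. In that situation limma typically reports some coefficients as not estimable, which is worth checking before interpreting the cell_class coefficient from the three-factor design.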