Dear all,
I have samples from two mouse strains, each in both genders.
Strain | Gender |
AJ | M |
AJ | F |
BL6 | M |
BL6 | F |
The design formulas I have are:
formula 1: ~ strain # test the effect of different strains
formula 2: ~ strain + gender # test for the effect of gender, controlling for the effect of different mouse strains
formula 3: ~ strain + gender + strain:gender # test for genes where the effect of gender differs across mouse strains
Here the problem is, I'm not sure if gender has an effect on the gene expression.
So I want to test if it's meaningful by adding gender and the interaction terms in the design formula, as the results vary a lot with different formulas.
I'm not familiar with statistics. Could anyone give me a hint on how to test it?
Thanks a lot!
Best,
Yujuan Gui
Thanks a lot! I have a follow-up question regarding the number of replicates needed for the interaction.
We are also thinking about increasing the number of replicates to 12 (6 female + 6 male) for each strain, to account for individual differences. In that case, is the number of replicates enough to take the interaction into account?
Best,
2 mice in each group is enough to allow you to estimate the interaction with some degrees of freedom left to assess statistical significance (well, you could even get away with 2 mice in just one of the groups, but I wouldn't recommend that!). The question now becomes one of power to detect any changes, and within reason the answer is: the more replicates, the better! This matters particularly if you're looking to settle the question of whether gender has an influence on expression. A poorly powered experiment will leave you still unable to answer this question, as a null result is interpreted as there not being enough evidence of change, rather than there being evidence of no change.
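To make the degrees-of-freedom point concrete, here is a minimal base R sketch. It counts residual degrees of freedom in the ordinary linear-model sense (samples minus estimable coefficients) for a balanced 2 x 2 layout. The helper name `residual_df` and the sample layout are my own illustration, not anything from DESeq2.

```r
# Hypothetical helper: residual degrees of freedom for a balanced
# strain x gender layout with n_per_group mice in each of the 4 groups.
residual_df <- function(n_per_group,
                        design = ~ strain + gender + strain:gender) {
  samples <- expand.grid(
    rep    = seq_len(n_per_group),
    gender = c("F", "M"),
    strain = c("AJ", "BL6")
  )
  X <- model.matrix(design, data = samples)
  # samples minus the number of linearly independent coefficients
  nrow(X) - qr(X)$rank
}

residual_df(2)  # 8 samples, 4 coefficients
residual_df(6)  # 24 samples, 4 coefficients
```

With 2 mice per group the interaction model is already estimable with a few degrees of freedom to spare; going to 6 per group mainly buys power.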
6 in each group seems a reasonable number, though. You may want to do a PCA or clustering to gain some intuition as to how much of an effect gender is having globally. You may then want to try formula '2' to remove gender effects: look at the size of the genelist given by results(dds, contrast=c("gender", "M", "F")). If it's small, that might give you enough confidence to fall back to the even simpler formula '1'. Alternatively, the clustering (or your actual hypothesis) might mean you want separate genelists for male-specific and female-specific strain-differential genes, in which case you'd want to use formula '3'.
Assuming you want to find strain-dependent genes, I'd recommend formula '2', as you're only losing a bit of power compared to '1', but with the added benefit of finding genes that have a consistent effect (but different baseline) across the genders. But it's worth checking formula '3', which can give you gender-specific strain-dependent genes. It depends on what your hypothesis is.
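For what it's worth, here is a rough sketch of how the checks above might look in DESeq2. The counts are simulated noise, and the object names (`coldata`, `counts`, `dds2`, `dds3`) and the 0.1 cutoff are just assumptions for illustration; swap in your own count matrix and sample table.

```r
library(DESeq2)

set.seed(1)
# Assumed layout: 2 strains x 2 genders x 6 mice = 24 samples
coldata <- expand.grid(rep = 1:6,
                       gender = c("F", "M"),
                       strain = c("AJ", "BL6"))
n_genes <- 500
gene_means <- 2^runif(n_genes, 4, 10)
# Simulated counts with no real strain or gender effect (placeholder data)
counts <- matrix(rnbinom(n_genes * nrow(coldata), mu = gene_means, size = 10),
                 nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", seq_len(ncol(counts)))
rownames(coldata) <- colnames(counts)

# Formula 2: how many genes show a gender effect after adjusting for strain?
dds2 <- DESeqDataSetFromMatrix(counts, coldata, design = ~ strain + gender)
dds2 <- DESeq(dds2)
res_gender <- results(dds2, contrast = c("gender", "M", "F"))
n_gender_genes <- sum(res_gender$padj < 0.1, na.rm = TRUE)

# Formula 3: genes where the strain effect differs between genders
dds3 <- DESeqDataSetFromMatrix(counts, coldata,
                               design = ~ strain + gender + strain:gender)
dds3 <- DESeq(dds3)
# The interaction is the last coefficient; check resultsNames(dds3) to confirm
interaction_name <- tail(resultsNames(dds3), 1)
res_interaction <- results(dds3, name = interaction_name)
```

If `n_gender_genes` turns out to be small on your real data, that supports dropping gender; if the interaction list is long, formula '3' is probably telling you something.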