I want to analyse some mouse data with DESeq2. We have 24 samples in total and three factors. Factor one is the genotype (6x wild type, 6x knockout 1, 6x knockout 2, 6x knockout 3), factor two is the cell line (cellA and cellB), and factor three is sex (male and female).
Here is the PCA
We want to compare the subgroups to each other, for example wt-cellA vs wt-cellB and ko1-cellA vs ko1-cellB, but also wt-cellA vs ko1-cellA. We therefore combined genotype and cell line into a single factor instead of using an interaction term:
dds$group <- factor(paste0(dds$Genotyp, dds$Cell))
Then we realised that 6 of the samples did not cluster well. Further investigation showed that these are the female mice, which is why we want to use the following design:
~ Sex + group
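A minimal sketch of how this design could be set up and queried, assuming the colData columns are named Genotyp, Cell and Sex as in the snippet above. The counts here are simulated with makeExampleDESeqDataSet, and the level names (wt, ko1, cellA, ...) and the positions of the female samples are hypothetical stand-ins for the real sample sheet.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in counts: 1000 genes x 24 samples (no real data involved)
dds <- makeExampleDESeqDataSet(n = 1000, m = 24)

# Hypothetical sample sheet: 4 genotypes x 2 cell lines x 3 replicates
dds$Genotyp <- factor(rep(c("wt", "ko1", "ko2", "ko3"), each = 6),
                      levels = c("wt", "ko1", "ko2", "ko3"))
dds$Cell <- factor(rep(rep(c("cellA", "cellB"), each = 3), times = 4))

# Assumed layout: one female in every subgroup except ko1-cellA and ko1-cellB
sex <- rep("male", 24)
sex[c(1, 4, 13, 16, 19, 22)] <- "female"
dds$Sex <- factor(sex, levels = c("male", "female"))

# Combined genotype/cell-line factor, then sex as an additive covariate
dds$group <- factor(paste0(dds$Genotyp, dds$Cell))
design(dds) <- ~ Sex + group
dds <- DESeq(dds)

# Any pair of subgroups can be compared through the combined factor
res_cell <- results(dds, contrast = c("group", "wtcellA", "wtcellB"))
res_ko1  <- results(dds, contrast = c("group", "ko1cellA", "wtcellA"))
summary(res_ko1)
```

With an additive Sex term, the sex effect is assumed to be the same in every subgroup; the ko1 subgroups contribute nothing to estimating it, but they can still be compared with the others.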
Is this the correct approach? We would still be able to do all the comparisons while accounting for sex. Or would it be better to remove the female samples, given that they affect 6 of the 8 subgroups (the ko1-cellA and ko1-cellB subgroups have no female mice)? The within-group variance of these two all-male subgroups should differ from that of the others. If we keep the female mice, could this lead to a different set of DEGs than removing the 6 samples?
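One way to look at the last question empirically is to fit both options and compare the genes called for the same contrast. This is a sketch under the same assumptions as before: simulated counts with no true differences, hypothetical level names and female positions. The chosen contrast (ko2-cellA vs wt-cellA) and the padj cutoff of 0.05 are arbitrary illustrations.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data with the assumed 24-sample layout
dds <- makeExampleDESeqDataSet(n = 1000, m = 24)
dds$Genotyp <- factor(rep(c("wt", "ko1", "ko2", "ko3"), each = 6))
dds$Cell <- factor(rep(rep(c("cellA", "cellB"), each = 3), times = 4))
sex <- rep("male", 24)
sex[c(1, 4, 13, 16, 19, 22)] <- "female"  # hypothetical female positions
dds$Sex <- factor(sex, levels = c("male", "female"))
dds$group <- factor(paste0(dds$Genotyp, dds$Cell))

# Option 1: keep all 24 samples and adjust for sex
dds_all <- dds
design(dds_all) <- ~ Sex + group
dds_all <- DESeq(dds_all)

# Option 2: drop the 6 females; Sex is then constant, so it leaves the design
dds_male <- dds[, dds$Sex == "male"]
design(dds_male) <- ~ group
dds_male <- DESeq(dds_male)

# Genes with padj below the cutoff for the same contrast
sig_genes <- function(d, alpha = 0.05) {
  res <- results(d, contrast = c("group", "ko2cellA", "wtcellA"), alpha = alpha)
  rownames(res)[which(res$padj < alpha)]  # which() drops NA padj values
}
sig_all  <- sig_genes(dds_all)
sig_male <- sig_genes(dds_male)

length(intersect(sig_all, sig_male))  # called under both options
setdiff(sig_all, sig_male)            # called only when females are kept
setdiff(sig_male, sig_all)            # called only when females are removed
```

Dropping the females also removes 6 of 24 samples, so part of any difference between the two lists can come from reduced power rather than from sex itself.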
Thanks for your feedback