My experimental design is the following:
I am interested in obtaining differentially expressed (D.E.) genes for the comparison HE vs DI within a given tissue type. I chose
to merge the tissue_type and genotype columns into a single factor for the design, as recommended by the DESeq2 authors.
However, the number of samples is unbalanced in my experimental design. For instance,
tissue type A may have twice as many replicates as tissue type B. I would expect the comparison
HE vs DI to yield more D.E. genes at a given threshold for tissue type A than for tissue type B, simply because more replicates give more statistical power.
What I want to know is whether a given gene is D.E. in HE vs DI for tissue type A but not for tissue type B, for B but not A, or for both.
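To make the setup concrete, here is a minimal base R sketch of the merged design. The sample table is hypothetical (8 samples for tissue A, 4 for tissue B), and `counts` in the commented DESeq2 calls is a placeholder for my count matrix, so those lines are not run here.

```r
# Hypothetical sample table: tissue A has twice as many replicates as tissue B.
coldata <- data.frame(
  tissue_type = factor(rep(c("A", "B"), times = c(8, 4))),
  genotype    = factor(rep(c("DI", "HE", "DI", "HE"), times = c(4, 4, 2, 2)),
                       levels = c("DI", "HE"))
)

# Merge the two columns into a single grouping factor (A_DI, A_HE, B_DI, B_HE).
coldata$group <- factor(paste(coldata$tissue_type, coldata$genotype, sep = "_"))

# Replicates per group; this is where the imbalance shows up.
print(table(coldata$group))

# With DESeq2 the design would be ~ group, and HE vs DI is extracted per tissue
# (not run; `counts` is a placeholder):
# dds   <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ group)
# dds   <- DESeq(dds)
# res_A <- results(dds, contrast = c("group", "A_HE", "A_DI"))
# res_B <- results(dds, contrast = c("group", "B_HE", "B_DI"))
```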
So, I am wondering:
- Should I balance the sample sizes by randomly selecting replicates from the tissue type that has more?
- Should I introduce an interaction term, so that my formula becomes: ~ tissue_type + genotype + tissue_type:genotype ?
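
To illustrate the two options, here is a minimal base R sketch on a hypothetical sample table (the sample counts and the seed are assumptions for illustration only). The first part downsamples every tissue/genotype group to the size of the smallest one; the second part only builds the model matrix for the interaction formula, to show which coefficients it would produce.

```r
set.seed(1)  # arbitrary seed, only so the random selection is reproducible

# Hypothetical unbalanced sample table: 8 samples for tissue A, 4 for tissue B.
coldata <- data.frame(
  tissue_type = factor(rep(c("A", "B"), times = c(8, 4))),
  genotype    = factor(rep(c("DI", "HE", "DI", "HE"), times = c(4, 4, 2, 2)),
                       levels = c("DI", "HE"))
)

# Option 1: randomly keep n_min samples in every tissue/genotype group.
n_min  <- min(table(coldata$tissue_type, coldata$genotype))
groups <- split(seq_len(nrow(coldata)),
                list(coldata$tissue_type, coldata$genotype))
keep   <- unlist(lapply(groups, function(idx) idx[sample.int(length(idx), n_min)]))
balanced <- coldata[sort(keep), ]
print(table(balanced$tissue_type, balanced$genotype))

# Option 2: keep all samples and add an interaction term.
mm <- model.matrix(~ tissue_type + genotype + tissue_type:genotype, data = coldata)
print(colnames(mm))
# With A and DI as reference levels:
#   genotypeHE              = HE vs DI in tissue A
#   tissue_typeB:genotypeHE = how the HE vs DI effect in tissue B differs from tissue A
```

In DESeq2 the same formula would be passed as the `design` argument; the coefficient names it reports can be checked with `resultsNames(dds)`.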
Thanks for the help!