Hello,
To keep the description simple, I will use the same variable names as in Example 3 of ?results in DESeq2. I have the following samples from 2 genotypes (I and II), each treated under 2 conditions (A and B):
| sample | genotype | condition |
|---|---|---|
| 1 | I | A |
| 2 | I | A |
| 3 | I | A |
| 4 | I | A |
| 5 | I | A |
| 6 | I | A |
| 7 | I | A |
| 8 | I | A |
| 9 | I | B |
| 10 | I | B |
| 11 | I | B |
| 12 | I | B |
| 13 | II | A |
| 14 | II | A |
| 15 | II | A |
| 16 | II | A |
| 17 | II | B |
| 18 | II | B |
| 19 | II | B |
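For reference, the sample table above could be encoded in R as a colData-style data.frame, as in the minimal sketch below. The object name `sample_info` is hypothetical. The factor levels are set explicitly so that genotype I and condition A are the reference levels; with `~ genotype + condition + genotype:condition`, DESeq2 should then name the interaction coefficient `genotypeII.conditionB` (check with `resultsNames(dds)`).

```r
# Hypothetical encoding of the sample table above.
# Explicit factor levels make genotype I and condition A the reference levels,
# which determines the name of the interaction coefficient in DESeq2.
sample_info <- data.frame(
  sample    = 1:19,
  genotype  = factor(rep(c("I", "II"), times = c(12, 7)), levels = c("I", "II")),
  condition = factor(c(rep("A", 8), rep("B", 4), rep("A", 4), rep("B", 3)),
                     levels = c("A", "B"))
)

# Replicates per genotype x condition cell: 8/4 for genotype I, 4/3 for genotype II
print(table(sample_info$genotype, sample_info$condition))
```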
If we define "upregulated genes" as those with higher expression in condition B than in condition A, I would like to test the null hypothesis that genotype II does not have more upregulated genes than genotype I.
I wrote my design formula as:
design(dds) <- ~ genotype + condition + genotype:condition
I then extracted the interaction term, i.e. the difference in the condition effect (B vs A) between genotype II and genotype I:
results(dds, name="genotypeII.conditionB", altHypothesis="greater")
With this, 22% of the genes were significantly more upregulated in genotype II than in genotype I. I am wondering whether this analysis is sensitive to the unequal number of replicates between the two genotypes. Although the biological replicates are all well correlated, I have more samples overall from genotype I than from genotype II, particularly in condition A (8 genotype I vs 4 genotype II).
I reran the analysis several times, each time subsampling replicates from genotype I so that the group sizes for genotypes I and II were equal (4 genotype I condition A, 3 genotype I condition B, 4 genotype II condition A, 3 genotype II condition B). With different subsets of genotype I samples, 72-80% of the genes are significantly more upregulated in genotype II than in genotype I. These additional genes do seem to be bona fide upregulated genes by other independent measures (e.g. the levels of their protein products). Which set of upregulated genes should I proceed with?
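The balanced subsampling described above could be sketched in base R as follows. This is a minimal illustration, not the exact procedure used: `draw_balanced_subset` is a hypothetical helper, and it assumes the sample IDs 1 to 19 match the table (genotype I is samples 1-12, genotype II is samples 13-19). It keeps every genotype II sample and randomly draws the same number of genotype I samples within each condition.

```r
# Sample table from the post (hypothetical encoding)
sample_info <- data.frame(
  sample    = 1:19,
  genotype  = factor(rep(c("I", "II"), times = c(12, 7)), levels = c("I", "II")),
  condition = factor(c(rep("A", 8), rep("B", 4), rep("A", 4), rep("B", 3)),
                     levels = c("A", "B"))
)

# Hypothetical helper: keep all genotype II samples and draw as many genotype I
# samples per condition (4 of 8 in condition A, 3 of 4 in condition B).
draw_balanced_subset <- function(info, seed) {
  set.seed(seed)
  keep <- integer(0)
  for (cond in levels(info$condition)) {
    ids_I  <- info$sample[info$genotype == "I"  & info$condition == cond]
    ids_II <- info$sample[info$genotype == "II" & info$condition == cond]
    # sample.int() indexes into ids_I, avoiding sample()'s length-1 behaviour
    picked <- ids_I[sample.int(length(ids_I), length(ids_II))]
    keep <- c(keep, picked, ids_II)
  }
  sort(keep)
}

# Several different balanced subsets, one per random seed
subsets <- lapply(1:5, function(s) draw_balanced_subset(sample_info, seed = s))
```

Each vector in `subsets` could then be used to subset the data (for example `dds[, keep]`, assuming the columns of `dds` follow the sample order in the table) before refitting with the same design, so the gene lists from different subsets can be compared.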
Thank you in advance for your help.
This was helpful. Thank you for your response!