Hi here,
I have a question for comparing the interaction terms between multiple genotypes. I have three genotypes, two sequencing library types and different numbers of replicates for each sample type (see below sample description).
I'd like to compare the translational efficiency(i.e. Ribo/RNA) between each genotype (mutant2 vs wildtype, mutant1 vs wildtype, mutant2 vs mutant1). Here, you can think "type" similar to "treatment" (which is commonly used as an example in DESeq2), essentially I'd like to analyze the interaction term between "Genotype" and "Type".
This is my code:
ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type)
This is my sample description:
Genotype | Type | Replicate |
wildtype | RNA | 1 |
wildtype | RNA | 2 |
wildtype | RNA | 3 |
mutant1 | RNA | 1 |
mutant1 | RNA | 2 |
mutant2 | RNA | 1 |
mutant2 | RNA | 2 |
wildtype | Ribo | 1 |
wildtype | Ribo | 2 |
wildtype | Ribo | 3 |
mutant1 | Ribo | 1 |
mutant1 | Ribo | 2 |
mutant2 | Ribo | 1 |
mutant2 | Ribo | 2 |
This is my code to obtain the diff in the interaction term between wildtype and mutant1, and then between wildtype and mutant2:
results(dds,name="Genotypemutant1.TypeRibo")
results(dds,name="Genotypemutant2.TypeRibo")
Is this correct?
And how could I compare the interaction term between mutant2 and mutant1?
Lastly, since the replicates are not paired, should I leave as it is or add it to the design matrix like this?
ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Replicate + Genotype*Type)
Thanks so much for the kind help! Best,
-z
Just reading over the thread. Agree with everything Gavin said.
Hi Gavin and Michael,
Thanks so much for the reply! I still have a question in regard to getting the third pairwise interaction. I compared the results from the following two approaches, however they look completely different:
1. As you suggested,
Then:
When I do:
I got:
out of 14346 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 13, 0.091%
LFC < 0 (down) : 53, 0.37%
outliers [1] : 224, 1.6%
low counts [2] : 1424, 9.9%
(mean count < 0.8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
2. I treat mutant1 as the basal level, in the design matrix
Then:
When I do
I got this instead:
out of 14346 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 2, 0.014%
LFC < 0 (down) : 4, 0.028%
outliers [1] : 226, 1.6%
low counts [2] : 9878, 69%
(mean count < 217.9)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
Could I ask why? I thought the two approaches should give me the same results. Thanks again!
-Z
Yes, they should in theory give you the same result, but due to differences in the updates to the coefficient vector and the need for defining convergence, you obtain numerically different statistics. It is made more of a problem with factors with three or more levels when just one of the groups (the group not in your contrast of interest) has all zeros for a number of genes. Then your coefficients of interest tend toward +/- infinity, or are finite, depending on how you level the factor. I guess the ribo and RNA samples have fairly different overall distributions.
Note that this happens with base R's glm as well, where you can get differences in statistics based on releveling:
hi szhen,
Could you email me, michaelisaiahlove (gmail), I'd like to see if some code refactoring will help alleviate the inconsistencies.