Question

DESeq2 comparing interaction terms between multiple genotypes

0

Entering edit mode

szhen • 0

@szhen-12135

Last seen 7.3 years ago

Hi here,

I have a question for comparing the interaction terms between multiple genotypes. I have three genotypes, two sequencing library types and different numbers of replicates for each sample type (see below sample description).

I'd like to compare the translational efficiency(i.e. Ribo/RNA) between each genotype (mutant2 vs wildtype, mutant1 vs wildtype, mutant2 vs mutant1). Here, you can think "type" similar to "treatment" (which is commonly used as an example in DESeq2), essentially I'd like to analyze the interaction term between "Genotype" and "Type".

This is my code:

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type)

This is my sample description:

Genotype	Type	Replicate
wildtype	RNA	1
wildtype	RNA	2
wildtype	RNA	3
mutant1	RNA	1
mutant1	RNA	2
mutant2	RNA	1
mutant2	RNA	2
wildtype	Ribo	1
wildtype	Ribo	2
wildtype	Ribo	3
mutant1	Ribo	1
mutant1	Ribo	2
mutant2	Ribo	1
mutant2	Ribo	2

This is my code to obtain the diff in the interaction term between wildtype and mutant1, and then between wildtype and mutant2:

results(dds,name="Genotypemutant1.TypeRibo")

results(dds,name="Genotypemutant2.TypeRibo")

Is this correct?

And how could I compare the interaction term between mutant2 and mutant1?

Lastly, since the replicates are not paired, should I leave as it is or add it to the design matrix like this?

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Replicate + Genotype*Type)

Thanks so much for the kind help! Best,

-z

deseq2 • 1.3k views

ADD COMMENT • link updated 7.3 years ago by Gavin Kelly ▴ 680 • written 7.3 years ago by szhen • 0

score 3 · Answer 1 · 2017-01-10

3

Entering edit mode

Gavin Kelly ▴ 680

@gavin-kelly-6944

Last seen 4.0 years ago

United Kingdom / London / Francis Crick…

That looks correct to me; to get the third pairwise interaction, you could do

results(dds, contrast=list("Genotypemutant1.TypeRibo", "Genotypemutant2.TypeRibo"))

where the first element of the list is the numerator (so gets treated positively) and the second is the denominator (so gets subtracted), which leaves the effect you're looking for.

If genotype and type don't come in batches as defined by Replicate, then it would be counter-productive to include a Replicate term in your model; a term should only be included when it corresponds to something either technical or biological - and if the data aren't 'paired', then being 'replicate 1' in a particular experimental group is not the same as being 'replicate 1' in another group(?) - so there's no use in estimating it. (If were a pairing, you'd need to make sure you convert your numeric-looking 'replicate' into a factor first, though).

ADD COMMENT • link 7.3 years ago Gavin Kelly ▴ 680

0

Entering edit mode

Just reading over the thread. Agree with everything Gavin said.

ADD REPLY • link 7.3 years ago Michael Love 41k

0

Entering edit mode

Hi Gavin and Michael,

Thanks so much for the reply! I still have a question in regard to getting the third pairwise interaction. I compared the results from the following two approaches, however they look completely different:

1. As you suggested,

samples$Genotype=factor(samples$Genotype,levels=c("wildtype","mutant1","mutant2"))

samples$Type=factor(samples$Type,levels=c("RNA","Ribo"))

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type)

Then:

results(dds, contrast=list("Genotypemutant2.TypeRibo", "Genotypemutant1.TypeRibo"))

When I do:

summary(results)

I got:

out of 14346 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 13, 0.091%
LFC < 0 (down) : 53, 0.37%
outliers [1] : 224, 1.6%
low counts [2] : 1424, 9.9%
(mean count < 0.8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

2. I treat mutant1 as the basal level, in the design matrix

samples$Genotype=factor(samples$Genotype,levels=c("mutant1","wildtype","mutant2"))

samples$Type=factor(samples$Type,levels=c("RNA","Ribo"))

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type)

Then:

result <- results(dds,name="Genotypemutant2.TypeRibo"))

When I do

summary(result)

I got this instead:

out of 14346 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 2, 0.014%
LFC < 0 (down) : 4, 0.028%
outliers [1] : 226, 1.6%
low counts [2] : 9878, 69%
(mean count < 217.9)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Could I ask why? I thought the two approaches should give me the same results. Thanks again!

-Z

ADD REPLY • link 7.3 years ago szhen • 0

1

Entering edit mode

Yes, they should in theory give you the same result, but due to differences in the updates to the coefficient vector and the need for defining convergence, you obtain numerically different statistics. It is made more of a problem with factors with three or more levels when just one of the groups (the group not in your contrast of interest) has all zeros for a number of genes. Then your coefficients of interest tend toward +/- infinity, or are finite, depending on how you level the factor. I guess the ribo and RNA samples have fairly different overall distributions.

Note that this happens with base R's glm as well, where you can get differences in statistics based on releveling:

set.seed(1)
diff <- replicate(10, {
  n <- 3
  y <- c(rep(0,n),rpois(2*n,lambda=5))
  x <- factor(rep(1:3,each=n))
  fit <- glm(y ~ x, family=poisson)
  library(multcomp)
  stat <- summary(glht(fit, matrix(c(0,-1,1),1)))$test$tstat
  x2 <- relevel(x, "2")
  fit2 <- glm(y ~ x2, family=poisson)
  stat2 <- summary(fit2)$coef[3,3]
  stat - stat2})
summary(diff)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-9.675e-08 -1.764e-08 -7.500e-09 -5.022e-09  8.588e-09  1.147e-07

ADD REPLY • link 7.3 years ago Michael Love 41k

0

Entering edit mode

hi szhen,

Could you email me, michaelisaiahlove (gmail), I'd like to see if some code refactoring will help alleviate the inconsistencies.

ADD REPLY • link 7.3 years ago Michael Love 41k