DESeq2 comparing interaction terms between multiple genotypes
1
0
Entering edit mode
szhen • 0
@szhen-12135
Last seen 7.3 years ago

Hi here,

I have a question for comparing the interaction terms between multiple genotypes. I have three genotypes, two sequencing library types and different numbers of replicates for each sample type (see below sample description).

I'd like to compare the translational efficiency(i.e. Ribo/RNA) between each genotype (mutant2 vs wildtype, mutant1 vs wildtype, mutant2 vs mutant1). Here, you can think "type" similar to "treatment" (which is commonly used as an example in DESeq2), essentially I'd like to analyze the interaction term between "Genotype" and "Type".

 

This is my code:

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type)

This is my sample description:

Genotype Type Replicate
wildtype RNA 1
wildtype RNA 2
wildtype RNA 3
mutant1 RNA 1
mutant1 RNA 2
mutant2 RNA 1
mutant2 RNA 2
wildtype Ribo 1
wildtype Ribo 2
wildtype Ribo 3
mutant1 Ribo 1
mutant1 Ribo 2
mutant2 Ribo 1
mutant2 Ribo 2

 

This is my code to obtain the diff in the interaction term between wildtype and mutant1, and then between wildtype and mutant2:

results(dds,name="Genotypemutant1.TypeRibo")

results(dds,name="Genotypemutant2.TypeRibo")

 

Is this correct?

And how could I compare the interaction term between mutant2 and mutant1? 

 

Lastly, since the replicates are not paired, should I leave as it is or add it to the design matrix like this?

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Replicate + Genotype*Type)

 

Thanks so much for the kind help! Best,

-z

deseq2 • 1.3k views
ADD COMMENT
3
Entering edit mode
Gavin Kelly ▴ 680
@gavin-kelly-6944
Last seen 4.0 years ago
United Kingdom / London / Francis Crick…

That looks correct to me; to get the third pairwise interaction, you could do

results(dds, contrast=list("Genotypemutant1.TypeRibo", "Genotypemutant2.TypeRibo"))

where the first element of the list is the numerator (so gets treated positively) and the second is the denominator (so gets subtracted), which leaves the effect you're looking for.

If genotype and type don't come in batches as defined by Replicate, then it would be counter-productive to include a Replicate term in your model; a term should only be included when it corresponds to something either technical or biological - and if the data aren't 'paired', then being 'replicate 1' in a particular experimental group is not the same as being 'replicate 1' in another group(?) - so there's no use in estimating it.  (If were a pairing, you'd need to make sure you convert your numeric-looking 'replicate' into a factor first, though).

 

ADD COMMENT
0
Entering edit mode

Just reading over the thread. Agree with everything Gavin said.

ADD REPLY
0
Entering edit mode

Hi Gavin and Michael,

Thanks so much for the reply! I still have a question in regard to getting the third pairwise interaction. I compared the results from the following two approaches, however they look completely different:

1. As you suggested, 

samples$Genotype=factor(samples$Genotype,levels=c("wildtype","mutant1","mutant2"))
samples$Type=factor(samples$Type,levels=c("RNA","Ribo"))

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type) 

Then:

results(dds, contrast=list("Genotypemutant2.TypeRibo", "Genotypemutant1.TypeRibo"))

 

When I do:

summary(results)

I got:

out of 14346 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 13, 0.091% 
LFC < 0 (down)   : 53, 0.37% 
outliers [1]     : 224, 1.6% 
low counts [2]   : 1424, 9.9% 
(mean count < 0.8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

 

2. I treat mutant1 as the basal level, in the design matrix

samples$Genotype=factor(samples$Genotype,levels=c("mutant1","wildtype","mutant2"))

samples$Type=factor(samples$Type,levels=c("RNA","Ribo"))

ddsLoad <- DESeqDataSetFromMatrix(countData = count, colData = samples, design = ~ Genotype*Type) 

Then:

result <- results(dds,name="Genotypemutant2.TypeRibo"))

 

When I do

summary(result)

I got this instead:

out of 14346 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 2, 0.014% 
LFC < 0 (down)   : 4, 0.028% 
outliers [1]     : 226, 1.6% 
low counts [2]   : 9878, 69% 
(mean count < 217.9)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

 

Could I ask why? I thought the two approaches should give me the same results. Thanks again!

-Z

 

 

ADD REPLY
1
Entering edit mode

Yes, they should in theory give you the same result, but due to differences in the updates to the coefficient vector and the need for defining convergence, you obtain numerically different statistics. It is made more of a problem with factors with three or more levels when just one of the groups (the group not in your contrast of interest) has all zeros for a number of genes. Then your coefficients of interest tend toward +/- infinity, or are finite, depending on how you level the factor. I guess the ribo and RNA samples have fairly different overall distributions.

Note that this happens with base R's glm as well, where you can get differences in statistics based on releveling:

set.seed(1)
diff <- replicate(10, {
  n <- 3
  y <- c(rep(0,n),rpois(2*n,lambda=5))
  x <- factor(rep(1:3,each=n))
  fit <- glm(y ~ x, family=poisson)
  library(multcomp)
  stat <- summary(glht(fit, matrix(c(0,-1,1),1)))$test$tstat
  x2 <- relevel(x, "2")
  fit2 <- glm(y ~ x2, family=poisson)
  stat2 <- summary(fit2)$coef[3,3]
  stat - stat2})
summary(diff)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-9.675e-08 -1.764e-08 -7.500e-09 -5.022e-09  8.588e-09  1.147e-07 

 

ADD REPLY
0
Entering edit mode

hi szhen,

Could you email me, michaelisaiahlove (gmail), I'd like to see if some code refactoring will help alleviate the inconsistencies.

ADD REPLY

Login before adding your answer.

Traffic: 775 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6