Question: DESeq2 3-fatcor design and different interaction terms
0
gravatar for c.evang
2.8 years ago by
c.evang0
c.evang0 wrote:

Hi,

I am using DESeq2 to analyze differential expressed genes (DEGs) from counts generated by Illumina sequencing.

I have a 3-factor experiment:

  • genotype (with 3 levels): A, B, and C, each with 6 biological replicates (3 for each treatment)
  • treatment (with 2 level): mDr (the treatment) and WW (the control)
  • timepoint (with 3 levels): T1, T2 and T3.

In order to analyze genes that are influenced by an individual component and those influenced by two or all variables, I'm interested in the following results/questions:

  1. How many DEGs are influenced by an individual component (genotype, treatment, timepoint)?
  2. How many DEGs are influenced by the combination of two or all components (genotype:treatment + genotype:timepoint + treatment:timepoint + genotype:treatment:timepoint)?.

So, I tried as follows:

library(DESeq2)

# Load row counts
countData<-read.delim("all_counts_matrix_T1_T2_T3", sep="\t", header=TRUE, row.names = 1)
col_ordering = c(1,3,5,2,4,6,7,9,11,8,10,12,13,15,17,14,16,18, 19,21,23,20,22,24,25,27,29,26,28,30,31,33,35,32,34,36,37,39,41,38,40,42,43,45,47,44,46,48,49,51,53,50,52,54)
countData_ord = countData[,col_ordering]

# Create countData as matrix
countData_m <- as.matrix(countData_ord)
# Interger numbers in matrix
storage.mode(countData_m) = "integer"

#Create sample information
colData<-read.delim("colData.txt", sep="\t", header=TRUE, row.names = 1)

head(countData_m)
colData

rownames(colData) == colnames(countData_m)

# Count matrix and sample information input and design formula
dds<-DESeqDataSetFromMatrix(countData = countData_m, colData = colData, design = ~ genotype + treatment + timepoint + genotype:treatment + genotype:timepoint + treatment:timepoint + genotype:treatment:timepoint)

dds$treatment<- relevel(dds$treatment, "WW")

dds$treatment
dds$genotype
dds$timepoint

## Filtering to remove rows with 0 reads or only a single count across all samples
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)
dds

# Run DESeq2
dds = DESeq(dds)
# Calling results
res <- results(dds)
mcols(res, use.names=TRUE)

# Print the coefficients
resultsNames(dds)

summary(res)
write.table(res, file='genotype_treatment_timepoint/res_04082016.txt', col.names=T, row.names=T, quote=F, sep="\t")

In resultsNames(dds) I get:
[1] "Intercept"                          "genotype_B_vs_A"                    "genotype_C_vs_A"                    "treatment_mDr_vs_WW"             
[5] "timepoint_T2_vs_T1"                 "timepoint_T3_vs_T1"                 "genotypeB.treatmentmDr"             "genotypeC.treatmentmDr"          
[9] "genotypeB.timepointT2"              "genotypeC.timepointT2"              "genotypeB.timepointT3"              "genotypeC.timepointT3"           
[13] "treatmentmDr.timepointT2"           "treatmentmDr.timepointT3"           "genotypeB.treatmentmDr.timepointT2" "genotypeC.treatmentmDr.timepointT2"
[17] "genotypeB.treatmentmDr.timepointT3" "genotypeC.treatmentmDr.timepointT3"

 

First of all, this is correct?

And then, how can I extract the specific results I wanted (question 1 and 2)? I've tried to read up on this, but haven't been able to find  how to code this.

Any guidance is much appreciated. 

Thanks in advance

Chiara

ADD COMMENTlink modified 2.8 years ago by Michael Love23k • written 2.8 years ago by c.evang0
Answer: DESeq2 3-fatcor design and different interaction terms
1
gravatar for Gavin Kelly
2.8 years ago by
Gavin Kelly560
United Kingdom / London / Francis Crick Institute
Gavin Kelly560 wrote:

If you really want to test inclusion of specific factors, rather than looking at the contrast between levels within a factor, then I'd suggest using a likelihood ratio test, which DESeq2 conveniently provides:

results(dds, type="LRT", full= ~genotype + treatment + timepoint, reduced= ~genotype + treatment)

will inform you which genes need to take 'treatment' into account in order to explain the variability between samples, over and above the variability induced by genotype and treatment.  You may want to consider the full=~treatment, reduced=~1 comparison to see if the null model you choose alters the output.  You can do this for the other factors (include and exclude it from the full and reduced models) to examine their influence, and similarly you can add second (and third) order interactions:

results(dds, type="LRT", full= ~genotype + treatment + timepoint + genotype:treatment, reduced= ~genotype + treatment + timepoint)

to look at question 2 (in this case looking at genotype:treatment interactions, but the others are similar).  For the timepoint comparison, it might also be worth adding a 'biological replicate' factor, so you could see if individual replicates are behaving consistently. 

 
ADD COMMENTlink written 2.8 years ago by Gavin Kelly560
Answer: DESeq2 3-fatcor design and different interaction terms
1
gravatar for Michael Love
2.8 years ago by
Michael Love23k
United States
Michael Love23k wrote:

This is a complex experimental design and there are many ways to pose hypotheses here and to test them with DESeq2. I'd recommend  you partner with a local statistician to help pose specific hypotheses as contrasts or likelihood ratio tests and to help interpret results. There is nothing specific about DESeq2 for the comparisons you'd like to make, you would use the same design and contrasts as with a normal linear model.

ADD COMMENTlink written 2.8 years ago by Michael Love23k

Hi gavinpaulkelly and Micheal,

thanks very much for your replies.

if I understand correctly, if I want to test the genotype effect:

results(dds, type="LRT", full= ~genotype + treatment + timepoint, reduced= ~ treatment + timepoint)

And to extract the results of log2foldchange of pairs of genotypes, I can use the contrast argument of results:

results(dds, contrast=c("genotype","B","A"), alpha=.05)

results(dds, contrast=c("genotype","B","C"), alpha=.05)

results(dds, contrast=c("genotype","C","A"), alpha=.05)

And the same for the treatment and timepoint.

Is it correct?

Thanks again,

Chiara

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by c.evang0

Yes that is one design that you can use to test the genotype effect, assuming no interactions between genotype, treatment and time. If you need to employ more complex designs, I'd recommend collaborating with a statistician.

You should add test="Wald" if you want to test those contrasts, the way you have it you are only pulling out fold changes but not calculating p-values. See ?results.

ADD REPLYlink written 2.7 years ago by Michael Love23k

Thanks Michael for your answer.

I will take into account your suggestions.

Thanks again,

Chiara

ADD REPLYlink written 2.7 years ago by c.evang0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 327 users visited in the last hour