Hi,
I am using DESeq2 to analyze differential expressed genes (DEGs) from counts generated by Illumina sequencing.
I have a 3-factor experiment:
- genotype (with 3 levels): A, B, and C, each with 6 biological replicates (3 for each treatment)
- treatment (with 2 level): mDr (the treatment) and WW (the control)
- timepoint (with 3 levels): T1, T2 and T3.
In order to analyze genes that are influenced by an individual component and those influenced by two or all variables, I'm interested in the following results/questions:
- How many DEGs are influenced by an individual component (genotype, treatment, timepoint)?
- How many DEGs are influenced by the combination of two or all components (genotype:treatment + genotype:timepoint + treatment:timepoint + genotype:treatment:timepoint)?.
So, I tried as follows:
library(DESeq2) # Load row counts countData<-read.delim("all_counts_matrix_T1_T2_T3", sep="\t", header=TRUE, row.names = 1) col_ordering = c(1,3,5,2,4,6,7,9,11,8,10,12,13,15,17,14,16,18, 19,21,23,20,22,24,25,27,29,26,28,30,31,33,35,32,34,36,37,39,41,38,40,42,43,45,47,44,46,48,49,51,53,50,52,54) countData_ord = countData[,col_ordering] # Create countData as matrix countData_m <- as.matrix(countData_ord) # Interger numbers in matrix storage.mode(countData_m) = "integer" #Create sample information colData<-read.delim("colData.txt", sep="\t", header=TRUE, row.names = 1) head(countData_m) colData rownames(colData) == colnames(countData_m) # Count matrix and sample information input and design formula dds<-DESeqDataSetFromMatrix(countData = countData_m, colData = colData, design = ~ genotype + treatment + timepoint + genotype:treatment + genotype:timepoint + treatment:timepoint + genotype:treatment:timepoint) dds$treatment<- relevel(dds$treatment, "WW") dds$treatment dds$genotype dds$timepoint ## Filtering to remove rows with 0 reads or only a single count across all samples dds <- dds[ rowSums(counts(dds)) > 1, ] nrow(dds) dds # Run DESeq2 dds = DESeq(dds) # Calling results res <- results(dds) mcols(res, use.names=TRUE) # Print the coefficients resultsNames(dds) summary(res) write.table(res, file='genotype_treatment_timepoint/res_04082016.txt', col.names=T, row.names=T, quote=F, sep="\t")
In resultsNames(dds) I get:
[1] "Intercept" "genotype_B_vs_A" "genotype_C_vs_A" "treatment_mDr_vs_WW"
[5] "timepoint_T2_vs_T1" "timepoint_T3_vs_T1" "genotypeB.treatmentmDr" "genotypeC.treatmentmDr"
[9] "genotypeB.timepointT2" "genotypeC.timepointT2" "genotypeB.timepointT3" "genotypeC.timepointT3"
[13] "treatmentmDr.timepointT2" "treatmentmDr.timepointT3" "genotypeB.treatmentmDr.timepointT2" "genotypeC.treatmentmDr.timepointT2"
[17] "genotypeB.treatmentmDr.timepointT3" "genotypeC.treatmentmDr.timepointT3"
First of all, this is correct?
And then, how can I extract the specific results I wanted (question 1 and 2)? I've tried to read up on this, but haven't been able to find how to code this.
Any guidance is much appreciated.
Thanks in advance
Chiara
Hi gavinpaulkelly and Micheal,
thanks very much for your replies.
if I understand correctly, if I want to test the genotype effect:
And to extract the results of log2foldchange of pairs of genotypes, I can use the contrast argument of results:
And the same for the treatment and timepoint.
Is it correct?
Thanks again,
Chiara
Yes that is one design that you can use to test the genotype effect, assuming no interactions between genotype, treatment and time. If you need to employ more complex designs, I'd recommend collaborating with a statistician.
You should add test="Wald" if you want to test those contrasts, the way you have it you are only pulling out fold changes but not calculating p-values. See ?results.
Thanks Michael for your answer.
I will take into account your suggestions.
Thanks again,
Chiara