I wish to identify significantly expressed genes over the course of 3 time periods (6months (180), 1year(365), 2years(730)) between my wild type and knockout mouse models which I have in duplicates for each time point. So, I have 2 knockout and 2 wild type mice for each time point. I analyzed my data using DESeq2 according to the Timecourse experiments vignette as follows:
> countdata <- read.table("~/desktop/counts_table_lungs.txt", header = TRUE, row.names = 1)
> library(DESeq2)
> countdata <- as.matrix(countdata)
> head(countdata)
> coldata = data.frame(row.names = colnames(countdata), Age = c(rep("180",2), rep("365",2), rep("730",2), rep("180",2), rep("365",2), rep("730",2)), condition = c("WT", "WT", "WT", "WT", "WT", "WT", "KO", "KO", "KO", "KO", "KO", "KO"))
> coldata
My coldata looks as such:
Age condition
X6MWTLG 180 WT
X6MWTLG2 180 WT
X1YWTLG 365 WT
X1YWTLG2 365 WT
X2YWTLG 730 WT
X2YWTLG2 730 WT
X6MKOLG 180 KO
X6MKOLG2 180 KO
X1YKOLG 365 KO
X1YKOLG.1 365 KO
X2YKOLG 730 KO
X2YKOLG.1 730 KO
And after DESeq(dds),
> dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ Age + condition + Age:condition)
> dds <- DESeq(dds)
> res <- results(dds)
> res$symbol <- mcols(dds)$symbol
> table(res$padj<0.05)
I obtain these results:
FALSE TRUE
14861 98
> res <- res[order(res$padj), ]
> head(res)
log2 fold change (MLE): Age730.conditionWT
Wald test p-value: Age730.conditionWT
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
Fez1 214.4469 6.405264 0.6162335 10.394216 2.634435e-25 3.940851e-21
Tnn 369.3435 2.365689 0.3600403 6.570622 5.010546e-11 3.747638e-07
Acta2 3867.6960 -1.899911 0.3029448 -6.271474 3.576456e-10 1.132337e-06
Actg2 458.2408 -2.127611 0.3397300 -6.262654 3.784801e-10 1.132337e-06
Cd79b 454.0894 2.356866 0.3723858 6.329097 2.465994e-10 1.132337e-06
Cnn1 845.1697 -2.032141 0.3642512 -5.578953 2.419704e-08 5.274170e-05
I wish to know if I have accurately analyzed the data, and why does the result show only a single Age730.conditionWT for log2 fold change (MLE) and Wald test p-value. Thank you.
Thanks for the quick reply Michael,
That is the vignette (10. Time course experiments) I followed in my analysis. Is my analysis incorrect in answering my question? Please let me know. Thanks a lot
You didn't use the same
DESeq()
call though: You're not performing a LRT. You didn't give a reduced design, etc.Hello Michael, I missed pasting a part of the code. This is my error. Here is my complete code below;
Our goal is know how the gene knockout affects the course of aging in these mice over the three time courses and by incremental pairwise comparison using 6month (180) time period as the reference. Your help is greatly appreciated. Thank you.
Everything looks correct in this case. The p-value should not say "Wald test". Regarding the LFC, see this FAQ in the vignette:
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#i-ran-a-likelihood-ratio-test-but-results-only-gives-me-one-comparison.
Hello Michael,
I just ran the analysis afresh, and you are right here are the results I got:
FALSE TRUE
14327 269
> res <- res[order(res$padj), ]
> head(res)
log2 fold change (MLE): Age730.conditionWT
LRT p-value: '~ Age + condition + Age:condition' vs '~ Age + condition'
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat
<numeric> <numeric> <numeric> <numeric>
Fez1 214.4469 6.4052642 0.6162335 139.43556
Tnn 369.3435 2.3656890 0.3600403 73.75752
Cd79b 454.0894 2.3568662 0.3723858 66.27804
Cd22 287.1829 1.3314468 0.3766654 51.47430
Cd79a 375.6751 2.2090925 0.4813291 46.69129
Cyp1a1 872.0934 0.5355496 0.4555664 45.83639
pvalue padj
<numeric> <numeric>
Fez1 5.271717e-31 7.694598e-27
Tnn 9.632903e-17 7.030093e-13
Cd79b 4.054204e-15 1.972505e-11
Cd22 6.645055e-12 2.424780e-08
Cd79a 7.262992e-11 2.120213e-07
Cyp1a1 1.113665e-10 2.709176e-07
Micheal, I was wondering what would be the best way to perform pairwise comparisons between time points while considering the effect of the gene knockdown. Please let me know. Thank you.
I'm sorry but this is explicitly covered in the section of the workflow I've pointed you to below. I try to reply to questions in an efficient manner, but this really requires that you uphold your end of the bargain, and take the time to read the material that is already available.
Michael, a follow up question, is there a way to identify the specific time span, whether between 6month and 1 year or 1-2 years where the significant gene expression changes occurred to qualify the gene to be selected by the DESeq2 command? I tried using the ggplot command as provided in your vignette to visualize a time course plot but received an error message. Please let me know. Thank you.