Time course experiments
1
0
Entering edit mode
uu2110 • 0
@uu2110-14280
Last seen 6.5 years ago

I wish to identify significantly expressed genes over the course of 3 time periods (6months (180), 1year(365), 2years(730)) between my wild type and knockout mouse models which I have in duplicates for each time point. So, I have 2 knockout  and 2 wild type mice for each time point. I analyzed my data using DESeq2 according to the Timecourse experiments vignette as follows:

> countdata <- read.table("~/desktop/counts_table_lungs.txt", header = TRUE, row.names = 1)
> library(DESeq2)

> countdata <- as.matrix(countdata)
> head(countdata)

> coldata = data.frame(row.names = colnames(countdata), Age = c(rep("180",2), rep("365",2), rep("730",2), rep("180",2), rep("365",2), rep("730",2)), condition = c("WT", "WT", "WT", "WT", "WT", "WT", "KO", "KO", "KO", "KO", "KO", "KO"))
> coldata

My coldata looks as such:        

Age condition
X6MWTLG   180        WT
X6MWTLG2  180        WT
X1YWTLG   365        WT
X1YWTLG2  365        WT
X2YWTLG   730        WT
X2YWTLG2  730        WT
X6MKOLG   180        KO
X6MKOLG2  180        KO
X1YKOLG   365        KO
X1YKOLG.1 365        KO
X2YKOLG   730        KO
X2YKOLG.1 730        KO

And after DESeq(dds),

> dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ Age + condition + Age:condition)

> dds <- DESeq(dds)

> res <- results(dds)
> res$symbol <- mcols(dds)$symbol
> table(res$padj<0.05)

I obtain these results:

FALSE  TRUE 
14861    98 
> res <- res[order(res$padj), ]
> head(res)
log2 fold change (MLE): Age730.conditionWT 
Wald test p-value: Age730.conditionWT 
DataFrame with 6 rows and 6 columns
       baseMean log2FoldChange     lfcSE      stat       pvalue         padj
      <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
Fez1   214.4469       6.405264 0.6162335 10.394216 2.634435e-25 3.940851e-21
Tnn    369.3435       2.365689 0.3600403  6.570622 5.010546e-11 3.747638e-07
Acta2 3867.6960      -1.899911 0.3029448 -6.271474 3.576456e-10 1.132337e-06
Actg2  458.2408      -2.127611 0.3397300 -6.262654 3.784801e-10 1.132337e-06
Cd79b  454.0894       2.356866 0.3723858  6.329097 2.465994e-10 1.132337e-06
Cnn1   845.1697      -2.032141 0.3642512 -5.578953 2.419704e-08 5.274170e-05

I wish to know if I have accurately analyzed the data, and why does the result show only a single Age730.conditionWT for log2 fold change (MLE) and Wald test p-value. Thank you.

 

 

deseq2 timecourse • 1.6k views
ADD COMMENT
0
Entering edit mode

Thanks for the quick reply Michael,

That is the vignette (10. Time course experiments) I followed in my analysis. Is my analysis incorrect in answering my question? Please let me know. Thanks a lot

ADD REPLY
0
Entering edit mode

You didn't use the same DESeq() call though: You're not performing a LRT. You didn't give a reduced design, etc.

ADD REPLY
0
Entering edit mode

 

Hello Michael, I missed pasting a part of the code. This is my error. Here is my complete code below;

countdata <- read.table("~/desktop/counts_table_lungs.txt", header = TRUE, row.names = 1)
library(DESeq2)
countdata <- as.matrix(countdata)
head(countdata)
coldata = data.frame(row.names = colnames(countdata), Age = c(rep("180",2), rep("365",2), rep("730",2), rep("180",2), rep("365",2), rep("730",2)), condition = c("WT", "WT", "WT", "WT", "WT", "WT", "KO", "KO", "KO", "KO", "KO", "KO"))
coldata
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ Age + condition + Age:condition)
dds <- DESeq(dds, test = "LRT", reduced = ~ Age + condition)
res <- results(dds)
res$symbol <- mcols(dds)$symbol
table(res$padj<0.05)
res <- res[order(res$padj), ]
head(res)
write.csv(res, file = "Lungs_All.csv")

Our goal is know how the gene knockout affects the course of aging in these mice over the three time courses and by incremental pairwise comparison using 6month (180) time period as the reference. Your help is greatly appreciated. Thank you.

 

 

ADD REPLY
0
Entering edit mode

Everything looks correct in this case. The p-value should not say "Wald test". Regarding the LFC, see this FAQ in the vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#i-ran-a-likelihood-ratio-test-but-results-only-gives-me-one-comparison.

ADD REPLY
0
Entering edit mode

Hello Michael,

I just ran the analysis afresh, and you are right here are the results I got:

 

FALSE  TRUE 
14327   269 
> res <- res[order(res$padj), ]
> head(res)
log2 fold change (MLE): Age730.conditionWT 
LRT p-value: '~ Age + condition + Age:condition' vs '~ Age + condition' 
DataFrame with 6 rows and 6 columns
        baseMean log2FoldChange     lfcSE      stat
       <numeric>      <numeric> <numeric> <numeric>
Fez1    214.4469      6.4052642 0.6162335 139.43556
Tnn     369.3435      2.3656890 0.3600403  73.75752
Cd79b   454.0894      2.3568662 0.3723858  66.27804
Cd22    287.1829      1.3314468 0.3766654  51.47430
Cd79a   375.6751      2.2090925 0.4813291  46.69129
Cyp1a1  872.0934      0.5355496 0.4555664  45.83639
             pvalue         padj
          <numeric>    <numeric>
Fez1   5.271717e-31 7.694598e-27
Tnn    9.632903e-17 7.030093e-13
Cd79b  4.054204e-15 1.972505e-11
Cd22   6.645055e-12 2.424780e-08
Cd79a  7.262992e-11 2.120213e-07
Cyp1a1 1.113665e-10 2.709176e-07

 

Micheal, I was wondering what would be the best way to perform pairwise comparisons between time points while considering the effect of the gene knockdown. Please let me know. Thank you.

 

ADD REPLY
0
Entering edit mode

I'm sorry but this is explicitly covered in the section of the workflow I've pointed you to below. I try to reply to questions in an efficient manner, but this really requires that you uphold your end of the bargain, and take the time to read the material that is already available.

ADD REPLY
0
Entering edit mode

Michael, a follow up question, is there a way to identify the specific time span, whether between 6month and 1 year or 1-2 years where the significant gene expression changes occurred to qualify the gene to be selected by the DESeq2 command? I tried using the ggplot command as provided in your vignette to visualize a time course plot but received an error message. Please let me know. Thank you. 

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 19 hours ago
United States

You should follow the time series example here:

http://www.bioconductor.org/help/workflows/rnaseqGene/#time

Here we use an LRT to test multiple coefficients.

ADD COMMENT

Login before adding your answer.

Traffic: 1084 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6