Question: Timecourse gene expression analysis with DESeq2
0
gravatar for juditkovacsbio
6 months ago by
juditkovacsbio0 wrote:
  

Dear all,

I'm doing timecourse RNAseq analysis. I'm quite new with R and I would have question concerning DESeqDataSetFromHTSeqCount(...,design=...) and later DESeq(... test= "LRT", reduced = ...) and using results() with Wald test.

I have treated (APO) /untreated (DMSO) sample and 6 timepoint (incl 0) 4 replicate.

When I run my script  I've got the same result for every timepoint when I want to compare APO vs DMSO. My aim is to see which are those genes which are regulated by APO from 0 to 24 hrs.

Theoretically, it means that I have exactly the same set of genes if I ask DEGs after 30 min of treatment or 3 hrs of treatment which does not make any sense. I think I'm missing something but I followed the tutorial and some previous discussion and I made everything the same. If I use wald test as an argument after results() than I've got different sets of genes but it does not convince me. I would be happy with any feedback.

My sampleTable is:

> sampleTable
    
 sampleName         fileName condition replicate timeP
1     APO_0_A   APO_0_A.counts       APO         A     0
2     APO_0_B   APO_0_B.counts       APO         B     0
3     APO_0_C   APO_0_C.counts       APO         C     0
4     APO_0_D   APO_0_D.counts       APO         D     0
5    APO_05_A  APO_05_A.counts       APO         A    05
6    APO_05_B  APO_05_B.counts       APO         B    05
7    APO_05_C  APO_05_C.counts       APO         C    05
8    APO_05_D  APO_05_D.counts       APO         D    05
9     APO_1_A   APO_1_A.counts       APO         A     1
10    APO_1_B   APO_1_B.counts       APO         B     1
11    APO_1_C   APO_1_C.counts       APO         C     1
12    APO_1_D   APO_1_D.counts       APO         D     1
...

and 

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design = ~condition+timeP+condition:timeP)

 

> ddsHTSeq
class: DESeqDataSet 
dim: 37336 48 
metadata(1): version
assays(1): counts
rownames(37336): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980
rowData names(0):
colnames(48): APO_0_A APO_0_B ... DMSO_6_C DMSO_6_D
colData names(3): condition replicate timeP

I filter for genes which has low count (for us it's 10) and than using relevel function to set the reference

ddsK$condition <- relevel(ddsK$condition, ref = "DMSO")

than I make the DESeq with LRT test and reduced timeP+condition

dds <- DESeq(ddsK, test="LRT", reduced = ~timeP+condition)
resultsNames(dds)
 [1] "Intercept"             "condition_APO_vs_DMSO" "timeP_05_vs_0"         "timeP_1_vs_0"          "timeP_24_vs_0"        
 [6] "timeP_3_vs_0"          "timeP_6_vs_0"          "conditionAPO.timeP05"  "conditionAPO.timeP1"   "conditionAPO.timeP24" 
[11] "conditionAPO.timeP3"   "conditionAPO.timeP6" 
res <- results(dds, contrast = list(c("timeP_05_vs_0", "conditionAPO.timeP05")), test = "Wald")
res
log2 fold change (MLE): timeP_05_vs_0+conditionAPO.timeP05 effect 
Wald test p-value: timeP_05_vs_0+conditionAPO.timeP05 effect 
DataFrame with 24455 rows and 6 columns
                   baseMean      log2FoldChange              lfcSE               stat            pvalue              padj
                  <numeric>           <numeric>          <numeric>          <numeric>         <numeric>         <numeric>
AT1G01010  383.915799076324   0.089056481257827  0.193991260802991  0.459074707227502  0.64618051895494 0.873859813008998

...

resSig <- subset(res, padj < 0.1)
resSig
log2 fold change (MLE): timeP_05_vs_0+conditionAPO.timeP05 effect 
Wald test p-value: timeP_05_vs_0+conditionAPO.timeP05 effect 
DataFrame with 2858 rows and 6 columns
                  baseMean     log2FoldChange              lfcSE              stat               pvalue                 padj
                 <numeric>          <numeric>          <numeric>         <numeric>            <numeric>            <numeric>
AT1G01060 48.6720330765987 -0.688897898327615   0.26567978724056 -2.59296315117812  0.00951529650094275   0.0799481384205183

 

Let me know if you have any suggestion or comment on it.

 

Best,

Judit

ADD COMMENTlink modified 6 months ago by Michael Love24k • written 6 months ago by juditkovacsbio0
Answer: Timecourse gene expression analysis with DESeq2
1
gravatar for Michael Love
6 months ago by
Michael Love24k
United States
Michael Love24k wrote:

This question is both a FAQ in the vignette and discussed in the LRT section of the help page for ?results. An LRT tests all the changes across time while the Wald test is for a specific time point.

ADD COMMENTlink written 6 months ago by Michael Love24k

Yes, I tried to follow some of those (e.g. I followed this link: http://bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments and this: https://support.bioconductor.org/p/65676/#66860)

What I didn't get that if I use

res <- results(dds, contrast = list(c("timeP_05_vs_0", "conditionAPO.timeP05")))

or

res <- results(dds, contrast = list(c("timeP_1_vs_0", "conditionAPO.timeP1")))

and filter for padj < 0.1 than I've got always the same list. Why does it happen? I thought that they should be different.

Thank you!

ADD REPLYlink written 6 months ago by juditkovacsbio0

This is what I was talking about. Did you read ?results yet? Or the FAQ? Please read those first, because it explains what is going on, exactly answering your question.

If you want to make new p-values you need to switch from test="LRT" to test="Wald".

ADD REPLYlink written 6 months ago by Michael Love24k

Dear Michael,

thank you for your help.

I went through on your suggested pages and check other comments from Bioconductor.

I re-arranged my script:

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design = ~condition+timeP+condition:timeP)

and I didn't use any reduced formula with LRT for DESeq():

dds <- DESeq(ddsK)
resultsNames(dds)
 [1] "Intercept"             "condition_APO_vs_DMSO" "timeP_05_vs_0"         "timeP_1_vs_0"          "timeP_24_vs_0"         "timeP_3_vs_0"          "timeP_6_vs_0"         
 [8] "conditionAPO.timeP05"  "conditionAPO.timeP1"   "conditionAPO.timeP24"  "conditionAPO.timeP3"   "conditionAPO.timeP6"  

and made a comparison:

res05 <- results(dds, contrast = list(c("timeP_05_vs_0" ,"conditionAPO.timeP05" )))

resSig05 <- subset(res05, padj < 0.1)

My question would be that is it right that resSig05 gives me those genes which are DE after 30 min because of the treatment and not because of the time compare to control condition (which is time_0)?

If it's correct than is it correct if I do all the comparison for the other time points, separately and at the end I can merge them for further analysis.

Thanks a lot for your comments!

ADD REPLYlink written 6 months ago by juditkovacsbio0

That gives you the genes that are differentially expressed for condition APO at time 5. It's the main effect for time 5 vs 0 and the interaction for condition APO added together, for this design, that gives you the time difference specifically for that condition group.

The analysis is up to you. If you're having trouble with the LRT and interaction design, and the vignette sections are not sufficient to give some sense of what the models are doing, I'd recommend to consult with a statistician. It's pretty important to have a good sense what you are doing.

ADD REPLYlink written 6 months ago by Michael Love24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 197 users visited in the last hour