Hi,
I am trying to analyze RNA-Seq Data with DESeq2 and could use some help. I have 2 genotypes and 14 timepoints. I want to find differences in gene expression between both genotypes overall and at every timepoint and between every two timepoints in each genotype. After reading a lot of manuals, paper and posts here and elsewhere, I believe that the correct design for my experiment is ~genotype+time+time:genotype in the full and ~genotype+time in the reduced formula. Here is what I did so far:
library("DESeq2")
directory<-"............."
sampleFiles <- grep("??",list.files(directory),value=TRUE)
time <- factor(c("T12", "T13", "T14", "T1", "T11", "T5", "T7", "T2", "T9", "T3", "T6", "T8", "T4", "T10", "T12", "T13", "T14", "T1", "T11", "T5", "T7", "T2", "T9", "T3", "T6", "T8", "T4", "T10" ))
genotype <- factor(c("GF","GF","GF","GF","GF","GF","GF","GF","GF","GF","GF","GF","GF","GF","VB","VB","VB","VB","VB","VB","VB","VB","VB","VB","VB","VB","VB","VB"))
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, genotype=genotype, time=time)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~genotype+time+time:genotype)
ddsHTSeq$genotype<-factor(ddsHTSeq$genotype, levels=c("GF","VB"))
ddsHTSeq$time <- factor(ddsHTSeq$time, levels=c("T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", "T10", "T11", "T12", "T13", "T14"))
dds <- DESeq(ddsHTSeq)
res <- results(dds)
ddsHTSeq <- estimateSizeFactors(ddsHTSeq)
ddsHTSeq <- estimateDispersions(ddsHTSeq)
ddsLRT <- nbinomLRT(dds, reduced = formula(~genotype+time)
resLRT<- results(ddsLRT)
Unfortunately I get:
> sumis.na(res$pvalue))
[1] 10295
> sum(res$pvalue<0.05, na.rm=TRUE)
[1] 0
> sum(res$padj<0.1, na.rm=TRUE)
[1] 0
So this means, there is no time influenced difference between genotypes. Is that correct? The count data tells me soemthing else though....
I also want to test for differences in gene expression between individual time of each genotype seperately. T2 vs T1, T3 vs T2, T4 vs T3 and so on. How can I access that?
I would be greatful and appreciate any help. Thank you!
Nadia
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DESeq2_1.4.5 RcppArmadillo_0.4.450.1.0
[3] Rcpp_0.11.2 GenomicRanges_1.16.4
[5] GenomeInfoDb_1.0.2 IRanges_1.22.10
[7] BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.22.6 Biobase_2.20.1
[4] DBI_0.2-5 genefilter_1.46.1 geneplotter_1.42.0
[7] grid_3.1.1 lattice_0.20-29 locfit_1.5-9.1
[10] RColorBrewer_1.0-5 RSQLite_0.11.2 splines_3.1.1
[13] stats4_3.1.1 survival_2.37-7 XML_3.98-1.1
[16] xtable_1.7-1 XVector_0.4.0
Wolfgang, I'm not always as pessimistic as you make me sound. ;-) If these are independent measurements then Nadia has replicates anyway, and everything is fine.
In this case, fitting a curve or doing some kind of spline regression will be quite useful. Mike and I had planned since quite a while to add a section to the DESeq2 vignette to show how to do this within DESeq2 (which is actually possible). As usual, finding good example data is the difficulty. Anybody out there has something to offer?
However: If these are just repeated measurements of one sample of each genotype, then one is just comparing two samples, and one should not attempt any formal statistical testing. Any p values of confidence bands are bound to be misleading. This is why I recommend doing something simple like finding the genes with the biggest distance between the expression vectors over time course, then plotting these, and simply looking at the plots.
I've prepared this SummarizedExperiment with help from the original authors:
http://bioconductor.org/packages/devel/data/experiment/html/fission.html
36 samples, 2 conditions, 6 time points.
And have an less-than-developed section using this in the workflow which will be released with Bioc 3.