My question is (I believe) more of a conceptual nature of what I can do with a time course RNA Seq dataset. I will still include the code below (and please let me know if I leave something out that should be in here and I’ll make sure to include it).
My dataset consists of 4 genotypes that were collected every 24h until they showed a given phenotype, which resulted in 10, 13, 15, or 17 data points with 3 replicates. (For those that work with plants, dormant buds collected from the field and placed into forcing conditions, collected daily for RNA Seq until budbreak).
Because what I’m mostly looking for are similarities between the genotypes, I’m treating the datasets separately.
I’m using time as a continuous variable, with up to time^4. By doing this I’m covering for transient expression of genes (alternatively I could use splines, but that wouldn’t make my life easier I think in doing what I want to do). In DESeq, I compare a full model to one with simply an intercept, and with that I end up getting ~20k DE genes – essentially everything that has a non-zero expression at some point. What I did then was extract the parameter estimates and intercepts for the model, and then filtered the genes for what had a max predicted counts lower than 10 and logFC of 2 within the time frame of each genotype. This filtering step was done manually on excel. In the end I had ~8k DE genes for each of the genotypes, which I then used for pathway enrichment analysis and qualitative comparisons between the genotypes (e.g. a gene related to growth that starts going up for genotype A earlier, and genotype D later, matching the time to phenotype idea).
I think my approach is appropriate, but I wanted some confirmation of this. I couldn’t find anything that treated RNA Seq data in this way. Most of the experiments only use time as a factor in an interaction term with a treatment, but not the effect of time itself.
Thanks in advance!
sampleTable<-data.frame(sampleName=z$sampleFiles,fileName=z$fileName, time=z$Time, time2=z$Time2,time3=z$Time3,time4=z$Time4)
dds=0 library(DESeq2) ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design = ~time+time2+time3+time4) dds<-DESeq(ddsHTSeq, test="LRT", reduced=~1, fitType = "parametric") res<-results(dds) summary(res) head(res[order(res$padj),]) a=dds@rowRanges@elementMetadata@listData$Intercept b=dds@rowRanges@elementMetadata@listData$time c=dds@rowRanges@elementMetadata@listData$time2 d=dds@rowRanges@elementMetadata@listData$time3 e=dds@rowRanges@elementMetadata@listData$time4 f=dds@rowRanges@partitioning@NAMES g=results(dds)$padj t=data.frame(cbind(f,a,b,c,d,e,g)) colnames(t)=c("GeneID","Intercept", "betatime","betatime2","betatime3","betatime4","padjumodel") summary(t) write.csv(t,"DESeqresults.csv")