Hi
We have small RNA-Seq data from blood samples of sick and healthy pregnant women at 2 stages of pregnancy. The samples are paired, i.e., for each woman there are 2 samples: one from the first trimester and one from the second trimester.
I am looking for transcripts that are differentially expressed between "sick" and "healthy" in the first trimester and in the second trimester separately. Additionally, I am looking for transcripts whose fold change differs between the 2 time points.
I followed the instructions in http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments with the design formula: ~ condition + trimester + condition:trimester (condition is either 1, meaning sick, or 0, meaning healthy) and:
dds <- DESeq(dds, test="LRT", reduced = ~ Trimester + condition, fitType="mean")
res <- results(dds, alpha = 0.05)
res1trimester <- results(dds, name="condition_1_vs_0", alpha = 0.05, test="Wald")
res2trimester <- results(dds, contrast = c(list("condition_1_vs_0","Trimester2.condition1")), alpha = 0.05, test="Wald")
My question: does this approach ignore the fact that the samples are paired? If so, should I add the woman ID to the design?
Another question: In a later analysis, I divided the data into 2 separate datasets: 1. samples from the first trimester, and 2. samples from the second trimester. I then analyzed each dataset for differential expression between "sick" and "healthy". The results of these analyses were different from the results I got from the DE analysis described above (res1trimester and res2trimester). Shouldn't they be the same? Am I missing something? The differences were quite big.
Thanks
Liron
Sorry, I added some missing data to my question, and also added another question regarding the same analysis. The "condition" is either 0 (healthy) or 1 (sick). This is the main feature of the differential expression analysis.
Thanks! :)
Yes, the results are expected to differ between an analysis of a subset containing just one pair of groups and a test of coefficients in a larger model fitted to all samples, most of all because the dispersion estimates will be different. See our FAQ which discusses the trade-off.
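As a loose analogy only, the effect of pooling can be seen with ordinary linear models in base R. This is not DESeq2's negative binomial dispersion estimation; the group labels, means, and sample sizes below are made up for illustration. The sketch fits one hypothetical transcript with all four groups and again with only the first-trimester samples, and compares the residual variance estimate and its degrees of freedom.

```r
set.seed(42)

# Hypothetical log-expression values for one transcript: four groups of 5 samples,
# all simulated with the same true standard deviation.
groups <- factor(rep(c("h.t1", "s.t1", "h.t2", "s.t2"), each = 5))
y <- rnorm(20, mean = rep(c(5, 6, 5, 7), each = 5), sd = 1)

# Full model: residual variance is pooled over all four groups.
fit_all <- lm(y ~ groups)

# Subset model: only the first-trimester samples contribute to the variance estimate.
keep <- groups %in% c("h.t1", "s.t1")
fit_t1 <- lm(y[keep] ~ droplevels(groups[keep]))

# Compare the variance estimates and the residual degrees of freedom behind them.
print(c(all = sigma(fit_all)^2, t1_only = sigma(fit_t1)^2))
print(c(all = df.residual(fit_all), t1_only = df.residual(fit_t1)))
```

The same group difference is then judged against a different variance estimate in each analysis, which is one reason the two sets of p-values need not agree.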
With fixed effects, you can compare across trimester within individuals, but you can't directly compare across condition while also controlling for individual, because individual is nested within condition (each woman is either sick or healthy, so in a fixed effects model those variables are confounded). You would have to use something like duplicateCorrelation() in limma-voom to make those comparisons.
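A minimal sketch of both points, under stated assumptions: the sample table (6 women, 3 per condition), the column name woman, the simulated counts, and the group labels such as c1.t1 are all hypothetical. The first part uses base R to show that a design with one coefficient per woman plus condition is rank-deficient. The second part, which runs only if limma and statmod are installed, outlines the duplicateCorrelation() route with the woman as a blocking factor.

```r
# Hypothetical layout: 6 women (3 healthy, 3 sick), each sampled in both trimesters.
coldata <- data.frame(
  woman     = factor(rep(paste0("w", 1:6), each = 2)),
  condition = factor(rep(c("0", "1"), each = 6)),
  Trimester = factor(rep(c("1", "2"), times = 6))
)

# Fixed-effects design with one coefficient per woman plus condition.
X <- model.matrix(~ woman + condition + Trimester + condition:Trimester, coldata)
n_coef <- ncol(X)      # coefficients requested by the formula
x_rank <- qr(X)$rank   # linearly independent columns actually available
# The condition column is the sum of the sick women's indicator columns.
confounded <- all(X[, "condition1"] ==
                  rowSums(X[, c("womanw4", "womanw5", "womanw6")]))
print(c(coefficients = n_coef, rank = x_rank))

# Sketch of the limma-voom alternative, treating woman as a blocking factor.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts, for illustration only.
  counts <- matrix(rnbinom(200 * nrow(coldata), mu = 100, size = 10), nrow = 200)
  rownames(counts) <- paste0("tx", 1:200)

  # One design column per condition-by-trimester group.
  group <- factor(paste0("c", coldata$condition, ".t", coldata$Trimester))
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)

  v <- limma::voom(counts, design)
  # Estimate the correlation between the two samples from the same woman.
  corfit <- limma::duplicateCorrelation(v, design, block = coldata$woman)
  fit <- limma::lmFit(v, design, block = coldata$woman,
                      correlation = corfit$consensus.correlation)

  # Sick vs. healthy within each trimester.
  cm <- limma::makeContrasts(
    sick_vs_healthy_T1 = c1.t1 - c0.t1,
    sick_vs_healthy_T2 = c1.t2 - c0.t2,
    levels = design
  )
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cm))
  print(limma::topTable(fit2, coef = "sick_vs_healthy_T1", number = 5))
}
```

The rank being smaller than the number of coefficients is what "confounded" means here: the model cannot separate the condition effect from the per-woman effects. In the limma part, the pairing enters through the estimated within-woman correlation rather than through per-woman coefficients.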