We have small RNA-seq data from blood samples of sick and healthy pregnant women, taken at 2 stages of pregnancy. The samples are paired, i.e., for each woman there are 2 samples: one from the first trimester and one from the second trimester.
I am looking for transcripts that are differentially expressed between "sick" and "healthy" in the first trimester and in the second trimester, separately. Additionally, I am looking for transcripts whose fold change differs between the 2 time points.
I followed the instructions in http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments with the design formula ~ condition + Trimester + condition:Trimester (condition is either 1, meaning sick, or 0, meaning healthy) and:
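# LRT for the interaction: full design vs. the design without condition:Trimester
dds <- DESeq(dds, test="LRT", reduced = ~ Trimester + condition, fitType="mean")
res <- results(dds, alpha = 0.05)
# Sick vs. healthy in the first trimester (Wald test on the condition main effect)
res1trimester <- results(dds, name="condition_1_vs_0", alpha = 0.05, test="Wald")
# Sick vs. healthy in the second trimester: main effect plus interaction.
# Both names go in one character vector so they are summed; list("a", "b") would give a minus b.
res2trimester <- results(dds, contrast = list(c("condition_1_vs_0", "Trimester2.condition1")), alpha = 0.05, test="Wald")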
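To spell out what I expect these two Wald tests to estimate, here is a minimal base-R sketch on a made-up four-row sample table. The table and the names `coldata`, `X`, `effect_tri1` and `effect_tri2` are illustrative only, not my real data. It builds the design matrix with `model.matrix` and does not run DESeq2, so the column names are base R's and differ from the coefficient names DESeq2 reports in `resultsNames(dds)`.

```r
# Toy sample table (hypothetical): one row per condition x trimester combination
coldata <- expand.grid(
  condition = factor(c(0, 1)),   # 0 = healthy, 1 = sick
  Trimester = factor(c(1, 2))
)

# Design matrix for the full model with the interaction term
X <- model.matrix(~ condition + Trimester + condition:Trimester, data = coldata)
rownames(X) <- paste0("cond", coldata$condition, "_tri", coldata$Trimester)

# Sick minus healthy within each trimester, expressed as weights on the coefficients
effect_tri1 <- X["cond1_tri1", ] - X["cond0_tri1", ]
effect_tri2 <- X["cond1_tri2", ] - X["cond0_tri2", ]

print(effect_tri1)  # weight 1 on condition1 only
print(effect_tri2)  # weight 1 on condition1 and on condition1:Trimester2
```

So, under this design, the first-trimester comparison is the condition main effect alone, and the second-trimester comparison is the main effect plus the interaction term.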
My question is whether, by doing it this way, I am ignoring the fact that the samples are paired. If so, should I add the woman ID to the design?
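As a sanity check on that idea, here is a small base-R sketch on a toy layout. It assumes that each woman is either sick or healthy throughout, and the 4 women and the names `toy`, `X_paired`, `n_coef` and `design_rank` are made up for illustration. It only compares the number of coefficients with the rank of the design matrix when the woman ID is added next to condition; it does not run DESeq2.

```r
# Toy layout (hypothetical): 4 women, 2 healthy and 2 sick, each sampled in both trimesters
toy <- data.frame(
  woman     = factor(rep(c("w1", "w2", "w3", "w4"), each = 2)),
  condition = factor(rep(c(0, 0, 1, 1), each = 2)),   # 0 = healthy, 1 = sick
  Trimester = factor(rep(c(1, 2), times = 4))
)

# Design with the woman ID added to the original formula
X_paired <- model.matrix(~ woman + condition + Trimester + condition:Trimester, data = toy)

n_coef      <- ncol(X_paired)      # number of coefficients in the model
design_rank <- qr(X_paired)$rank   # number of linearly independent columns

cat("coefficients:", n_coef, " rank:", design_rank, "\n")
```

In this toy layout the rank is lower than the number of coefficients, because the condition column is a sum of the woman columns. If my real data behave the same way, I guess the woman ID cannot simply be added to this formula as is, which is part of why I am asking.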
Another question: in a later analysis, I divided the data into 2 separate datasets: 1. samples from the first trimester, and 2. samples from the second trimester. I then analyzed each dataset for differential expression between "sick" and "healthy". The results of these analyses were different from the results I got from the DE analysis described above (res1trimester and res2trimester). Shouldn't they be the same? Am I missing something? The differences were quite big.