Dear Community,
I have 30 samples phenotyped as mild (n=10), moderate (n=10) and severe (n=10) disease. Half of the samples in each group are donkey and half are horse.
I want to test for the effect of disease (mild, moderate, severe) controlling for species.
I did this by generating a model and a reduced model followed by contrasts.
ddsTryp3 <- DESeqDataSetFromMatrix(countData = countData, colData = metaData, design= ~Species+Inclusion)
ddsTryp3$Inclusion <- relevel(ddsTryp3$Inclusion, ref = 'Mild')
ddsTryp3 <- ddsTryp3[rowSums(counts(ddsTryp3)) >=20,]
dds1 <- DESeq(ddsTryp3)
reduceddds1 <-DESeq(dds1, test = "LRT", reduced=~Species)
#Contrast analysis DE between moderate and mild disease
metaData$Inclusion <- as.factor(metaData$Inclusion)
metadata_modVSmild <- metaData[(metaData$Inclusion == "Mild")|(metaData$Inclusion == "Moderate"),] #metadata for mild and moderate
metadata_modVSmild$Inclusion <- relevel(metadata_modVSmild$Inclusion , ref = "Mild") #make sure data is re-leveled for mild
res_modVSmild <- results(reduceddds1, contrast=c("Inclusion", "Moderate" , "Mild")) # contrast analysis with just moderate and mild samples
res_modVSmild<- res_modVSmild[order(res_modVSmild$padj) ,] #sort DE results by p adj
write.csv(res_modVSmild, "DE_modVSmild.csv")
#Contrast analysis DE between severe and mild disease
metaData$Inclusion <- as.factor(metaData$Inclusion)
metadata_sevVSmild <- metaData[(metaData$Inclusion == "Mild")|(metaData$Inclusion == "Severe"),] #metadata for mild and severe
metadata_sevVSmild$Inclusion <- relevel(metadata_sevVSmild$Inclusion , ref = "Mild") # data is re-leveled for mild
res_sevVSmild <- results(reduceddds1, contrast=c("Inclusion", "Severe" , "Mild")) # contrast analysis with just severe and mild samples
res_sevVSmild<- res_sevVSmild[order(res_sevVSmild$padj) ,] #sort DE results by p adj
write.csv(res_sevVSmild, "DE_sevVSmild.csv")
This creates two results tables with different log2FoldChange but all other values identical including p values which I understand is because it is comparing the fit of the two models.
I would like to do a pathway analysis (using PathfindR) of mild vs moderate, and mild vs severe. The problem I have is that using these results tables from the reduced model on the whole dataset is that I get exactly the same results for mild vs mod as mild vs severe (same pathways and genes within, fold enrichment and p values) from PathfindR.
Am I better to just split the dataset into mild vs mod; mild vs severe and mod vs severe and then I can run the model and reduced model on each and input those files into PathfindR? Or is that a bad approach as the model controlling for the effect of species is on fewer samples?
Thanks for your help
```