This has probably been addressed before, but I am wondering why I see different results when contrasting the same groups of samples.
Let's say I have 30 samples, sequenced in two batches, with 10 samples in each diagnosis group: normal, tumor, and invasive. The objective is to obtain a list of genes that are differentially expressed between the 'normal' and 'tumor' groups.
Case 1: Run DESeq on the whole dataset and contrast diagnosis groups of interest:
dds <- DESeqDataSetFromMatrix(countData = reads, colData = design, design = ~ batch + diagnosis)
dds <- DESeq(dds, parallel=TRUE)
res <- results(dds, contrast=c("diagnosis", "normal", "tumor"))
Case 2: Subset the sample list before running DESeq then get contrast:
reads_subset <- reads[, design$diagnosis != "invasive"]
design_subset <- design[design$diagnosis != "invasive", ]
dds <- DESeqDataSetFromMatrix(countData = reads_subset, colData = design_subset, design = ~ batch + diagnosis)
dds <- DESeq(dds, parallel=TRUE)
res <- results(dds, contrast=c("diagnosis", "normal", "tumor"))
I thought this would lead to the same results, but it doesn't. Can anyone help explain the difference?
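To put a number on how far the two cases diverge, a small helper along these lines could be used. This is only a sketch: `compare_results` is a hypothetical function (not part of DESeq2), and it assumes the two result tables are kept under separate names (say `res_full` and `res_subset`, also hypothetical), share gene row names, and carry the `log2FoldChange` and `padj` columns that `results()` returns.

```r
# Hypothetical helper (not a DESeq2 function): compare two result tables.
# Assumes row names are gene IDs and that both tables have the columns
# 'log2FoldChange' and 'padj'.
compare_results <- function(res_a, res_b, alpha = 0.1) {
  a <- as.data.frame(res_a)
  b <- as.data.frame(res_b)

  # Restrict to genes present in both tables, in the same order
  genes <- intersect(rownames(a), rownames(b))
  a <- a[genes, , drop = FALSE]
  b <- b[genes, , drop = FALSE]

  # Genes with NA padj are treated as not significant
  sig_a <- !is.na(a$padj) & a$padj < alpha
  sig_b <- !is.na(b$padj) & b$padj < alpha

  list(
    n_genes    = length(genes),
    sig_both   = sum(sig_a & sig_b),
    sig_only_a = sum(sig_a & !sig_b),
    sig_only_b = sum(sig_b & !sig_a),
    # Correlation of fold changes, ignoring genes with NA in either table
    lfc_cor    = cor(a$log2FoldChange, b$log2FoldChange, use = "complete.obs")
  )
}

# Usage, assuming the two results were stored under these (hypothetical) names:
# compare_results(res_full, res_subset)
```

If the fold changes correlate closely while the sets of significant genes differ, that would suggest the two runs disagree mainly in the variance estimates and adjusted p-values rather than in the effect sizes.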
Thanks,
Sure is. And now I am remembering that I read it before. Thanks again for your patience and light speed.