Question: DESeq2: same count matrix, different design
0
20 months ago by
da.de10
Austria
da.de10 wrote:

Dear all,
I am a beginner and have the following problem. I have a dds object and run DESeq two times using exactly the same count matrix but different design.

(1)  why is the baseMean of dds1 not the same as the baseMean of dds2? I thought the baseMean is only normalized by sizeFactor.

(2) I also had a look at the cooks distance. But I am not sure how to interpret the results. The samples in the 2nd and 4th plot that show higher cooks distance are always those of groupJ. Why? And does this influence my results?

How should I proceed with those comparisons? Thanks for your help!

table(colData(dds)$Group) # groupA 4 # groupB 22 # groupC 7 # groupD 30 # groupE 3 # groupF 5 # groupG 3 # groupH 9 # groupI 5 # groupJ 16 # groupK 7 # groupL 4 table(colData(dds)$GroupJvsall)
# other      99
# groupJ    16

design(dds) <- ~Group
dds1 <- DESeq(dds,parallel=TRUE, betaPrior=TRUE)
# estimating size factors
# estimating dispersions
# gene-wise dispersion estimates: 16 workers
# mean-dispersion relationship
# final dispersion estimates, MLE betas: 16 workers
# fitting model and testing: 16 workers
# -- replacing outliers and refitting for 283 genes
# -- DESeq argument 'minReplicatesForReplace' = 7
# -- original counts are preserved in counts(dds)
# estimating dispersions
# fitting model and testing

design(dds) <- ~ GroupJvsall
dds2 <- DESeq(dds,parallel=TRUE, betaPrior=TRUE)
# estimating size factors
# estimating dispersions
# gene-wise dispersion estimates: 16 workers
# mean-dispersion relationship
# final dispersion estimates, MLE betas: 16 workers
# fitting model and testing: 16 workers
# -- replacing outliers and refitting for 3209 genes
# -- DESeq argument 'minReplicatesForReplace' = 7
# -- original counts are preserved in counts(dds)
# estimating dispersions
# fitting model and testing

stopifnot(identical(rownames(dds1), rownames(dds2)))
stopifnot(identical(colnames(dds1), colnames(dds2)))
stopifnot(identical(rowData(dds1)$baseMean, rowData(dds2)$baseMean))
# Error: identical(rowData(dds1)$baseMean, rowData(dds2)$baseMean) is not TRUE

design(dds) <- ~Group
dds1no <- DESeq(dds,parallel=TRUE, betaPrior=TRUE, minReplicatesForReplace=Inf)
design(dds) <- ~ GroupJvsall
dds2no <- DESeq(dds,parallel=TRUE, betaPrior=TRUE, minReplicatesForReplace=Inf)

stopifnot(identical(rowData(dds1no)$baseMean, rowData(dds1)$baseMean))
#Error: identical(rowData(dds1no)$baseMean, rowData(dds1)$baseMean) is not TRUE
stopifnot(identical(rowData(dds1no)$baseMean, rowData(dds2no)$baseMean))

pdf(file="boxplot.pdf", height=10, width=15,onefile=TRUE)
par(mfrow=c(1,4))
boxplot(log10(assays(dds1)[["cooks"]]), range=0, las=2, main="dds1_Group")
boxplot(log10(assays(dds2)[["cooks"]]), range=0, las=2, main="dds2_GroupJvsall")
boxplot(log10(assays(dds1no)[["cooks"]]), range=0, las=2, main="dds1no_minReplicatesForReplace=Inf")
boxplot(log10(assays(dds2no)[["cooks"]]), range=0, las=2, main="dds1no_minReplicatesForReplace=Inf")
graphics.off()
modified 20 months ago by Michael Love25k • written 20 months ago by da.de10
Answer: DESeq2: same count matrix, different design
0
20 months ago by
Michael Love25k
United States
Michael Love25k wrote:

The baseMean has to change when you replace outliers. If you have a gene with counts of ~10 and then one count of 100,000, when this count is labelled as an outlier, the baseMean needs to be update to give the gene a more reasonable mean, and therefore a more reasonable dispersion prior.

It looks like you have many genes with outliers when you use the second design, most likely because the "all" group is not compatible with a single mean value, and NB counts dispersed around that mean. I'd recommend actually turning off outlier replacement, minReplicatesForReplace=Inf, and instead inspecting the top genes by eye to make sure they are not driven by individual counts.