Dear all,

I am a beginner and have the following problem. I have a dds object and run DESeq two times using exactly the same count matrix but different design.

(1) why is the baseMean of dds1 not the same as the baseMean of dds2? I thought the baseMean is only normalized by sizeFactor.

(2) I also had a look at the cooks distance. But I am not sure how to interpret the results. The samples in the 2nd and 4th plot that show higher cooks distance are always those of groupJ. Why? And does this influence my results?

https://drive.google.com/open?id=1t73Qx48g_b3mJiaXAd16IOyEp9RO9WcX

How should I proceed with those comparisons? Thanks for your help!

table(colData(dds)$Group) # groupA 4 # groupB 22 # groupC 7 # groupD 30 # groupE 3 # groupF 5 # groupG 3 # groupH 9 # groupI 5 # groupJ 16 # groupK 7 # groupL 4 table(colData(dds)$GroupJvsall) # other 99 # groupJ 16 design(dds) <- ~Group dds1 <- DESeq(dds,parallel=TRUE, betaPrior=TRUE) # estimating size factors # estimating dispersions # gene-wise dispersion estimates: 16 workers # mean-dispersion relationship # final dispersion estimates, MLE betas: 16 workers # fitting model and testing: 16 workers # -- replacing outliers and refitting for 283 genes # -- DESeq argument 'minReplicatesForReplace' = 7 # -- original counts are preserved in counts(dds) # estimating dispersions # fitting model and testing design(dds) <- ~ GroupJvsall dds2 <- DESeq(dds,parallel=TRUE, betaPrior=TRUE) # estimating size factors # estimating dispersions # gene-wise dispersion estimates: 16 workers # mean-dispersion relationship # final dispersion estimates, MLE betas: 16 workers # fitting model and testing: 16 workers # -- replacing outliers and refitting for 3209 genes # -- DESeq argument 'minReplicatesForReplace' = 7 # -- original counts are preserved in counts(dds) # estimating dispersions # fitting model and testing stopifnot(identical(rownames(dds1), rownames(dds2))) stopifnot(identical(colnames(dds1), colnames(dds2))) stopifnot(identical(rowData(dds1)$baseMean, rowData(dds2)$baseMean)) # Error: identical(rowData(dds1)$baseMean, rowData(dds2)$baseMean) is not TRUE design(dds) <- ~Group dds1no <- DESeq(dds,parallel=TRUE, betaPrior=TRUE, minReplicatesForReplace=Inf) design(dds) <- ~ GroupJvsall dds2no <- DESeq(dds,parallel=TRUE, betaPrior=TRUE, minReplicatesForReplace=Inf) stopifnot(identical(rowData(dds1no)$baseMean, rowData(dds1)$baseMean)) #Error: identical(rowData(dds1no)$baseMean, rowData(dds1)$baseMean) is not TRUE stopifnot(identical(rowData(dds1no)$baseMean, rowData(dds2no)$baseMean)) pdf(file="boxplot.pdf", height=10, width=15,onefile=TRUE) par(mfrow=c(1,4)) boxplot(log10(assays(dds1)[["cooks"]]), range=0, las=2, main="dds1_Group") boxplot(log10(assays(dds2)[["cooks"]]), range=0, las=2, main="dds2_GroupJvsall") boxplot(log10(assays(dds1no)[["cooks"]]), range=0, las=2, main="dds1no_minReplicatesForReplace=Inf") boxplot(log10(assays(dds2no)[["cooks"]]), range=0, las=2, main="dds1no_minReplicatesForReplace=Inf") graphics.off()