Question: DESeq2: same count matrix, different design
0
gravatar for da.de
17 months ago by
da.de10
Austria
da.de10 wrote:

Dear all,
I am a beginner and have the following problem. I have a dds object and run DESeq two times using exactly the same count matrix but different design.

(1)  why is the baseMean of dds1 not the same as the baseMean of dds2? I thought the baseMean is only normalized by sizeFactor.

(2) I also had a look at the cooks distance. But I am not sure how to interpret the results. The samples in the 2nd and 4th plot that show higher cooks distance are always those of groupJ. Why? And does this influence my results?

https://drive.google.com/open?id=1t73Qx48g_b3mJiaXAd16IOyEp9RO9WcX

How should I proceed with those comparisons? Thanks for your help!

table(colData(dds)$Group)
# groupA     4
# groupB    22 
# groupC     7
# groupD    30
# groupE     3    
# groupF     5
# groupG     3
# groupH     9   
# groupI     5
# groupJ    16  
# groupK     7 
# groupL     4
table(colData(dds)$GroupJvsall)
# other      99
# groupJ    16

design(dds) <- ~Group
dds1 <- DESeq(dds,parallel=TRUE, betaPrior=TRUE)
# estimating size factors
# estimating dispersions
# gene-wise dispersion estimates: 16 workers
# mean-dispersion relationship
# final dispersion estimates, MLE betas: 16 workers
# fitting model and testing: 16 workers
# -- replacing outliers and refitting for 283 genes
# -- DESeq argument 'minReplicatesForReplace' = 7
# -- original counts are preserved in counts(dds)
# estimating dispersions
# fitting model and testing

design(dds) <- ~ GroupJvsall
dds2 <- DESeq(dds,parallel=TRUE, betaPrior=TRUE)
# estimating size factors
# estimating dispersions
# gene-wise dispersion estimates: 16 workers
# mean-dispersion relationship
# final dispersion estimates, MLE betas: 16 workers
# fitting model and testing: 16 workers
# -- replacing outliers and refitting for 3209 genes
# -- DESeq argument 'minReplicatesForReplace' = 7
# -- original counts are preserved in counts(dds)
# estimating dispersions
# fitting model and testing

stopifnot(identical(rownames(dds1), rownames(dds2)))
stopifnot(identical(colnames(dds1), colnames(dds2)))
stopifnot(identical(rowData(dds1)$baseMean, rowData(dds2)$baseMean))
# Error: identical(rowData(dds1)$baseMean, rowData(dds2)$baseMean) is not TRUE

design(dds) <- ~Group
dds1no <- DESeq(dds,parallel=TRUE, betaPrior=TRUE, minReplicatesForReplace=Inf)
design(dds) <- ~ GroupJvsall
dds2no <- DESeq(dds,parallel=TRUE, betaPrior=TRUE, minReplicatesForReplace=Inf)

stopifnot(identical(rowData(dds1no)$baseMean, rowData(dds1)$baseMean))
#Error: identical(rowData(dds1no)$baseMean, rowData(dds1)$baseMean) is not TRUE
stopifnot(identical(rowData(dds1no)$baseMean, rowData(dds2no)$baseMean))

pdf(file="boxplot.pdf", height=10, width=15,onefile=TRUE)
par(mfrow=c(1,4))
boxplot(log10(assays(dds1)[["cooks"]]), range=0, las=2, main="dds1_Group")
boxplot(log10(assays(dds2)[["cooks"]]), range=0, las=2, main="dds2_GroupJvsall")
boxplot(log10(assays(dds1no)[["cooks"]]), range=0, las=2, main="dds1no_minReplicatesForReplace=Inf")
boxplot(log10(assays(dds2no)[["cooks"]]), range=0, las=2, main="dds1no_minReplicatesForReplace=Inf")
graphics.off()
ADD COMMENTlink modified 17 months ago by Michael Love24k • written 17 months ago by da.de10
Answer: DESeq2: same count matrix, different design
0
gravatar for Michael Love
17 months ago by
Michael Love24k
United States
Michael Love24k wrote:

The baseMean has to change when you replace outliers. If you have a gene with counts of ~10 and then one count of 100,000, when this count is labelled as an outlier, the baseMean needs to be update to give the gene a more reasonable mean, and therefore a more reasonable dispersion prior.

It looks like you have many genes with outliers when you use the second design, most likely because the "all" group is not compatible with a single mean value, and NB counts dispersed around that mean. I'd recommend actually turning off outlier replacement, minReplicatesForReplace=Inf, and instead inspecting the top genes by eye to make sure they are not driven by individual counts.

ADD COMMENTlink written 17 months ago by Michael Love24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 132 users visited in the last hour