In a previous question, DESeq2: Appropriate way to deal with knockouts in experiment design (RIPSeq), I asked how DESeq2 could be used to take into account the effects of sequencing from knockout (KO) tissue. This is possible by using interaction terms. If I do this, the results table contains a "baseMean" column. From other posts, I understand this is calculated by taking the mean of the normalized count data.
I would like to run clustering or classification algorithms on my counts AFTER the effect of the KO sequences has been taken into account. Would it be appropriate to use the baseMean values for this? In other words, given that I have built my model with the KO interaction, does baseMean give me normalized counts adjusted for the effect of the KO? Looking at how it is calculated, I'm not so sure, but I might be misunderstanding something.
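For reference, here is my understanding of how baseMean is computed; this is a sketch assuming `dds` is a fitted `DESeqDataSet`, and it suggests baseMean is not adjusted for any design term:

```r
library(DESeq2)

# baseMean is the row mean of the size-factor-normalized counts
# across ALL samples, ignoring the design entirely.
norm_counts <- counts(dds, normalized = TRUE)
base_mean   <- rowMeans(norm_counts)

# This should match the baseMean column of the results table:
head(results(dds)$baseMean)
head(base_mean)
```

If that is right, baseMean averages WT and KO samples together rather than correcting one for the other.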
The second issue is that, according to DESeq2 baseMean counts, the baseMean returned by DESeq2 does not take transcript length into account, so it is probably not appropriate for my downstream applications. If I use the counts() function, though, I don't see how to get counts that account for the KO tissue. It just gives me back all the counts for all samples.
So is there a way to obtain normalized counts from DESeq2 that take into account the effects from my KO sequences, which I can then use in downstream applications?
When I say "taking into account" I mean the following: my experiment involves extracting an RNA subpopulation with beads and an antibody, so some background binding will occur, and some of the counts in the experimental samples could be inflated simply because RNAs may bind to the anchor used to isolate the subpopulation. The KO gives you a sense of how much background you are getting. So "taking into account" means: given that my experimental counts are inflated, how much of this effect is actually due to the background we see in the KO?
DESeq2 will allow me to model WT and KO in a design with interaction terms. It then gives me differentially expressed genes and fold-changes. But I want to see the normalized counts AFTER model fitting, with the interaction terms taken into account, not just the fold-changes.
I am not certain that what you suggested with a batch effect will give me what I am looking for.
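For concreteness, my understanding of the batch-effect suggestion is something like the sketch below (I may be misreading it; the `genotype` column name is hypothetical and would need to match my actual colData):

```r
library(DESeq2)
library(limma)

# Transform first, then regress out the KO-associated component.
vsd <- vst(dds, blind = FALSE)

# "genotype" is a hypothetical colData column marking WT vs KO samples.
adjusted <- removeBatchEffect(assay(vsd), batch = vsd$genotype)

# 'adjusted' would then be the matrix fed into clustering/classification.
```

My worry is that treating genotype as a batch removes the WT/KO difference wholesale rather than subtracting the background signal.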
It sounds like you want to cluster genes based on the KO vs WT ratios? You can apply one of the transformations Steve mentioned and then subtract certain groups from others (the transformations produce log-scale data, so subtraction gives log ratios). Then you can cluster on those log ratios.
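A minimal sketch of the idea, assuming `dds` is a fitted `DESeqDataSet` with a hypothetical `genotype` column in its colData distinguishing "WT" from "KO" samples:

```r
library(DESeq2)

# Variance-stabilizing transformation puts counts on a log2-like scale.
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)

# Identify sample groups from colData ("genotype" is an assumed column name).
wt_cols <- vsd$genotype == "WT"
ko_cols <- vsd$genotype == "KO"

# Because the data are log-scale, subtraction gives per-gene log ratios
# of experimental signal over KO background.
log_ratio <- rowMeans(mat[, wt_cols]) - rowMeans(mat[, ko_cols])

# Cluster genes on the log ratios, e.g. with hierarchical clustering.
hc <- hclust(dist(log_ratio))
```

You could equally use rlog() instead of vst(), or keep per-replicate ratios rather than group means if you want the clustering to see replicate variability.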