In a previous question, DESeq2: Appropriate way to deal with knockouts in experiment design (RIPSeq), I asked how DESeq2 could be used to take into account the effects of sequencing from knockout (KO) tissue. This is possible by using interaction terms. If I do this, the results table contains a "baseMean" column. From other posts, I understand this is calculated by taking the mean of the normalized count data.
I would like to run clustering or classification algorithms on my counts AFTER the effect of the KO sequences has been taken into account. Would it be appropriate to use the baseMean values for this? In other words, given that I have built my model with the KO interaction, does baseMean give me normalized counts adjusted for the effect of the KO? Looking at how it is calculated, I'm not so sure, but I might be misunderstanding something.
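For reference, here is my understanding of how baseMean is computed; this is a sketch assuming `dds` is a fitted `DESeqDataSet`, and it suggests baseMean is not adjusted for any design term:

```r
library(DESeq2)

# baseMean is the row mean of the size-factor-normalized counts
# across ALL samples, ignoring the design entirely.
norm_counts <- counts(dds, normalized = TRUE)
base_mean   <- rowMeans(norm_counts)

# This should match the baseMean column of the results table:
head(results(dds)$baseMean)
head(base_mean)
```

If that is right, baseMean averages WT and KO samples together rather than correcting one for the other.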
The second issue is that, according to DESeq2 baseMean counts, the baseMean returned by DESeq2 does not take transcript length into account, so it is probably not appropriate for my downstream applications. If I use the counts() function, though, I don't see how to get counts that account for the KO tissue. It just gives me back all the counts for all samples.
So is there a way to obtain normalized counts from DESeq2 that take into account the effects from my KO sequences, which I can then use in downstream applications?
When I say "taking into account" I mean the following: my experiment involves extracting an RNA subpopulation with beads and an antibody, so some background binding will occur, and some of the counts in the experimental samples could be inflated simply because RNAs may bind to the anchor used to isolate the subpopulation. The KO gives you a sense of how much background you are getting. So "taking into account" means: given that my experimental counts are inflated, how much of this effect is actually due to the background we see in the KO?
DESeq2 will allow me to model WT and KO in a design with interaction terms. It then gives me differentially expressed genes and fold-changes. But I want to see the normalized counts AFTER model fitting, with the interaction terms taken into account, not just the fold-changes.
I am not certain that what you suggested with a batch effect will give me what I am looking for.
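For concreteness, my understanding of the batch-effect suggestion is something like the sketch below (I may be misreading it; the `genotype` column name is hypothetical and would need to match my actual colData):

```r
library(DESeq2)
library(limma)

# Transform first, then regress out the KO-associated component.
vsd <- vst(dds, blind = FALSE)

# "genotype" is a hypothetical colData column marking WT vs KO samples.
adjusted <- removeBatchEffect(assay(vsd), batch = vsd$genotype)

# 'adjusted' would then be the matrix fed into clustering/classification.
```

My worry is that treating genotype as a batch removes the WT/KO difference wholesale rather than subtracting the background signal.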
It sounds like you want to cluster genes based on the KO vs WT ratios? You can apply one of the transformations Steve mentioned and then subtract certain groups from others (the transformations produce log-scale data, so subtraction gives log ratios). Then you can cluster on those log ratios.
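A minimal sketch of the idea, assuming `dds` is a fitted `DESeqDataSet` with a hypothetical `genotype` column in its colData distinguishing "WT" from "KO" samples:

```r
library(DESeq2)

# Variance-stabilizing transformation puts counts on a log2-like scale.
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)

# Identify sample groups from colData ("genotype" is an assumed column name).
wt_cols <- vsd$genotype == "WT"
ko_cols <- vsd$genotype == "KO"

# Because the data are log-scale, subtraction gives per-gene log ratios
# of experimental signal over KO background.
log_ratio <- rowMeans(mat[, wt_cols]) - rowMeans(mat[, ko_cols])

# Cluster genes on the log ratios, e.g. with hierarchical clustering.
hc <- hclust(dist(log_ratio))
```

You could equally use rlog() instead of vst(), or keep per-replicate ratios rather than group means if you want the clustering to see replicate variability.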