Question: Extracting normalized DESeq2 counts for a multi-factor design?
4.2 years ago by
feargalr0
European Union
feargalr0 wrote:

Hello,

In a multi-factor DESeq2 design, the model accounts for changes in one variable while testing for changes in another. In section 1.5 of the vignette, for example, it controls for type while testing for differences in condition. I assume it calculates a normalization factor + size factor and then adjusts the counts accordingly?

In that vignette example, is it possible to access the counts after they have been normalized for both type and size factor?

Thanks

deseq2 • 2.5k views
modified 4.2 years ago by Michael Love24k • written 4.2 years ago by feargalr0
Answer: Extracting normalized DESeq2 counts for a multi-factor design?
4.2 years ago by
Michael Love24k
United States
Michael Love24k wrote:

You're right that the size factor/normalization factor (s_ij) and the factor effects are similar. For a standard model we have:

E(count) = s_ij * 2^(beta_intercept + beta_type + beta_cond) (1)

so we could also rearrange terms and write this:

E(count) = s_ij * s_type * 2^(beta_intercept + beta_cond) (2)

...where s_type = 2^beta_type.

But in the software and model, we keep the type effect in the exponent, as in (1).
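
To make the rearrangement concrete, here is a small numeric check in base R that the right-hand sides of (1) and (2) give the same expected count. All of the values (s_ij, the betas) are made up purely for illustration and are not taken from any real dataset or from DESeq2 output.

```r
# Hypothetical values, chosen only for illustration
s_ij           <- 1.2  # size factor / normalization factor
beta_intercept <- 5    # log2-scale intercept
beta_type      <- 1    # log2-scale type effect
beta_cond      <- 2    # log2-scale condition effect

# Form (1): type effect kept in the exponent
mu_1 <- s_ij * 2^(beta_intercept + beta_type + beta_cond)

# Form (2): type effect pulled out as a multiplicative factor
s_type <- 2^beta_type
mu_2   <- s_ij * s_type * 2^(beta_intercept + beta_cond)

# The two forms agree
all.equal(mu_1, mu_2)

# Dividing out both s_ij and s_type leaves only intercept + condition
mu_adjusted <- mu_1 / (s_ij * s_type)
mu_adjusted
```

So in this toy case, treating the type effect like a second scaling factor and dividing it out leaves the expected count on a scale that reflects only the intercept and the condition effect.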

I have it on my list of todos, to allow for easier plotting of normalized counts, removing the calculated group/batch effects (like "type" here), to make it easier to see the condition effect for example. But this is not yet implemented. I'll ping this thread when I've added something to the devel branch.

Awesome, thanks!

Michael,

How can I modify the model and extract batch effect-normalized counts in R commands? Or do I have to overwrite some implemented functions?

Tom

See here:

DESeq2 - Acquiring batch-corrected values for PCA and hierarchical clustering

When I call removeBatchEffect(), should the 'batch' argument be set, i.e. removeBatchEffect(rlogMat, batch=MyBatch)?

With 'batch=NULL', I only got the same matrix as the input rlogMat, and with 'batch=MyBatch', the output matrix was somewhat different.

As far as I understand, rlog/vst values are already normalized for mean shifts, so I am wondering whether setting 'batch=MyBatch' in removeBatchEffect() would be double counting?

Thanks,

Yes, you should provide the variable whose associated variation you want removeBatchEffect() to remove.

No, rlog and VST do not normalize for mean shifts associated with variables in the design. This is explained in the vignette in the section on transformations and in the man pages for the functions.

The only place the design is used is in estimating the global trend (the trend across all genes) of estimated dispersions over the mean, and only when blind=FALSE.
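
A minimal sketch of that workflow, in case it helps. It assumes the DESeq2 and limma packages are installed, and it uses a simulated dataset from makeExampleDESeqDataSet() rather than real data. The 'type' column is added by hand here as a stand-in batch variable, and the object names (mat, mm, mat_corrected) are just illustrative.

```r
library(DESeq2)
library(limma)

# Simulated data purely for illustration; 'type' stands in for the batch variable
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$type <- factor(rep(c("single", "paired"), times = 6))
design(dds) <- ~ type + condition

# blind = FALSE lets the design inform the dispersion trend only;
# the transformed values still contain the type effect
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)

# Remove variation associated with type, while passing the condition
# design so that the condition effect is preserved
mm <- model.matrix(~ condition, colData(vsd))
mat_corrected <- removeBatchEffect(mat, batch = vsd$type, design = mm)

# Store the corrected values back in the object for plotting
assay(vsd) <- mat_corrected
plotPCA(vsd, intgroup = c("condition", "type"))
```

The corrected matrix is meant for visualization and exploratory steps such as PCA. For the differential expression test itself, the batch variable stays in the design formula and the original counts are used.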

Thanks, I was assuming rlog/vst normalize for batch-based trends when blind=FALSE.

If the data have an obvious batch effect and I want to do downstream analyses such as unsupervised clustering with batch effect-free expression values, would the removeBatchEffect() output be more appropriate than rlogMat?