Hi. I have recently worked with both microarray and RNA-Seq data. For differential expression analysis of microarrays, I used `limma` with log2-transformed intensities as input. For RNA-Seq, I used `DESeq2` with raw counts (derived from `salmon`) as input.

Why does `limma` want/require log2-transformed intensities while `DESeq2` wants untransformed counts?

I'm aware that `limma` uses a different regression model than `DESeq2` (a linear model vs a negative binomial GLM) because of the different data types (intensities vs counts), so I'm more interested in why the input data should or shouldn't be transformed before the regression analysis.

I imagine `DESeq2` doesn't want log2-transformed values because the negative binomial GLM already handles the heteroscedasticity, and the log link function ensures the model coefficients are log2 fold changes. `limma`'s choice of an ordinary linear regression model means that, to meet the homoscedasticity assumption, it needs to reduce the heteroscedasticity of the intensity data with a log transform. If this is true, why doesn't `limma` avoid requiring log2-transformed input and simply use an appropriate GLM with a log link function? Or, alternatively, why doesn't `DESeq2` avoid the extra computation required to fit a GLM and simply log2-transform the counts before using them as input to a linear regression?
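To make the heteroscedasticity point concrete, here is a minimal Python sketch (the negative binomial mean and dispersion values are made up for illustration): simulated counts with a fixed dispersion have a variance that explodes as the mean grows, while the variance on the log2 scale is far more stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate count-like data at low, medium, and high expression levels.
# With a fixed NB dispersion, variance = mu + dispersion * mu^2,
# so raw counts are strongly heteroscedastic across the mean range.
dispersion = 0.1
var_raw, var_log = {}, {}
for mu in (10, 100, 1000):
    n = 1.0 / dispersion   # numpy's NB "number of successes" parameter
    p = n / (n + mu)       # success probability giving mean mu
    counts = rng.negative_binomial(n, p, size=100_000)
    var_raw[mu] = counts.var()
    var_log[mu] = np.log2(counts + 0.5).var()  # pseudocount avoids log2(0)
    print(f"mu={mu:5d}  var(raw)={var_raw[mu]:10.1f}  var(log2)={var_log[mu]:.3f}")
```

On the raw scale the variance grows by orders of magnitude with the mean; after the log2 transform it flattens out at moderate-to-high counts (though not at very low counts, which is exactly the mean-variance trend `voom` models).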

I ask because I'm pretty sure these approaches give different results: feeding untransformed values to a regression with a log link is not the same as feeding log2-transformed values to a regression with the identity link. Does one of these approaches to data transformation have more justification than the other?
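For what it's worth, the two estimation targets differ even before any regression is involved: a log-link GLM models log2(E[Y]), while ordinary least squares on logged counts models E[log2 Y], and by Jensen's inequality these are not equal for skewed count data. A small Python sketch (simulated data, parameters chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed count data: negative binomial with mean ~100.
# mean = n * (1 - p) / p, so n = 10, p = 10/110 gives mean 100.
counts = rng.negative_binomial(10, 10 / 110, size=100_000)

# A log-link GLM targets log2 of the mean; OLS on logged counts
# targets the mean of the logs.
log_of_mean = np.log2(counts.mean())
mean_of_log = np.log2(counts + 0.5).mean()
print(f"log2(E[Y]) = {log_of_mean:.3f}, E[log2 Y] = {mean_of_log:.3f}")
```

Because log2 is concave, E[log2 Y] comes out smaller than log2(E[Y]), so the two models' "log fold changes" are genuinely estimating different quantities, with the gap depending on the per-gene variance.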

Thanks for pointing me toward the limma voom paper, that answered my question.