In the documentation of the rlog() function we can find that
rlog(K_ij) = log2(q_ij) = beta_i0 + beta_ij,
which means the function returns the log2-transformed data after normalizing by a size factor, estimating dispersion, shrinking the dispersion and then the beta parameters.
Following the description in the paper accompanying the DESeq2 package, it seems that the model for q_ij is:
q_ij = exp(x^T * beta)
where x is the vector of covariates and beta is the vector of coefficients in the negative binomial GLM.
It seems that if we have only one factor covariate with two possible levels, then x is in {0,1} and there are only two possible values for beta_1j (depending on whether x_j = 1 or 0).
However, when I run rlog on the raw count data, the transformed counts still differ (although they are similar) from column to column, even when the columns belong to the same class (with the same covariate value).
It would be great if one of the developers could answer this question. I would greatly appreciate it.
Best,
Lan
1) Yes.
2) We calculate one prior variance for the whole matrix: "The prior variance is found by matching the 97.5% quantile of a zero-centered normal distribution to the 95% quantile of the absolute values in the LFC matrix."
3-4) Yes, if blind=TRUE; otherwise we use the dispersion trend already calculated using the experimental design (see the vignette discussion of blind=TRUE or FALSE).
5) Yes.
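To make the quantile matching in point 2 concrete, here is a rough sketch in plain Python. It is not the DESeq2 code: the function names, the linear-interpolation quantile rule, and the example values are all assumptions made for illustration. It only shows the arithmetic of choosing sigma so that the 97.5% quantile of N(0, sigma^2) equals the 95% quantile of the absolute LFCs.

```python
from statistics import NormalDist


def quantile(values, p):
    """Linear-interpolation quantile (an assumed convention for this sketch)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def match_prior_variance(lfcs, upper_q=0.95):
    """Variance of a zero-centered normal matched to the spread of the LFCs.

    The upper_q quantile of |LFC| is set equal to the (1 - (1 - upper_q) / 2)
    quantile of N(0, sigma^2), i.e. 0.95 is matched to 0.975.
    """
    target = quantile([abs(v) for v in lfcs], upper_q)
    z = NormalDist().inv_cdf(1 - (1 - upper_q) / 2)  # about 1.96 for upper_q=0.95
    sigma = target / z
    return sigma ** 2


# Hypothetical log fold changes, flattened from the whole matrix.
lfcs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
prior_var = match_prior_variance(lfcs)
print(prior_var)
```

The point to take away is that a single number is produced for the whole matrix, so every gene is shrunk toward zero with the same prior width.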
The idea is to shrink sample-to-sample differences when there is little information (low counts) and to preserve these differences when there is information (high counts).
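A toy calculation may help show why the same-class columns stay different at high counts but are pulled together at low counts. This is a simplified normal-normal approximation and not what rlog() does internally: the delta-method variance, the function names, and the numbers below are assumptions chosen only to illustrate the behavior.

```python
import math


def approx_log2_variance(mean_count, dispersion):
    """Delta-method variance of a log2 count under a negative binomial.

    Var(K) = mu + alpha * mu^2, so Var(ln K) is roughly 1/mu + alpha;
    dividing by ln(2)^2 converts to the log2 scale. Approximation only.
    """
    return (1.0 / mean_count + dispersion) / math.log(2) ** 2


def shrink(deviation, mean_count, dispersion, prior_var):
    """Posterior mean of a sample-specific deviation under a N(0, prior_var) prior."""
    sampling_var = approx_log2_variance(mean_count, dispersion)
    weight = prior_var / (prior_var + sampling_var)  # between 0 and 1
    return weight * deviation


# Same observed deviation (1 on the log2 scale), different amounts of information.
low = shrink(1.0, mean_count=5, dispersion=0.01, prior_var=1.0)
high = shrink(1.0, mean_count=5000, dispersion=0.01, prior_var=1.0)
print(low, high)
```

With these made-up inputs the low-count deviation is pulled noticeably toward zero, while the high-count deviation is left almost unchanged, which is the behavior described above for the sample-specific beta_ij terms.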