Question

What does DESeq2 rlog() return exactly?

lanhuong

In the documentation of the rlog() function we can find that

rlog(K_ij) = log2(q_ij) = beta_i0 + beta_ij,

This means the function returns the log2-transformed data after normalization by size factors, estimation of the dispersions, shrinkage of the dispersions, and then shrinkage of the beta parameters.

Following the description in the paper accompanying the DESeq2 package, it seems that the model for q_ij is:

log2(q_ij) = x_j^T * beta_i, i.e. q_ij = 2^(x_j^T * beta_i)

where x_j is the vector of covariates for sample j and beta_i is the vector of coefficients for gene i in the negative binomial GLM.
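A minimal sketch of this mean model, assuming the base-2 link of the documented formula (log2(q_ij) = x_j^T beta_i) together with mu_ij = s_j * q_ij for the negative binomial mean, as in the DESeq2 paper. The helper `fitted_mean`, the size factors and the coefficients are hypothetical illustration values, not DESeq2 functions or real data.

```python
def fitted_mean(size_factor, x_j, beta_i):
    """Return mu_ij = s_j * q_ij with log2(q_ij) = x_j^T beta_i (base-2 link)."""
    log2_q = sum(x * b for x, b in zip(x_j, beta_i))
    return size_factor * 2.0 ** log2_q


# Hypothetical gene: intercept 5 (log2 scale) and a group effect of +1 (a 2-fold change).
beta_i = [5.0, 1.0]
# Hypothetical size factors for four samples: two controls, then two treated.
size_factors = [0.8, 1.0, 1.25, 1.0]
design = [[1, 0], [1, 0], [1, 1], [1, 1]]

mus = [fitted_mean(s, x, beta_i) for s, x in zip(size_factors, design)]
print(mus)  # [25.6, 32.0, 80.0, 64.0]
```

Dividing each mean by its size factor recovers q_ij, which here takes only the two values 32 and 64.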

It seems that if we only have one factor covariate with two levels, then (besides the intercept) x_j is either 0 or 1, so for each gene there are only two possible fitted values of log2(q_ij), depending on whether x_j = 0 or 1.
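A small sketch of this point: with an intercept plus one 0/1 indicator, the fitted log2(q_ij) = x_j^T beta_i takes only two distinct values per gene, however many samples there are. For contrast it also evaluates the design described in the follow-up reply in this thread (a column of ones plus one indicator column per sample), which allows a different value in every column. The helper name and all coefficient values are hypothetical.

```python
def distinct_fitted_values(design, beta, ndigits=9):
    """Count the distinct fitted values x_j^T beta over the samples (rows) of a design."""
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in design]
    return len({round(f, ndigits) for f in fitted})


n_samples = 6

# One factor with two levels, three samples each: intercept column + one 0/1 indicator.
two_group = [[1, 0]] * 3 + [[1, 1]] * 3
print(distinct_fitted_values(two_group, [5.0, 1.2]))  # 2

# Intercept column + one indicator column per sample: an N x (N + 1) design.
per_sample = [[1] + [1 if k == j else 0 for k in range(n_samples)] for j in range(n_samples)]
beta_per_sample = [5.0, 0.1, -0.2, 0.0, 0.3, -0.1, 0.2]  # hypothetical per-sample offsets
print(distinct_fitted_values(per_sample, beta_per_sample))  # 6
```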

However, when I run rlog on the raw count data, the transformed values still differ (though only slightly) between columns, even for samples that belong to the same class (i.e. have the same covariate value).

It would be great if one of the developers could answer this question. I would greatly appreciate it.

Best,

Lan

Answer 1 · 2016-03-09

Michael Love

Can you read over the description of rlog in the DESeq2 paper and come back with more questions if that part is not clear?

Answer 2 · 2016-03-11

Hi,

Since the section on rlog in the DESeq2 paper is fairly short, I want to make sure I understand all the steps correctly. Is this the sequence of operations performed by rlog?

1. A matrix of initial LFC estimates is computed as M_ij = log_2[ (K_ij/s_j + 1/2) / mean_j(K_ij/s_j + 1/2) ] for all i and j.

2. The prior variance is found for each row of M by matching the quantiles of a zero-centered normal distribution.

3. The negative binomial GLM is fit to every row of M using only an intercept term to obtain row-wise dispersion estimates.

4. A trend alpha_tr(mu_bar) is fit to the dispersion estimates to capture the dependence of the dispersion on the mean of the normalized counts.

5. Using an N x (N+1) design matrix (N = number of samples) with a column of all ones and one indicator column per sample, together with the prior from step 2, rlog fits a negative binomial GLM to each row of the LFC matrix M, with the dispersion parameters fixed at the trend estimates alpha_tr(mu_bar).
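Steps 1 and 2 can be sketched in plain Python as below. Several details are assumptions of this sketch rather than statements from the post: the quantile levels (matching the 97.5% quantile of N(0, sigma^2) to the 95% quantile of |M|), pooling all entries of M into a single variance, and using R's default (type 7) quantile interpolation. The function names and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def lfc_matrix(counts, size_factors):
    """Step 1: M_ij = log2[(K_ij/s_j + 1/2) / (mean_j(K_ij/s_j) + 1/2)].

    counts is a list of rows (genes); each row holds the raw counts per sample.
    """
    M = []
    for row in counts:
        norm = [k / s for k, s in zip(row, size_factors)]
        denom = sum(norm) / len(norm) + 0.5
        M.append([math.log2((v + 0.5) / denom) for v in norm])
    return M


def quantile_type7(values, p):
    """Linearly interpolated sample quantile (the default method of R's quantile())."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def prior_variance(M, upper=0.05):
    """Step 2 (assumed form): choose sigma^2 so that the (1 - upper/2) quantile of
    N(0, sigma^2) equals the (1 - upper) quantile of |M_ij| pooled over all entries."""
    abs_lfc = [abs(v) for row in M for v in row]
    sd = quantile_type7(abs_lfc, 1 - upper) / NormalDist().inv_cdf(1 - upper / 2)
    return sd ** 2


# Hypothetical counts: 3 genes x 4 samples, with made-up size factors.
counts = [[10, 12, 30, 28], [100, 90, 110, 95], [0, 1, 3, 2]]
size_factors = [1.0, 0.9, 1.1, 1.0]
M = lfc_matrix(counts, size_factors)
print(prior_variance(M))
```

A per-row variance, as step 2 describes it, would instead call `prior_variance([row])` for each row of M.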
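For step 4, here is a sketch of a parametric trend of the form alpha_tr(mu_bar) = a0 + a1 / mu_bar, the form used by DESeq2's parametric dispersion fit. DESeq2 fits that curve with a gamma-family GLM; to stay short, this sketch substitutes ordinary least squares on x = 1/mu_bar. The gene-wise dispersions from step 3 are taken as given, and the numbers in the usage example are made up.

```python
def fit_dispersion_trend(mean_norm, dispersions):
    """Fit dispersion ~ a0 + a1 / mean_norm by ordinary least squares on x = 1/mean_norm.

    Returns a0, a1 and a callable trend alpha_tr(mu_bar).
    """
    xs = [1.0 / m for m in mean_norm]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(dispersions) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, dispersions))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1, (lambda mu_bar: a0 + a1 / mu_bar)


# Made-up mean normalized counts and gene-wise dispersion estimates (step 3 output).
mean_norm = [2.0, 5.0, 10.0, 50.0, 200.0, 1000.0]
gene_disp = [0.62, 0.28, 0.16, 0.07, 0.05, 0.04]
a0, a1, trend = fit_dispersion_trend(mean_norm, gene_disp)
print(a0, a1, trend(20.0))
```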
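Step 5 for a single gene might look like the sketch below. It uses a negative binomial GLM with a log2 link, the N x (N+1) intercept-plus-per-sample-indicator design, a zero-centered normal prior with variance `prior_var` on the per-sample coefficients, and a dispersion `alpha` held fixed at a value that would come from the step-4 trend. Assumptions that are not from the post: the GLM is fit to the gene's raw counts with log(s_j) as an offset, the optimizer is penalized IRLS (Fisher scoring), the intercept gets a very wide normal prior (variance 1e6) rather than none, and all numbers are hypothetical. This is not DESeq2's own code.

```python
import math

LN2 = math.log(2.0)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def rlog_row(counts, size_factors, alpha, prior_var, intercept_var=1e6, max_iter=100, tol=1e-10):
    """MAP fit of one gene: K_j ~ NB(mu_j, alpha), mu_j = s_j * q_j,
    log2(q_j) = beta_0 + beta_j, beta_j ~ N(0, prior_var) for j = 1..N.
    Coefficients are kept on the natural-log scale internally; returns log2(q_j)."""
    n = len(counts)
    # Design: a column of ones plus one indicator column per sample -> N x (N + 1).
    X = [[1.0] + [1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    # Normal priors on the log2 scale become ridge penalties on the natural-log scale.
    lam = [1.0 / (intercept_var * LN2 ** 2)] + [1.0 / (prior_var * LN2 ** 2)] * n
    # Start at the intercept-only fit on the natural-log scale.
    mean_norm = sum(k / s for k, s in zip(counts, size_factors)) / n
    beta = [math.log(mean_norm + 0.5)] + [0.0] * n
    p = n + 1
    for _ in range(max_iter):
        eta = [sum(x * b for x, b in zip(row, beta)) for row in X]  # natural-log q_j
        mu = [s * math.exp(e) for s, e in zip(size_factors, eta)]
        w = [m / (1.0 + alpha * m) for m in mu]  # NB working weights
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]  # working response
        A = [[sum(X[j][a] * w[j] * X[j][c] for j in range(n)) + (lam[a] if a == c else 0.0)
              for c in range(p)] for a in range(p)]
        rhs = [sum(X[j][a] * w[j] * z[j] for j in range(n)) for a in range(p)]
        new_beta = solve(A, rhs)
        converged = max(abs(u - v) for u, v in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    # beta_0 + beta_j for each sample, converted from natural log to log2.
    return [sum(x * b for x, b in zip(row, beta)) / LN2 for row in X]


# Hypothetical gene with 4 samples; alpha would come from the step-4 trend.
print(rlog_row([10, 12, 30, 28], [1.0, 0.9, 1.1, 1.0], alpha=0.1, prior_var=1.0))
```

Because every sample has its own shrunken coefficient, the returned values can differ between samples of the same class; a smaller `prior_var` pulls them closer to the gene's overall level.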

Is this the correct understanding of the procedure? Are there any steps that are missing in the above?

Thank you!