Dear All,
I would like to plot residuals vs. fitted values for some of the candidate genes, after fitting the model and testing with DESeq2 package. Can someone tell me how I could extract the computed residuals and the fitted values per gene? Thank you in advance!
Best, Mike
Hello Mike,
Thank you so much for taking the time to provide this answer. I have a simple question, but unfortunately I lack the necessary statistical expertise to ask this question correctly, but I will try to do my best.
Using the equations above, I extracted the residuals and plotted a histogram of them. The distribution of residuals did not appear to be normal -- they were quite leptokurtic (perhaps to be expected from a Poisson distribution)?
The quick question I had was if you were using a specialized wald test -- I thought wald test required normal data?
I tried to dig into the wald function in your fitBeta function, but got a little lost in the c implementation.
betaRes <- fitBeta(ySEXP = countsMatrix, xSEXP = modelMatrix,
nfSEXP = normalizationFactors, alpha_hatSEXP = alpha_hat,
contrastSEXP = contrast, beta_matSEXP = beta_mat, lambdaSEXP = lambda,
tolSEXP = 1e-08, maxitSEXP = 0, useQRSEXP = FALSE)
Perhaps the above described residuals (assays(dds)[["mu"]]) are different to modelMatrix described in the fitBeta function? (My apologies if I'm completely off base here) -- I'm just taking a shot in the dark. Any illumination you could provide would be greatly appreciated.
hi Philip,
"Using the equations above, I extracted the residuals and plotted a histogram of them. The distribution of residuals did not appear to be normal -- they were quite leptokurtic (perhaps to be expected from a Poisson distribution)?"
DESeq2 uses a Negative Binomial GLM (see full details of our method in the preprint which is the citation of the package) and the residuals in a GLM are not in general expected to be Normally distributed. Take a look at one of the classic GLM texts:
McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.
"The quick question I had was if you were using a specialized wald test -- I thought wald test required normal data?"
The Wald test is a test of the estimated coefficient divided by its estimated standard error, this is also described in the preprint, so I'd check there first. The Wald test does not require that the response variable (typically the 'y' or 'K' as we write in our preprint) be Normally distributed.
Thanks Mike for your quick response -- I'll be sure to read up on these and get back to you with any additional questions I might have!