Hi everyone, I'm doing some comparisons between models and for each of them I'm going to compute Akaike information Criterion (AIC) and Bayesian Information Criterion (BIC). When it comes to MAST hurdle model (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5 ... under the chapter "Single-cell RNA-seq hurdle model"), I don't understand how to determine the number of parameters for the Continous part and for the Discrete one. I'll make a toy example:
library(truncnorm) library(MAST) # Experiment with 100 cells and 1000 genes # Generate obs from a truncated gaussian tn <- matrix(rtruncnorm(100000, a=1e-8, b=Inf, mean = 10, sd = 5),ncol = 100) # Generate bernoulli obs bn <- matrix(rbinom(100000,1,c(0.1,0.4,0.7,0.9)),ncol = 100) # Adding zeroes tn[bn==1] <- 0 # Converting to single-cell object sca <- FromMatrix(tn) # Compute cngeneson in order to have a discrete analog of global normalization ngeneson <- apply(tn,2,function(x) mean(x>0)) CD <- colData(sca) CD$ngeneson <- ngeneson CD$cngeneson <- CD$ngeneson-mean(ngeneson) colData(sca) <- CD # hurdle model estimation hm <- zlm(~ 1 + cngeneson ,method = "bayesglm", sca = sca) # AIC calculation: How can I compute the number of parameters? total_AIC <- sum(2 * num_params - 2 * hm@loglik) total_BIC <- sum(log(100) * num_params - 2 * hm@loglik)
From the glm theory I remember that if I compute the difference between:
num_params = -(email@example.com - firstname.lastname@example.org) # That shoul correspond to = n - 1 - (n - p - 1) = p
Then I should add 1 for the intercepts in both Continous and Discrete parts and another 1 for the variance of the truncated normal in the Continous part? But I'm not very sure about this because I don't understand the way df.resid are calculated in this model. Can someone enlighten me please?