Does it really matter that you can't get the exact log-likelihood? From what I understand, you interpret the change in the BIC between proposed models to decide which one to use. The deviance is twice the difference between the log-likelihood of the saturated model and that of the proposed model. As the log-likelihood of the saturated model does not depend on which proposed model you fit, it cancels out when you compare two models. So you should get the same change in the BIC regardless of whether you use the deviance in place of the -2 * log-likelihood term or plug the log-likelihood of the proposed model into the BIC expression.
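To make the cancellation concrete, here is a minimal sketch in Python. It assumes a Poisson model for simplicity (not the negative binomial model that edgeR fits), uses made-up counts for a single gene, and compares an intercept-only model against a two-group model. All function and variable names are hypothetical and chosen only for illustration.

```python
import math

def poisson_loglik(counts, means):
    """Poisson log-likelihood of the observed counts given fitted means."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(counts, means))

def saturated_loglik(counts):
    """Log-likelihood when each fitted mean equals its own count (y*log(y) taken as 0 at y = 0)."""
    return sum((y * math.log(y) if y > 0 else 0.0) - y - math.lgamma(y + 1)
               for y in counts)

def bic_from_loglik(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik

def bic_from_deviance(deviance, n_params, n_obs):
    # Same formula, with the -2 * loglik term replaced by the deviance.
    return n_params * math.log(n_obs) + deviance

# Hypothetical counts for one gene: three samples in each of two groups.
counts = [12, 15, 9, 30, 28, 35]
groups = [0, 0, 0, 1, 1, 1]
n_obs = len(counts)

# Model A (1 coefficient): one common mean; the Poisson MLE is the overall mean.
fit_a = [sum(counts) / n_obs] * n_obs

# Model B (2 coefficients): one mean per group; the MLE is each group's mean.
group_mean = {
    g: sum(y for y, gi in zip(counts, groups) if gi == g) / groups.count(g)
    for g in set(groups)
}
fit_b = [group_mean[g] for g in groups]

ll_sat = saturated_loglik(counts)
ll_a = poisson_loglik(counts, fit_a)
ll_b = poisson_loglik(counts, fit_b)

# Deviance = 2 * (saturated log-likelihood - model log-likelihood).
dev_a = 2.0 * (ll_sat - ll_a)
dev_b = 2.0 * (ll_sat - ll_b)

delta_bic_loglik = bic_from_loglik(ll_b, 2, n_obs) - bic_from_loglik(ll_a, 1, n_obs)
delta_bic_deviance = bic_from_deviance(dev_b, 2, n_obs) - bic_from_deviance(dev_a, 1, n_obs)

# The two differences agree up to floating-point error.
print(delta_bic_loglik, delta_bic_deviance)
```

The individual BIC values differ between the two versions by the constant 2 * ll_sat, but that constant is the same for both models and so drops out of the difference.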
That being said, I wonder whether you need to do formal model selection at all. The various information criteria (ICs) are geared towards choosing the best-performing model for prediction (e.g., to predict the expression of each gene in a new sample based on its combination of factors). However, when we use edgeR for data analysis, we are typically trying to explain our results in terms of various experimental factors. To this end, the simplest model that contains all the factors of interest should suffice. In contrast, model selection via the ICs tends to produce larger models that do well at prediction but not at explanation, as their coefficients are not easily interpreted.
(On occasion, additional blocking factors may be required, e.g., to account for batch effects, but you can just try fitting the model with and without them and see if you get a substantially greater number of DE genes. If either model gives good results, i.e., decent numbers of DE genes, then you don't need to be too worried about selecting one or the other. It doesn't have to be "right" as long as it's useful.)
Hmm... well, each group of genes will have the same "optimal model", but that doesn't guarantee that they'll behave the same way. For example, in one group associated with a particular model, some genes might have a large positive coefficient (e.g., for a log-fold change term) while others would have a large negative coefficient. Clearly those genes are not behaving in a similar way across samples - indeed, they're behaving in the opposite way - but you'll still end up putting them in the same group. If you extend this issue to multiple coefficients, then the nature of the groups will become quite difficult to interpret.
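As a toy illustration of that problem, the sketch below groups genes by their selected model and then checks the direction of a coefficient within each group. The gene names, model formulas and log-fold changes are all made up for the example; nothing here comes from edgeR output.

```python
from collections import defaultdict

# Hypothetical per-gene results: (gene, selected model, log-fold change).
results = [
    ("geneA", "~treatment", 3.1),
    ("geneB", "~treatment", -2.8),
    ("geneC", "~treatment", 2.5),
    ("geneD", "~treatment + time", -1.9),
]

by_model = defaultdict(list)           # grouping by selected model only
by_model_and_sign = defaultdict(list)  # grouping by model and coefficient sign

for gene, model, logfc in results:
    by_model[model].append(gene)
    direction = "up" if logfc > 0 else "down"
    by_model_and_sign[(model, direction)].append(gene)

# geneA and geneB share a model but respond in opposite directions.
print(dict(by_model))
print(dict(by_model_and_sign))
```

With a single coefficient the split by sign is easy to read, but with several coefficients per model the number of sign combinations grows quickly, which is why the groups become hard to interpret.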