lina.faller

Hi all,

I am using DESeq2 to calculate differentially expressed genes. Is there a good way to determine how well the model fits the data? I'd appreciate it if you could share any insight or resources.

Thanks!

~Lina

Hi!!

I have the same question. I have several factors and would like to know whether including all of them gives a better model. Did you find anything about it? I've been looking for an answer, but still no luck. Thanks!!

You seem to be thinking that you are fitting a single model. You aren't. Instead you are simultaneously fitting thousands of models, so asking 'how well a model fits' is in some sense nonsensical. Which model might you be asking about?

In general people just fit a model containing any nuisance variables that may affect the gene expression, and call it good. If you really care, you can test all of the nuisance variables you are including in your model and drop those that aren't 'significant enough', where by that I mean those variables that are either not significant for any gene, or for only a small subset of genes, when testing using a likelihood ratio test.
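To make the "test each nuisance variable and count the genes" idea concrete, here is a minimal sketch in plain Python. It is not DESeq2: it assumes a simple Poisson model per gene, no size factors, a single two-level nuisance variable, and no multiple-testing correction, and the names `lrt_pvalue`, `batch` and `genes` are made up for illustration. In DESeq2 itself the comparable test is requested with `DESeq(dds, test="LRT", reduced=...)`.

```python
import math

def fitted_means(counts, labels):
    """MLE fitted means for a one-factor Poisson model: the per-group mean."""
    sums, sizes = {}, {}
    for y, g in zip(counts, labels):
        sums[g] = sums.get(g, 0) + y
        sizes[g] = sizes.get(g, 0) + 1
    return [sums[g] / sizes[g] for g in labels]

def poisson_loglik(counts, means):
    """Poisson log-likelihood of the counts given one fitted mean per sample."""
    total = 0.0
    for y, mu in zip(counts, means):
        if mu > 0:  # mu == 0 implies y == 0, which contributes nothing
            total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total

def lrt_pvalue(counts, labels):
    """LRT of '~ factor' against '~ 1' for a two-level factor (1 df)."""
    full = poisson_loglik(counts, fitted_means(counts, labels))
    reduced = poisson_loglik(counts, fitted_means(counts, ["all"] * len(counts)))
    stat = max(0.0, 2.0 * (full - reduced))
    # Survival function of a chi-square with 1 df, written with erfc
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data (hypothetical): six samples, one nuisance variable with two levels
batch = ["a", "a", "a", "b", "b", "b"]
genes = {
    "gene1": [10, 12, 11, 30, 33, 29],
    "gene2": [20, 22, 19, 21, 18, 23],
    "gene3": [5, 7, 6, 6, 5, 7],
}

pvals = {name: lrt_pvalue(counts, batch) for name, counts in genes.items()}
n_sig = sum(p < 0.05 for p in pvals.values())
print(f"{n_sig} of {len(genes)} genes show a batch effect at p < 0.05")
```

If only a handful of genes come out significant for a variable, that is the situation in which dropping it from the design is a reasonable choice.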

The downside of including too many nuisance variables is that you may be wasting degrees of freedom and reducing power, or that you may be including variables that are not orthogonal, which can be problematic. Otherwise an overspecified model isn't that big of a deal.

Yes, I know that a separate model is fitted for each gene. But I am comparing two different factors, and I would like to know which one fits the expression data better. I cannot find any implementation of the Akaike or Bayesian information criterion for this, and I cannot rely only on the Wald tests of the factors because the models aren't nested.
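As a rough illustration of what a per-gene information criterion comparison could look like, here is a minimal sketch in plain Python. It is an assumption-laden toy, not something DESeq2 provides: it uses a Poisson likelihood instead of DESeq2's negative binomial model, ignores size factors and dispersion, and the names `poisson_aic`, `factor_a` and `factor_b` are hypothetical. Because one model is fitted per gene, the result is a tally of which design each gene prefers, not a single score.

```python
import math

def poisson_aic(counts, labels):
    """AIC of a one-factor Poisson model whose fitted means are the group means."""
    sums, sizes = {}, {}
    for y, g in zip(counts, labels):
        sums[g] = sums.get(g, 0) + y
        sizes[g] = sizes.get(g, 0) + 1
    loglik = 0.0
    for y, g in zip(counts, labels):
        mu = sums[g] / sizes[g]
        if mu > 0:  # mu == 0 implies y == 0, which contributes nothing
            loglik += y * math.log(mu) - mu - math.lgamma(y + 1)
    k = len(sums)  # one parameter per factor level
    return 2 * k - 2 * loglik

# Toy data (hypothetical): two non-nested groupings of the same six samples
factor_a = ["x", "x", "x", "y", "y", "y"]
factor_b = ["p", "q", "p", "q", "p", "q"]
genes = {
    "gene1": [10, 12, 11, 30, 33, 29],  # tracks factor_a
    "gene2": [10, 30, 12, 33, 11, 29],  # tracks factor_b
    "gene3": [20, 22, 19, 21, 18, 23],  # no strong pattern
}

preferred = {}
for name, counts in genes.items():
    aic_a = poisson_aic(counts, factor_a)
    aic_b = poisson_aic(counts, factor_b)
    preferred[name] = "factor_a" if aic_a < aic_b else "factor_b"

tally = {}
for choice in preferred.values():
    tally[choice] = tally.get(choice, 0) + 1
print(preferred)
print(tally)
```

Both factors here have two levels, so the penalty term is the same and the comparison reduces to the log-likelihood. The penalty only matters when the factors have different numbers of levels. Small AIC differences, as expected for a gene with no strong pattern, should not be over-interpreted.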