Question

edgeR choose the best model of GLM

0

Entering edit mode

Eszter Ari ▴ 40

@ari-eszter-5784

Last seen 2.0 years ago

Hungary

Dear edgeR Users, I have RNS-Seq count data by genes, 4 treatment groups (each with 2 biological replicates) and two factors (E: C or H, and A: 13 or 25). And I applied the GLM model of edgeR with these designs: E, A, E+A and E*A I would like to decide which model and which variant describes my data better. 1. Am I right that the $coefficients of a "DGELRT" object are log likelihood values? 2. Does anyone have a suggestion how can I apply AIC (Akaike Information Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to choose the best model? 3. Is there a possibility to plot the multiple GLM regression for a gene? (I would like plot something like this: http://www.jerrydallal.com/LHSP/pix/regpix4.gif ) Bests, Eszter -- Eszter Ari Institut für Populationsgenetik Vetmeduni Vienna Veterinärplatz 1 1210 Wien, Austria [[alternative HTML version deleted]]

Regression edgeR Regression edgeR • 4.7k views

ADD COMMENT • link updated 12.8 years ago by Gordon Smyth 53k • written 12.8 years ago by Eszter Ari ▴ 40

score 0 · Answer 1 · 2013-02-22

Dear Eszter, The problem that you've stated (which of these models best describes my data) applies to univariate data, for fitting models to a single vector of counts. For RNA-seq experiments, you are fitting glms to many data vectors. You will get different answers for different genes, so you have to think a bit differently. With edgeR, you would start by fitting the full model design <- model.matrix(~E+A+E:A) then, after some intermediate commands, use fit <- glmFit(y, design) lrt <- glmLRT(fit, coef=4) topTags(lrt) to test for which genes the interaction is significant. If the test is significant, then the E*A model is necessary and best. If not, one of the simpler models is satisfactory. You could subset to genes for which the interaction is not significant, then try design <- model.matrix(~E+A) and proceed from there. I interpolate some other answers below. > Date: Wed, 20 Feb 2013 15:15:20 +0100 > From: Ari Eszter <arieszter at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR choose the best model of GLM > > Dear edgeR Users, > > I have RNS-Seq count data by genes, 4 treatment groups (each with 2 > biological replicates) and two factors (E: C or H, and A: 13 or 25). And > I applied the GLM model of edgeR with these designs: E, A, E+A and E*A > I would like to decide which model and which variant describes my data > better. > > 1. Am I right that the $coefficients of a "DGELRT" object are log > likelihood values? No. They are coefficients from the linear model fitted, as explained in the documentation: help("glmFit"). > 2. Does anyone have a suggestion how can I apply AIC (Akaike Information > Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to > choose the best model? Answered above. We have experimented with AIC, but have not found it particularly successful. > 3. Is there a possibility to plot the multiple GLM regression for a > gene? (I would like plot something like this: > http://www.jerrydallal.com/LHSP/pix/regpix4.gif ) I have no idea what this plot represents. In any case, I guess this is for one data vector. For RNA-seq data, you would have to view 20,000 of these plots, not a very productive way to go. Best wishes Gordon > Bests, > Eszter > > -- > Eszter Ari > Institut f?r Populationsgenetik > Vetmeduni Vienna > Veterin?rplatz 1 > 1210 Wien, Austria > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}