edgeR choose the best model of GLM
1
0
Entering edit mode
Ari Eszter ▴ 40
@ari-eszter-5784
Last seen 7.7 years ago
Dear edgeR Users, I have RNS-Seq count data by genes, 4 treatment groups (each with 2 biological replicates) and two factors (E: C or H, and A: 13 or 25). And I applied the GLM model of edgeR with these designs: E, A, E+A and E*A I would like to decide which model and which variant describes my data better. 1. Am I right that the $coefficients of a "DGELRT" object are log likelihood values? 2. Does anyone have a suggestion how can I apply AIC (Akaike Information Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to choose the best model? 3. Is there a possibility to plot the multiple GLM regression for a gene? (I would like plot something like this: http://www.jerrydallal.com/LHSP/pix/regpix4.gif ) Bests, Eszter -- Eszter Ari Institut für Populationsgenetik Vetmeduni Vienna Veterinärplatz 1 1210 Wien, Austria [[alternative HTML version deleted]]
Regression edgeR Regression edgeR • 3.5k views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia
Dear Eszter, The problem that you've stated (which of these models best describes my data) applies to univariate data, for fitting models to a single vector of counts. For RNA-seq experiments, you are fitting glms to many data vectors. You will get different answers for different genes, so you have to think a bit differently. With edgeR, you would start by fitting the full model design <- model.matrix(~E+A+E:A) then, after some intermediate commands, use fit <- glmFit(y, design) lrt <- glmLRT(fit, coef=4) topTags(lrt) to test for which genes the interaction is significant. If the test is significant, then the E*A model is necessary and best. If not, one of the simpler models is satisfactory. You could subset to genes for which the interaction is not significant, then try design <- model.matrix(~E+A) and proceed from there. I interpolate some other answers below. > Date: Wed, 20 Feb 2013 15:15:20 +0100 > From: Ari Eszter <arieszter at="" gmail.com=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR choose the best model of GLM > > Dear edgeR Users, > > I have RNS-Seq count data by genes, 4 treatment groups (each with 2 > biological replicates) and two factors (E: C or H, and A: 13 or 25). And > I applied the GLM model of edgeR with these designs: E, A, E+A and E*A > I would like to decide which model and which variant describes my data > better. > > 1. Am I right that the $coefficients of a "DGELRT" object are log > likelihood values? No. They are coefficients from the linear model fitted, as explained in the documentation: help("glmFit"). > 2. Does anyone have a suggestion how can I apply AIC (Akaike Information > Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to > choose the best model? Answered above. We have experimented with AIC, but have not found it particularly successful. > 3. Is there a possibility to plot the multiple GLM regression for a > gene? (I would like plot something like this: > http://www.jerrydallal.com/LHSP/pix/regpix4.gif ) I have no idea what this plot represents. In any case, I guess this is for one data vector. For RNA-seq data, you would have to view 20,000 of these plots, not a very productive way to go. Best wishes Gordon > Bests, > Eszter > > -- > Eszter Ari > Institut f?r Populationsgenetik > Vetmeduni Vienna > Veterin?rplatz 1 > 1210 Wien, Austria > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}
ADD COMMENT
0
Entering edit mode
Dear Gordon, Thank you for your answers! > The problem that you've stated (which of these models best describes my > data) applies to univariate data, for fitting models to a single vector of > counts. For RNA-seq experiments, you are fitting glms to many data > vectors. You will get different answers for different genes, so you have > to think a bit differently. > I see. But if the variable A gives 100 times more differentially expressed genes (p-value < 0.05) than the variable E, and 10 times more than the variable E:A under E*A model is it possible to say that A explains my count data better? > > > to test for which genes the interaction is significant. If the test is > significant, then the E*A model is necessary and best. If not, one of the > simpler models is satisfactory. You could subset to genes for which the > interaction is not significant, then try > > >> >> 1. Am I right that the $coefficients of a "DGELRT" object are log >> likelihood values? >> > > No. They are coefficients from the linear model fitted, as explained in > the documentation: help("glmFit"). > Is there a possibility to have the log likelihood values? > > 2. Does anyone have a suggestion how can I apply AIC (Akaike Information >> Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to >> choose the best model? >> > > Answered above. We have experimented with AIC, but have not found it > particularly successful. > I see. I was wondering also, because the AIC test have to be calculated for every genes, than how can I say something about the whole dataset... > > 3. Is there a possibility to plot the multiple GLM regression for a gene? >> (I would like plot something like this: http://www.jerrydallal.com/** >> LHSP/pix/regpix4.gif <http: www.jerrydallal.com="" lhsp="" pix="" regpix4.gif=""> ) >> > > I have no idea what this plot represents. In any case, I guess this is > for one data vector. For RNA-seq data, you would have to view 20,000 of > these plots, not a very productive way to go. > I would like to have a regression plot only for some genes - just for illustration. I have 2 replicates for each treatment that means 8 count values for each gene. I would like to draw a 3D regression for those 8 datapoints, while x and z axes would represent A and E variables (with 2-2 possible state only) and y would show the count number. Maybe I should calculate the regression for this plot not by edgeR but with glm.nb. Thanks a lot! Best wishes, Eszter -- Eszter Ari Institut für Populationsgenetik Vetmeduni Vienna Veterinärplatz 1 1210 Wien, Austria [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 264 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6