EdgeR for proteomics
1
0
Entering edit mode
@fabricio-marchini-5543
Last seen 9.7 years ago
Hi, I'm using EdgeR to analyse a proteomic data with peptide counting. I have limited experience on R/EdgeR/Statistics so I appreciate some help. Using the follow code: a=file[,2:64] b=DGEList(counts=a,group=rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","M O24h" ),c(10,11,10,11,10,11)), lib.size=colSums(a)) b <- calcNormFactors(b) times <- rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"),c(10,11,10,11, 10,11)) times <- factor(times,levels=c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h" )) design <- model.matrix(~factor(times)) disp <- estimateCommonDisp(b) fit <- glmFit(b,design,dispersion=disp$common.dispersion) lrt <- glmLRT(b,fit,coef=fit$design) disp$common.dispersion = 0.0001004979 All proteins (3430) had a p.value of 0. I tried also with fit <- glmFit(b,design,dispersion=disp$common.dispersion) lrt <- glmLRT(b,fit,coef=fit$design) disp$common.dispersion = 3.999943 and that gave me all the proteins with p.value lower than 6.29E-05. That gave a signal that I'm doing something wrong or because of both common dispersions my data is not a appropriate for the analysis. Any suggestions or corrections? -- Fabricio K. Marchini [[alternative HTML version deleted]]
edgeR edgeR • 1.8k views
ADD COMMENT
0
Entering edit mode
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 5.5 years ago
Hi Fabricio, I suggest you check (at least) 2 things: 1. > disp <- estimateCommonDisp(b) > disp$common.dispersion = 0.0001004979 > disp$common.dispersion = 3.999943 Your example only makes 1 call to estimateCommonDisp(), but you have 2 drastically different values. Are you reporting these as the estimated values, or are you actually running this command and *setting* the common dispersion? It's not clear from your message. You may also want to study some of the GLM-based case studies in: http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/do c/edgeRUsersGuide.pdf For example, the standard GLM work flow would be similar to that on Page 7. 2. > lrt <- glmLRT(b,fit,coef=fit$design) The docs for the 'coef' argument (?glmLRT) say: ---- coef: integer or character vector indicating which coefficients of the linear model are to be tested equal to zero. Values must be columns or column names of ?design?. Defaults to the last coefficient. Ignored if ?contrast? is specified. ---- As you can see, the function is expecting something very different to what give as your 'coef' argument. Maybe you want 'coef=2:6', if you are looking for any difference between your 6 groups. Of course, maybe you actually want to split your factors into 2 ? one of ("La","Lm","MO") and one of ("6h","24h") and construct a design matrix accordingly. But, this is also not clear from your message. Hope that helps, Mark ---------- Prof. Dr. Mark Robinson Bioinformatics Institute of Molecular Life Sciences University of Zurich Winterthurerstrasse 190 8057 Zurich Switzerland v: +41 44 635 4848 f: +41 44 635 6898 e: mark.robinson at imls.uzh.ch o: Y11-J-16 w: http://tiny.cc/mrobin ---------- http://www.fgcz.ch/Bioconductor2012 On 10.10.2012, at 06:41, Fabricio Marchini wrote: > Hi, > > I'm using EdgeR to analyse a proteomic data with peptide counting. I have > limited experience on R/EdgeR/Statistics so I appreciate some help. > Using the follow code: > > a=file[,2:64] > > b=DGEList(counts=a,group=rep(c("La6h","La24h","Lm6h","Lm24h","MO6h", "MO24h" > ),c(10,11,10,11,10,11)), lib.size=colSums(a)) > > b <- calcNormFactors(b) > > times <- rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"),c(10,11,10,11, > 10,11)) > > times <- factor(times,levels=c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h" > )) > > design <- model.matrix(~factor(times)) > > disp <- estimateCommonDisp(b) > > fit <- glmFit(b,design,dispersion=disp$common.dispersion) > > lrt <- glmLRT(b,fit,coef=fit$design) > disp$common.dispersion = 0.0001004979 > > All proteins (3430) had a p.value of 0. > > I tried also with > > fit <- glmFit(b,design,dispersion=disp$common.dispersion) > > lrt <- glmLRT(b,fit,coef=fit$design) > disp$common.dispersion = 3.999943 > > and that gave me all the proteins with p.value lower than 6.29E-05. > > That gave a signal that I'm doing something wrong or because of both common > dispersions my data is not a appropriate for the analysis. > > Any suggestions or corrections? > > -- > Fabricio K. Marchini > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Thank you Mark, My absence of clearness is because I'm having trouble to figure out how to deal with the data. at your first question, actually there is a missing command line: > disp <- estimateCommonDisp(b) > disp$common.dispersion = 0.0001004979 *> disp <- estimateGLMCommonDisp(b)* > disp$common.dispersion = 3.999943 I was doing some tests about dispersions, I'm still not certain wich one should I use. The second question: I have 2 comtrols: MO6h MO24h (these are my controls through time) and 2 treatments for both times: La6h La24h (La treatment) Lm6h Lm24h (Lm treatment) My biological questions are, 1) do I have modulation for each treatment through time and 2) there are differences for the same time for both treatments? I do not know how should I use my controls for both situations. For the situation that I'm looking for each time point I could use a intercept design for each time point, like: design <- model.matrix(~group, data=c$samples) But I do not know if that is correct and how to analyse each treatment dealing with the controls. The best situation would be to deal first with the controls and them the treated samples through time, any ideia? -- Fabricio K. Marchini Adjunct Researcher Bioinformatics and Computational Biology Laboratory Functional Genomics Laboratory CV - CNPq <http: lattes.cnpq.br="" 4462407232591810=""> / Web site<http: marchinifk.me=""> Instituto Carlos Chagas / FIOCRUZ-PR Phone +55 41 3316 3236 Fax +55 41 3316 3267 R. Prof. Algacyr Munhoz Mader, 3775 -CIC Curitiba - PR - Brasil, 81350-010 On Wed, Oct 10, 2012 at 5:40 PM, Mark Robinson <mark.robinson@imls.uzh.ch>wrote: > Hi Fabricio, > > I suggest you check (at least) 2 things: > > 1. > > disp <- estimateCommonDisp(b) > > disp$common.dispersion = 0.0001004979 > > > disp$common.dispersion = 3.999943 > > Your example only makes 1 call to estimateCommonDisp(), but you have 2 > drastically different values. Are you reporting these as the estimated > values, or are you actually running this command and *setting* the common > dispersion? It's not clear from your message. > > You may also want to study some of the GLM-based case studies in: > > http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/ doc/edgeRUsersGuide.pdf > > For example, the standard GLM work flow would be similar to that on Page 7. > > > 2. > > lrt <- glmLRT(b,fit,coef=fit$design) > > The docs for the 'coef' argument (?glmLRT) say: > ---- > coef: integer or character vector indicating which coefficients of > the linear model are to be tested equal to zero. Values must > be columns or column names of ‘design’. Defaults to the last > coefficient. Ignored if ‘contrast’ is specified. > ---- > As you can see, the function is expecting something very different to what > give as your 'coef' argument. Maybe you want 'coef=2:6', if you are looking > for any difference between your 6 groups. Of course, maybe you actually > want to split your factors into 2 one of ("La","Lm","MO") and one of > ("6h","24h") and construct a design matrix accordingly. But, this is also > not clear from your message. > > Hope that helps, > Mark > > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics > Institute of Molecular Life Sciences > University of Zurich > Winterthurerstrasse 190 > 8057 Zurich > Switzerland > > v: +41 44 635 4848 > f: +41 44 635 6898 > e: mark.robinson@imls.uzh.ch > o: Y11-J-16 > w: http://tiny.cc/mrobin > > ---------- > http://www.fgcz.ch/Bioconductor2012 > > > > On 10.10.2012, at 06:41, Fabricio Marchini wrote: > > > Hi, > > > > I'm using EdgeR to analyse a proteomic data with peptide counting. I have > > limited experience on R/EdgeR/Statistics so I appreciate some help. > > Using the follow code: > > > > a=file[,2:64] > > > > > b=DGEList(counts=a,group=rep(c("La6h","La24h","Lm6h","Lm24h","MO6h", "MO24h" > > ),c(10,11,10,11,10,11)), lib.size=colSums(a)) > > > > b <- calcNormFactors(b) > > > > times <- > rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"),c(10,11,10,11, > > 10,11)) > > > > times <- > factor(times,levels=c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h" > > )) > > > > design <- model.matrix(~factor(times)) > > > > disp <- estimateCommonDisp(b) > > > > fit <- glmFit(b,design,dispersion=disp$common.dispersion) > > > > lrt <- glmLRT(b,fit,coef=fit$design) > > disp$common.dispersion = 0.0001004979 > > > > All proteins (3430) had a p.value of 0. > > > > I tried also with > > > > fit <- glmFit(b,design,dispersion=disp$common.dispersion) > > > > lrt <- glmLRT(b,fit,coef=fit$design) > > disp$common.dispersion = 3.999943 > > > > and that gave me all the proteins with p.value lower than 6.29E-05. > > > > That gave a signal that I'm doing something wrong or because of both > common > > dispersions my data is not a appropriate for the analysis. > > > > Any suggestions or corrections? > > > > -- > > Fabricio K. Marchini > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Dear Fabricio, A couple quick comments below ? On 15.10.2012, at 17:41, Fabricio Marchini wrote: > Thank you Mark, > > My absence of clearness is because I'm having trouble to figure out how to deal with the data. > > at your first question, actually there is a missing command line: > > > disp <- estimateCommonDisp(b) > > disp$common.dispersion = 0.0001004979 > > > disp <- estimateGLMCommonDisp(b) > > disp$common.dispersion = 3.999943 Read the docs for estimateGLMCommonDisp -- ?estimateGLMCommonDisp You'll want to specify your design matrix here in the call the estimateGLMCommonDisp(), otherwise it calculates dispersion relative to just an intercept model (i.e. all groups have same mean). That is probably why your estimate is so high. Do you have real biological replicates here? Given the very low dispersion with estimateCommonDisp(), I would guess no. In that case, read the edgeR users guide -- 2.9 What to do if you have no replicates. > I was doing some tests about dispersions, I'm still not certain wich one should I use. > > The second question: > > I have 2 comtrols: > MO6h MO24h (these are my controls through time) > and 2 treatments for both times: > La6h La24h (La treatment) > Lm6h Lm24h (Lm treatment) > > My biological questions are, 1) do I have modulation for each treatment through time and 2) there are differences for the same time for both treatments? > I do not know how should I use my controls for both situations. For the situation that I'm looking for each time point I could use a intercept design for each time point, like: > > design <- model.matrix(~group, data=c$samples) The comment from my previous message still applies -- split your variables into treatment and time and create a design matrix (and maybe contrast matrix) according to the questions you want to answer. Have a look at the limma/edgeR user's guide for some examples, or ask a local statistician. Best regards, Mark > > But I do not know if that is correct and how to analyse each treatment dealing with the controls. The best situation would be to deal first with the controls and them the treated samples through time, any ideia? > > -- > Fabricio K. Marchini > Adjunct Researcher > > Bioinformatics and Computational Biology Laboratory > Functional Genomics Laboratory > CV - CNPq / Web site > > Instituto Carlos Chagas / FIOCRUZ-PR > Phone +55 41 3316 3236 > Fax +55 41 3316 3267 > R. Prof. Algacyr Munhoz Mader, 3775 -CIC > Curitiba - PR - Brasil, 81350-010 > > > On Wed, Oct 10, 2012 at 5:40 PM, Mark Robinson <mark.robinson at="" imls.uzh.ch=""> wrote: > Hi Fabricio, > > I suggest you check (at least) 2 things: > > 1. > > disp <- estimateCommonDisp(b) > > disp$common.dispersion = 0.0001004979 > > > disp$common.dispersion = 3.999943 > > Your example only makes 1 call to estimateCommonDisp(), but you have 2 drastically different values. Are you reporting these as the estimated values, or are you actually running this command and *setting* the common dispersion? It's not clear from your message. > > You may also want to study some of the GLM-based case studies in: > http://www.bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/ doc/edgeRUsersGuide.pdf > > For example, the standard GLM work flow would be similar to that on Page 7. > > > 2. > > lrt <- glmLRT(b,fit,coef=fit$design) > > The docs for the 'coef' argument (?glmLRT) say: > ---- > coef: integer or character vector indicating which coefficients of > the linear model are to be tested equal to zero. Values must > be columns or column names of ?design?. Defaults to the last > coefficient. Ignored if ?contrast? is specified. > ---- > As you can see, the function is expecting something very different to what give as your 'coef' argument. Maybe you want 'coef=2:6', if you are looking for any difference between your 6 groups. Of course, maybe you actually want to split your factors into 2 ? one of ("La","Lm","MO") and one of ("6h","24h") and construct a design matrix accordingly. But, this is also not clear from your message. > > Hope that helps, > Mark > > > ---------- > Prof. Dr. Mark Robinson > Bioinformatics > Institute of Molecular Life Sciences > University of Zurich > Winterthurerstrasse 190 > 8057 Zurich > Switzerland > > v: +41 44 635 4848 > f: +41 44 635 6898 > e: mark.robinson at imls.uzh.ch > o: Y11-J-16 > w: http://tiny.cc/mrobin > > ---------- > http://www.fgcz.ch/Bioconductor2012 > > > > On 10.10.2012, at 06:41, Fabricio Marchini wrote: > > > Hi, > > > > I'm using EdgeR to analyse a proteomic data with peptide counting. I have > > limited experience on R/EdgeR/Statistics so I appreciate some help. > > Using the follow code: > > > > a=file[,2:64] > > > > b=DGEList(counts=a,group=rep(c("La6h","La24h","Lm6h","Lm24h","MO6h ","MO24h" > > ),c(10,11,10,11,10,11)), lib.size=colSums(a)) > > > > b <- calcNormFactors(b) > > > > times <- rep(c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h"),c(10,11,10,11, > > 10,11)) > > > > times <- factor(times,levels=c("La6h","La24h","Lm6h","Lm24h","MO6h","MO24h" > > )) > > > > design <- model.matrix(~factor(times)) > > > > disp <- estimateCommonDisp(b) > > > > fit <- glmFit(b,design,dispersion=disp$common.dispersion) > > > > lrt <- glmLRT(b,fit,coef=fit$design) > > disp$common.dispersion = 0.0001004979 > > > > All proteins (3430) had a p.value of 0. > > > > I tried also with > > > > fit <- glmFit(b,design,dispersion=disp$common.dispersion) > > > > lrt <- glmLRT(b,fit,coef=fit$design) > > disp$common.dispersion = 3.999943 > > > > and that gave me all the proteins with p.value lower than 6.29E-05. > > > > That gave a signal that I'm doing something wrong or because of both common > > dispersions my data is not a appropriate for the analysis. > > > > Any suggestions or corrections? > > > > -- > > Fabricio K. Marchini > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD REPLY

Login before adding your answer.

Traffic: 514 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6