Advice with edgeR

0

Entering edit mode

Paul Leo ▴ 970

@paul-leo-2092

Last seen 9.7 years ago

I've just starting exploring edgeR. Gotten an error I am unsure of... I'm following the how-to with no changes > d DGEList: $data lane3 lane8 lane5 lane6 1 0 0 1054 480 2 32 0 470 744 3 0 0 675 419 4 0 0 510 734 5 107 0 180 1138 81702 more rows ... $lib.size [1] 6482601 808363 7217381 7081739 $group [1] ISOTYPE ISOTYPE CASE CASE Levels: ISOTYPE CASE > alpha EBList: $alpha [1] 3.485649e-09 $common.dispersion [1] 2.548059 d contains no columns where all counts are zero but there are a lot of tags.... > sum(apply(d$data,1,sum)==0) [1] 0 running the following: chosen.lanes<-c("lane5","lane6","lane3","lane8") considered<-rownames(the.counts)[apply(the.counts[,chosen.lanes],1,sum )! =0 d<-DGEList(data = the.counts[considered,chosen.lanes], group = c("CASE","CASE","ISOTYPE","ISOTYPE"),lib.size = total.counts[chosen.lanes,"count"]) alpha<-alpha.approxeb(d) alpha ms<-deDGE(d,alpha=alpha$alpha) ##ERROR Calculating shrinkage overdispersion parameters. [quantileAdjust] Iteration (dot=1000) 1 :Error in approx(c(0, a + c(diff(a)/2, 0)), c(-0.5, 0:mx), xout = p[i, : approx(): attempted to interpolate NA values The error does not occur if I set doPoisson (yet) and I have a case where probably 1/2 of the "tags" are not differentially expressed. Any insight would be very welcome. Thanks Paul > sessionInfo() R version 2.10.0 Under development (unstable) (2009-07-24 r48986) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_AU.UTF-8 [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_1.3.3 Biobase_2.5.4 loaded via a namespace (and not attached): [1] tcltk_2.10.0 tools_2.10.0

edgeR edgeR • 954 views

ADD COMMENT • link 14.8 years ago Paul Leo ▴ 970

0

Entering edit mode

Paul Leo ▴ 970

@paul-leo-2092

Last seen 9.7 years ago

HI Mark , Yes that is very helpful, additionally I have found that edgeR does NOT work in my hands with R2.10 and Bioc 2.5 (generated that error) see below for sessionInfo But the SAME code DOES work for R2.9.1 Bioc2.4 > sessionInfo() R version 2.9.1 (2009-06-26) x86_64-pc-linux-gnu locale: LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU .UTF-8;LC_MONETARY=C;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_1.2.4 Biobase_2.4.1 loaded via a namespace (and not attached): [1] tcltk_2.9.1 tools_2.9.1 I Have not tried the svn version of edgeR with R2.10 Yes lane8 is a problem I am half thinking to killing it or try doing a recall of the bases with a model approach. The way this has been set up is that there are 8100K regions from a chipseq experiment that contains technical and biological replicates and a lots of potentially bad regions have been included to cast a broad net- maybe too broad (love to explicity include the effect of the technical replicate). I could kill most of these additional regions off.. and check the common dispersion? Can you share with me a rough feel for what is high/low group dispersion ? I have no feel at the early stage what is out of the ball-park and what is safe? ALSO an additional question, if you permit... On a conceptual level I was wondering if it makes sent to put count data on a pedestal - for the explicit case of performing differential expression, in an analogous way that intensities in microarray data can be put of a pedestal (eg to make the fold changes look more uniform when intensities are low). In this instance the library size would confound that as, I think, adding the same constant to libraries of different sizes would not be equivalent. Have you found that putting the count data on a pedestal is at all helpful or is that appropriate for this analysis (again just for differential expression)? I imagine that the quantileAjust could be used to get pseudodata on the same library size and then the constant could be added? Excuse my wild conjecturing... Thanks again Paul -----Original Message----- From: Mark Robinson <mrobinson@wehi.edu.au> To: Paul Leo <p.leo at="" uq.edu.au=""> Subject: Re: [BioC] Advice with edgeR Date: Tue, 4 Aug 2009 15:28:42 +1000 Hi Paul. Thanks for trying edgeR and giving your feedback. I have seen this error before in certain situations. There is a quantile adjustment step in there for the NB distribution and in difficult tails of the NB distribution, the quantile functions don't always return sensible results. Ideally, we should have the code deal more gracefully for these. I suspect the major factor for your data is the fact that lane8 has about 10-fold less total sequence than the rest of the lanes. This probably pushes the limits of the quantile adjustment. Is there a reason for this lane have so few reads? I wonder if things would be better if you excluded that sample. Another factor that may affect this is the variability. The common dispersion seems quite high, although that may be simply related to the depth of sequencing you have. Not sure I've been very insightful, but at least a few thoughts ... Cheers, Mark On 03/08/2009, at 6:06 PM, Paul Leo wrote: > I've just starting exploring edgeR. Gotten an error I am unsure of... > > I'm following the how-to with no changes > >> d > DGEList: > $data > lane3 lane8 lane5 lane6 > 1 0 0 1054 480 > 2 32 0 470 744 > 3 0 0 675 419 > 4 0 0 510 734 > 5 107 0 180 1138 > 81702 more rows ... > > $lib.size > [1] 6482601 808363 7217381 7081739 > > $group > [1] ISOTYPE ISOTYPE CASE CASE > Levels: ISOTYPE CASE > >> alpha > EBList: > $alpha > [1] 3.485649e-09 > > $common.dispersion > [1] 2.548059 > > d contains no columns where all counts are zero but there are a lot of > tags.... >> sum(apply(d$data,1,sum)==0) > [1] 0 > running the following: > > > chosen.lanes<-c("lane5","lane6","lane3","lane8") > considered<-rownames(the.counts)[apply(the.counts[,chosen.lanes], > 1,sum)! > =0 > d<-DGEList(data = the.counts[considered,chosen.lanes], group = > c("CASE","CASE","ISOTYPE","ISOTYPE"),lib.size = > total.counts[chosen.lanes,"count"]) > alpha<-alpha.approxeb(d) > alpha > ms<-deDGE(d,alpha=alpha$alpha) > > ##ERROR > Calculating shrinkage overdispersion parameters. > [quantileAdjust] Iteration (dot=1000) 1 :Error in approx(c(0, a + > c(diff(a)/2, 0)), c(-0.5, 0:mx), xout = p[i, : > approx(): attempted to interpolate NA values > > The error does not occur if I set doPoisson (yet) and I have a case > where probably 1/2 of the "tags" are not differentially expressed. > > Any insight would be very welcome. > > > Thanks > Paul > > > >> sessionInfo() > R version 2.10.0 Under development (unstable) (2009-07-24 r48986) > x86_64-unknown-linux-gnu > > locale: > [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_AU.UTF-8 > [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] edgeR_1.3.3 Biobase_2.5.4 > > loaded via a namespace (and not attached): > [1] tcltk_2.10.0 tools_2.10.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ------------------------------ Mark Robinson, PhD (Melb) Epigenetics Laboratory, Garvan Bioinformatics Division, WEHI e: m.robinson at garvan.org.au e: mrobinson at wehi.edu.au p: +61 (0)3 9345 2628 f: +61 (0)3 9347 0852

ADD COMMENT • link 14.8 years ago Paul Leo ▴ 970

Login before adding your answer.