Halian,
Are you running annotatePeakInBatch on the aligned reads instead of on
peaks called by one of the peak-calling algorithms? This function is
designed for annotating peaks, of which there should be far fewer than
10 million. Thanks!
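If that's the case, you could run a peak caller first and annotate its
output instead. A rough sketch of that workflow (the file name
"macs_peaks.bed" and its column layout are only an example; AD stands
for your own AnnotationData object):

library(ChIPpeakAnno)
## example: load called peaks from a BED-like file produced by a peak caller
peaks.df <- read.table("macs_peaks.bed",
                       col.names = c("chrom", "start", "end", "name"))
## convert to RangedData, the input type annotatePeakInBatch expects
peaks <- RangedData(IRanges(start = peaks.df$start,
                            end   = peaks.df$end,
                            names = peaks.df$name),
                    space = peaks.df$chrom)
annotated <- annotatePeakInBatch(peaks, AnnotationData = AD)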
Best regards,
Julie
On 11/9/11 6:37 PM, "Halian Vilela" <halianlian@gmail.com> wrote:
Dear Dr. Zhu,
I'm trying to use your Bioconductor package for some analysis, and I'm
running into a problem that I'm unsure how to handle. I'll try to be
concise.
I have two data sets (DS1 and DS2) that I want to run
annotatePeakInBatch on, against the same AnnotationData (AD).
Both are already RangedData objects, so all I need to do is call the
function normally: annotatePeakInBatch(DS1, AnnotationData = AD)
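For timing, I wrapped the call in system.time(), like so (a minimal
sketch; DS1 and AD are the RangedData objects described above):

library(ChIPpeakAnno)
## time the annotation call on the smaller dataset
system.time(
    annotated.DS1 <- annotatePeakInBatch(DS1, AnnotationData = AD)
)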
DS1 has exactly 12,263 entries, and that call yielded:
   user  system elapsed
 126.98    0.13  127.93
So far so good; the function worked flawlessly. The problem arises with
the second dataset (DS2), which is huge: a set of short reads with
9,696,611 entries (yes, almost 10 million reads).
I ran it against the same AnnotationData, and it has now been running
for more than 24 hours. The question is: is that normal? Should I
really expect the calculations to take this long? Could you please say
something about the complexity of the algorithm being used?
I would also appreciate any benchmarks you ran while developing the
package. This run is just one of many that I need to do, and some idea
of how long each will take would help me plan my work schedule
properly.
The annotation database AD has 86,046 entries. The machine running this
is a server with four quad-core CPUs and 22 GB of RAM, about 40% of
which is being used by R while it runs the function.
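In case it's relevant: since only one R process seems to be used, I
wondered whether I could split DS2 by chromosome and annotate the
pieces in parallel. An untested sketch (this assumes annotation only
ever pairs a read with features on the same chromosome, and uses
mclapply from the multicore package):

library(multicore)
library(ChIPpeakAnno)
chroms <- names(DS2)   # one name per chromosome (space) in the RangedData
pieces <- mclapply(chroms, function(chr) {
    ## annotate the reads of a single chromosome against the same AD
    annotatePeakInBatch(DS2[chr], AnnotationData = AD)
}, mc.cores = 8)
annotated.DS2 <- do.call(rbind, pieces)

Would that give the same result as a single call?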
Thanks,
Halian