Dear Dr. Zhu,
I'm trying to use your Bioconductor package to perform some analysis, and
I'm running into a problem that I'm unsure how to handle. I'll try to be
concise.
I have two data sets (DS1 and DS2) on which I want to run
annotatePeakInBatch against the same AnnotationData (AD). All of them are
already RangedData objects, so all I need to do is call the function
normally: annotatePeakInBatch(DS1, AnnotationData = AD).
DS1 has exactly 12,263 entries, and running system.time() over the call
yielded:

       user  system elapsed
     126.98    0.13  127.93
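That is, the timing above came from wrapping the call directly, along
these lines:

    library(ChIPpeakAnno)

    ## Annotate the 12,263 entries of DS1 against AD and time the call
    system.time(annotatePeakInBatch(DS1, AnnotationData = AD))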
That's OK; the function worked flawlessly. The problem arises with the
second dataset (DS2), which is huge: a dataset of short reads with
9,696,611 entries (yes, almost 10 million reads).
I ran it against the same AnnotationData, and it has now been running for
more than 24 hours. The question is: is that normal? Should I really
expect the calculations to take this long? Could you please say something
about the complexity of the algorithm being used?
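To help frame the question, one thing I could do on my side is probe the
scaling empirically by timing the annotation on random subsets of DS2 of
increasing size. A minimal sketch (the subset sizes are arbitrary, and I'm
assuming the RangedData/GRanges round-trip below preserves whatever
annotatePeakInBatch needs):

    library(ChIPpeakAnno)

    ## Time annotatePeakInBatch() on random subsets of DS2 of increasing
    ## size, to see how elapsed time grows with the number of reads.
    gr    <- as(DS2, "GRanges")   # element-wise subsetting is easy on GRanges
    sizes <- c(10000, 50000, 250000)

    elapsed <- sapply(sizes, function(n) {
      sub <- as(gr[sample(length(gr), n)], "RangedData")
      system.time(annotatePeakInBatch(sub, AnnotationData = AD))["elapsed"]
    })

    ## Roughly linear growth would suggest the full 9.7M-read run is
    ## simply big; superlinear growth would explain the 24+ hours.
    data.frame(reads = sizes, elapsed.sec = elapsed)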
I would also be very grateful for any benchmarks you ran while developing
the package. This run is just one of many that I need to do, and it would
be great to have some idea of how long each will take, so I can build my
work schedule properly.
The annotation database AD has 86,046 entries. The machine running this is
a server with 4 quad-core CPUs and 22 GB of RAM, about 40% of which is
being used by R while running the function.
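Since the server has 16 cores and R seems to be using only one of them, I
have also been wondering whether it would be safe to split DS2 into chunks
and annotate them in parallel. A sketch of what I have in mind (the chunk
count and mc.cores are arbitrary; I'm assuming each read is annotated
independently and that the per-chunk results can be stacked back together
with rbind):

    library(ChIPpeakAnno)
    library(parallel)

    ## Split the ~9.7M reads into roughly equal pieces, annotate each
    ## piece on its own core, then recombine the results.
    gr     <- as(DS2, "GRanges")
    chunks <- split(gr, cut(seq_along(gr), 16))

    pieces <- mclapply(chunks, function(chunk) {
      annotatePeakInBatch(as(chunk, "RangedData"), AnnotationData = AD)
    }, mc.cores = 8)

    ## Stack the per-chunk annotations back into a single object
    annotated <- do.call(rbind, pieces)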
Thanks,
Halian