Halian,
Are you running annotatePeakInBatch on the aligned reads instead of on
peaks called by one of the peak-calling algorithms? This function is
designed for annotating peaks, of which there should be far fewer than
10 million. Thanks!
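If that's the case, you could run a peak caller first and annotate its
output instead. A rough sketch of that workflow (the file name
"macs_peaks.bed" and its column layout are only an example; AD stands
for your own AnnotationData object):

library(ChIPpeakAnno)
## example: load called peaks from a BED-like file produced by a peak caller
peaks.df <- read.table("macs_peaks.bed",
                       col.names = c("chrom", "start", "end", "name"))
## convert to RangedData, the input type annotatePeakInBatch expects
peaks <- RangedData(IRanges(start = peaks.df$start,
                            end   = peaks.df$end,
                            names = peaks.df$name),
                    space = peaks.df$chrom)
annotated <- annotatePeakInBatch(peaks, AnnotationData = AD)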
Best regards,
Julie
On 11/9/11 6:37 PM, "Halian Vilela" <halianlian@gmail.com> wrote:
Dear Dr. Zhu,
I'm trying to use your Bioconductor package for some analysis, and I'm
running into a problem that I'm unsure how to handle. I'll try to be
concise.
I have two data sets (DS1 and DS2) that I want to run
annotatePeakInBatch on, against the same AnnotationData (AD).
Both are already RangedData objects, so all I need to do is call the
function normally: annotatePeakInBatch(DS1, AnnotationData = AD)
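For timing, I wrapped the call in system.time(), like so (a minimal
sketch; DS1 and AD are the RangedData objects described above):

library(ChIPpeakAnno)
## time the annotation call on the smaller dataset
system.time(
    annotated.DS1 <- annotatePeakInBatch(DS1, AnnotationData = AD)
)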
DS1 has exactly 12,263 entries, and that call yielded:
   user  system elapsed
 126.98    0.13  127.93
So far so good; the function worked flawlessly. The problem arises with
the second dataset (DS2), which is huge: a set of short reads with
9,696,611 entries (yes, almost 10 million reads).
I ran it against the same AnnotationData, and it has now been running
for more than 24 hours. The question is: is that normal? Should I
really expect the calculations to take this long? Could you please say
something about the complexity of the algorithm being used?
I would also appreciate any benchmarks you ran while developing the
package. This run is just one of many that I need to do, and some idea
of how long each will take would help me plan my work schedule
properly.
The annotation database AD has 86,046 entries. The machine running this
is a server with four quad-core CPUs and 22 GB of RAM, about 40% of
which is being used by R while it runs the function.
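In case it's relevant: since only one R process seems to be used, I
wondered whether I could split DS2 by chromosome and annotate the
pieces in parallel. An untested sketch (this assumes annotation only
ever pairs a read with features on the same chromosome, and uses
mclapply from the multicore package):

library(multicore)
library(ChIPpeakAnno)
chroms <- names(DS2)   # one name per chromosome (space) in the RangedData
pieces <- mclapply(chroms, function(chr) {
    ## annotate the reads of a single chromosome against the same AD
    annotatePeakInBatch(DS2[chr], AnnotationData = AD)
}, mc.cores = 8)
annotated.DS2 <- do.call(rbind, pieces)

Would that give the same result as a single call?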
Thanks,
Halian