Dear Dr. Zhu,
I'm trying to use your Bioconductor package to perform some analysis, and
I'm running into a problem that I'm unsure how to handle. I'll try to be
concise.
I have two data sets (DS1 and DS2) on which I want to run
annotatePeakInBatch against the same AnnotationData (AD). All of them are
already RangedData objects, so all I need to do is call the function
normally: annotatePeakInBatch(DS1, AnnotationData = AD).
DS1 has exactly 12,263 entries, and running system.time() over the call
yielded:

       user  system elapsed
     126.98    0.13  127.93
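That is, the timing above came from wrapping the call directly, along
these lines:

    library(ChIPpeakAnno)

    ## Annotate the 12,263 entries of DS1 against AD and time the call
    system.time(annotatePeakInBatch(DS1, AnnotationData = AD))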
That's OK; the function worked flawlessly. The problem arises with the
second dataset (DS2), which is huge: a dataset of short reads with
9,696,611 entries (yes, almost 10 million reads).
I ran it against the same AnnotationData, and it has now been running for
more than 24 hours. The question is: is that normal? Should I really
expect the calculations to take this long? Could you please say something
about the complexity of the algorithm being used?
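To help frame the question, one thing I could do on my side is probe the
scaling empirically by timing the annotation on random subsets of DS2 of
increasing size. A minimal sketch (the subset sizes are arbitrary, and I'm
assuming the RangedData/GRanges round-trip below preserves whatever
annotatePeakInBatch needs):

    library(ChIPpeakAnno)

    ## Time annotatePeakInBatch() on random subsets of DS2 of increasing
    ## size, to see how elapsed time grows with the number of reads.
    gr    <- as(DS2, "GRanges")   # element-wise subsetting is easy on GRanges
    sizes <- c(10000, 50000, 250000)

    elapsed <- sapply(sizes, function(n) {
      sub <- as(gr[sample(length(gr), n)], "RangedData")
      system.time(annotatePeakInBatch(sub, AnnotationData = AD))["elapsed"]
    })

    ## Roughly linear growth would suggest the full 9.7M-read run is
    ## simply big; superlinear growth would explain the 24+ hours.
    data.frame(reads = sizes, elapsed.sec = elapsed)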
I would also be very grateful for any benchmarks you ran while developing
the package. This run is just one of many that I need to do, and it would
be great to have some idea of how long each will take, so I can build my
work schedule properly.
The annotation database AD has 86,046 entries. The machine running this is
a server with 4 quad-core CPUs and 22 GB of RAM, about 40% of which is
being used by R while running the function.
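Since the server has 16 cores and R seems to be using only one of them, I
have also been wondering whether it would be safe to split DS2 into chunks
and annotate them in parallel. A sketch of what I have in mind (the chunk
count and mc.cores are arbitrary; I'm assuming each read is annotated
independently and that the per-chunk results can be stacked back together
with rbind):

    library(ChIPpeakAnno)
    library(parallel)

    ## Split the ~9.7M reads into roughly equal pieces, annotate each
    ## piece on its own core, then recombine the results.
    gr     <- as(DS2, "GRanges")
    chunks <- split(gr, cut(seq_along(gr), 16))

    pieces <- mclapply(chunks, function(chunk) {
      annotatePeakInBatch(as(chunk, "RangedData"), AnnotationData = AD)
    }, mc.cores = 8)

    ## Stack the per-chunk annotations back into a single object
    annotated <- do.call(rbind, pieces)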
Thanks,
Halian