Question: Running HTSFilter in parallel
3.7 years ago by
Richard Yanicky10 wrote:

Hello,

I am using the HTSFilter library to filter out low-count samples for some RNA data. It works, but it takes a while to run. Is there a way to run it using multiple cores/CPUs?

Regards,

Richard

3.7 years ago by
andrea.rau60
INRA / Jouy en Josas, France
andrea.rau60 wrote:

Hi Richard,

If HTSFilter is taking a while to run, I'm guessing that it's because you have a fairly large number of samples -- right? The method in HTSFilter is extremely parallelizable since for a given filtering threshold, the Jaccard similarity index is calculated in a loop for all possible pairs of replicates and then averaged (which means calculations could be done in parallel both for different pairs of samples and for different filtering thresholds).
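To make the structure of that calculation concrete, here is a minimal sketch in plain R. It is a simplified illustration and not the code used inside HTSFilter: the function name `mean_jaccard`, the `apply_fun` argument, and the toy count matrix are all hypothetical, and normalization is skipped. Each pair of replicates is handled independently, so the serial `lapply` could in principle be swapped for a parallel equivalent such as `BiocParallel::bplapply`.

```r
# Simplified sketch (not HTSFilter's internal code): mean pairwise Jaccard
# index between replicates of the same condition, for one threshold s.
mean_jaccard <- function(counts, conds, s, apply_fun = lapply) {
  # Jaccard index between two logical vectors (genes kept in each replicate)
  jaccard <- function(a, b) {
    n_union <- sum(a | b)
    if (n_union == 0) return(NA_real_)
    sum(a & b) / n_union
  }
  above <- counts > s  # TRUE where a gene passes the threshold in a sample
  # All pairs of replicate columns within each condition (2 x n_pairs matrix)
  pairs <- do.call(cbind, unname(lapply(split(seq_along(conds), conds),
    function(idx) if (length(idx) < 2) NULL else combn(idx, 2))))
  # One independent task per pair: this is the step that can run in parallel
  vals <- apply_fun(seq_len(ncol(pairs)), function(k) {
    jaccard(above[, pairs[1, k]], above[, pairs[2, k]])
  })
  mean(unlist(vals), na.rm = TRUE)
}

# Hypothetical toy data: 4 genes, 2 conditions with 2 replicates each
counts <- cbind(A1 = c(0, 5, 20, 100), A2 = c(1, 0, 25, 90),
                B1 = c(0, 8, 15, 120), B2 = c(2, 9, 0, 110))
conds <- c("A", "A", "B", "B")

# The thresholds are independent of each other as well
thresholds <- c(1, 10)
index_by_threshold <- sapply(thresholds,
                             function(s) mean_jaccard(counts, conds, s))
print(index_by_threshold)
```

Passing `apply_fun = BiocParallel::bplapply` would distribute the pairs over workers. With only a handful of replicates the overhead would likely outweigh the gain, which is consistent with parallelism mattering mainly when there are many samples.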

That being said, unfortunately I haven't yet included the ability to run HTSFilter over multiple cores/CPUs, since most of my use cases to date have had a limited number of replicate samples (say, fewer than 10 or so). However, if this is an option you're interested in, I could take a look at including it (although it may take me a bit of time, since I need to familiarize myself with the necessary packages). Let me know!

Regards,

Andrea

Hi Andrea,

Thanks for the response!

Yes, we do have a large number of samples and hope to set up a pipeline using HTSFilter. We have used a shorter s.len to speed it up, but we need to be sure the results are robust. If there were a multicore option, it would be a great help.

Thanks,

Richard

3.7 years ago by
andrea.rau60
INRA / Jouy en Josas, France
andrea.rau60 wrote:

Ok, I will work on adding the possibility of parallel calculations to HTSFilter. It may take me a couple of weeks to get around to it, but I will let you know when it is ready for testing in the development version. Thanks again for the feedback!

Best,

Andrea

After a longer delay than expected (my apologies!), HTSFilter now implements (as of Bioconductor 3.4, version 1.14.0) the option for parallel calculations through the BiocParallel package. There are now two additional optional arguments in calls to HTSFilter: parallel (TRUE/FALSE) and BPPARAM to specify the backend for parallel execution. I hope this helps the execution time for your use case! Any feedback is welcome.
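As a rough illustration of how the two new arguments might be used, here is a sketch on simulated data. The count matrix, condition labels, worker count, and `s.len` value are made up for the example. `MulticoreParam` relies on forking and is not available on Windows, where `SnowParam` would be the usual alternative.

```r
library(HTSFilter)     # version 1.14.0 or later (Bioconductor 3.4)
library(BiocParallel)

set.seed(1)
# Hypothetical toy data: 5000 genes x 20 samples, 10 replicates per condition
counts <- matrix(rnbinom(5000 * 20, mu = 100, size = 0.5), nrow = 5000)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
colnames(counts) <- paste0("sample", seq_len(ncol(counts)))
conds <- rep(c("A", "B"), each = 10)

# Assumed setup: 2 forked workers; adjust to the cores actually available
bp <- MulticoreParam(workers = 2)

res <- HTSFilter(counts, conds, s.len = 25, plot = FALSE,
                 parallel = TRUE, BPPARAM = bp)

dim(res$filteredData)  # genes retained after filtering
res$s                  # filtering threshold that was selected
```

To check that the parallel run gives the same filter as a serial one, the same call can be repeated with `parallel = FALSE` and the two values of `res$s` compared.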

Best,

Andrea