CRISPRseek: searchHits() optimization
Herve,

Thanks so much! Yes, with a sequence containing 454 gRNAs, it took 34 minutes to perform gRNA search, restriction enzyme and paired configuration annotation, and off-target search and score prediction in the human genome. That is a huge speed gain (> 10x)!

I notice that the genome-wide search includes searching in contigs. I recall that someone at BioC2014 mentioned a function that returns the main set of chromosomes for a given BSgenome, but I no longer remember the function. Do you or anyone on the list know it?

Many thanks!

Best regards,
Julie

On 8/3/14 7:54 AM, "Lihua Julie Zhu" <julie.zhu at ...> wrote:

> Herve,
>
> Wow! Thanks so much for improving the code so quickly!
>
> I will play with it today.
>
> Best regards,
>
> Julie
>
>
> On 8/3/14 4:46 AM, "Hervé Pagès" <hpages at ...> wrote:
>
>> Hi Julie,
>>
>> I looked at the searchHits() function and found a way to optimize it.
>> The trick is to use matchPDict() internally instead of matchPattern()
>> and to preprocess the set of gRNAs. This allows a 2x speedup for 50
>> 23-base gRNAs with max.mismatch=4. The speedup will be more drastic
>> if there are more gRNAs or if they are longer. For example, with
>> hundreds of 23-base gRNAs, you will probably see a 4x speedup and
>> even more if the gRNAs are longer.
>> Note that preprocessing is not always possible, e.g. if the gRNAs
>> are very short, if max.mismatch is too high, or if the gRNAs
>> contain IUPAC ambiguity codes. In that case, the code skips
>> the preprocessing step and you won't see any speedup.
>>
>> I committed the change to the devel version of CRISPRseek and bumped
>> the version to 1.1.8. Try it with a big set of gRNAs and let me
>> know how it goes. If you use max.mismatch=4, the longer the gRNAs
>> are, the faster it's going to be. Let me know if you run into any
>> problems.
>>
>> I enjoyed the conference. It was nice to see you again.
>> Hope you had a safe trip back home.
>>
>> Best,
>> H.
>
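Hervé's explanation (preprocess the whole gRNA set once, then scan the genome, and skip preprocessing when the gRNAs are too short, max.mismatch is too high, or IUPAC codes are present) can be illustrated with the pigeonhole idea that, as far as I understand the Biostrings PDict documentation, underlies matchPDict() with max.mismatch > 0. If a pattern is split into max.mismatch + 1 pieces and a genomic window differs from it in at most max.mismatch positions, at least one piece must match exactly. Therefore, only exact piece hits need a full check. The Python below is a minimal sketch under stated assumptions, not the Biostrings or CRISPRseek code. The function names, the equal-width requirement, and the `min_piece=4` eligibility threshold are hypothetical and were chosen only to mimic the cases where preprocessing is skipped.

```python
from collections import defaultdict

ACGT = set("ACGT")


def piece_bounds(width, max_mismatch):
    """Split [0, width) into max_mismatch + 1 contiguous, near-equal pieces."""
    n = max_mismatch + 1
    base, extra = divmod(width, n)
    bounds, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def can_preprocess(patterns, max_mismatch, min_piece=4):
    """Hypothetical eligibility rule: equal widths, plain ACGT only (no IUPAC
    codes), and every pigeonhole piece at least `min_piece` letters long."""
    widths = {len(p) for p in patterns}
    if len(widths) != 1 or any(set(p) - ACGT for p in patterns):
        return False
    bounds = piece_bounds(widths.pop(), max_mismatch)
    return min(e - s for s, e in bounds) >= min_piece


def naive_search(patterns, subject, max_mismatch):
    """Fallback: compare every pattern at every offset (no preprocessing)."""
    hits = []
    for pid, p in enumerate(patterns):
        for pos in range(len(subject) - len(p) + 1):
            if hamming(p, subject[pos:pos + len(p)]) <= max_mismatch:
                hits.append((pid, pos))
    return sorted(hits)


def pigeonhole_search(patterns, subject, max_mismatch, min_piece=4):
    """Return sorted (pattern_id, start) pairs with <= max_mismatch mismatches."""
    if not can_preprocess(patterns, max_mismatch, min_piece):
        return naive_search(patterns, subject, max_mismatch)
    width = len(patterns[0])
    bounds = piece_bounds(width, max_mismatch)
    # Preprocess the whole pattern set once:
    # (piece offset, piece sequence) -> ids of patterns containing that piece.
    index = defaultdict(list)
    for pid, p in enumerate(patterns):
        for s, e in bounds:
            index[(s, p[s:e])].append(pid)
    hits = set()
    # One pass over the subject per piece offset. An exact piece hit is only
    # a candidate, so the full window is verified before it is reported.
    for s, e in bounds:
        w = e - s
        for q in range(len(subject) - w + 1):
            for pid in index.get((s, subject[q:q + w]), ()):
                pos = q - s
                if 0 <= pos <= len(subject) - width and \
                        hamming(patterns[pid], subject[pos:pos + width]) <= max_mismatch:
                    hits.add((pid, pos))
    return sorted(hits)


if __name__ == "__main__":
    grnas = ["ACGTTGCAACGTAGCTAGCA", "GGGCCCAAATTTGGGCCCAA"]
    genome = "TT" + grnas[0] + "CC" + "TGGCCCACATTTGGGACCAA" + "AA"
    print(pigeonhole_search(grnas, genome, max_mismatch=3))
```

With 20-base patterns and max_mismatch=3, each pattern splits into four 5-base pieces, so the index is used. Raising max_mismatch, shortening the patterns, or adding an ambiguity code such as `R` makes `can_preprocess()` return False, and the sketch falls back to the per-pattern scan. This mirrors the cases in which Hervé says no speedup is seen. The payoff grows with the number of patterns because all of them share a single pass over the subject.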
