Herve,
Thanks so much! Yes, with a sequence containing 454 gRNAs, it took 34
minutes to perform the gRNA search, the restriction enzyme and paired
configuration annotation, and the off-target search and score
prediction in the human genome. That is a huge speed gain (> 10x)!
I notice that the genome-wide search includes searching in contigs. I
recall that someone at Bioc2014 mentioned a function that can return
the main set of chromosomes for a given BSgenome, but I do not remember
the function any more. Do you or anyone on the list know the function?
Many thanks!
Best regards,
Julie
On 8/3/14 7:54 AM, "Lihua Julie Zhu" <julie.zhu at umassmed.edu> wrote:
> Herve,
>
> Wow! Thanks so much for improving the code so quickly!
>
> I will play with it today.
>
> Best regards,
>
> Julie
>
>
> On 8/3/14 4:46 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>
>> Hi Julie,
>>
>>
>> I looked at the searchHits() function and found a way to optimize it.
>> The trick is to use matchPDict() internally instead of matchPattern()
>> and to preprocess the set of gRNAs. This allows a 2x speedup for 50
>> 23-base gRNAs with max.mismatch=4. The speedup will be more drastic
>> if there are more gRNAs or if they are longer. For example, with
>> hundreds of 23-base gRNAs, you will probably see a 4x speedup and
>> even more if the gRNAs are longer.
>> Note that preprocessing is not always possible e.g. if the gRNAs
>> are very short, or if max.mismatch is too high, or if the gRNAs
>> contain IUPAC ambiguity codes. In that case, the code will skip
>> the preprocessing step and you won't see any speedup.
>>
>> I committed the change to the devel version of CRISPRseek and bumped
>> the version to 1.1.8. Try it with a big set of gRNAs and let me
>> know how it goes. If you use max.mismatch=4, the longer the gRNAs
>> are, the faster it's going to be. Let me know if you run into any
>> problem.
>>
>> I enjoyed the conference. It was nice to see you again.
>> Hope you had a safe trip back home.
>>
>> Best,
>> H.
>
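
For readers wondering why preprocessing the whole gRNA set helps, here
is a minimal sketch in Python. It is not the CRISPRseek or Biostrings
code, and it does not claim to show how matchPDict() works internally;
it swaps in a simple pigeonhole seed-and-verify scheme purely as an
illustration. All function names (build_seed_index, search_all,
search_naive) are hypothetical. The idea: a hit with at most k
mismatches must contain at least one of k+1 disjoint seeds exactly, so
the seeds of all gRNAs can be indexed once and the subject scanned a
single time instead of once per gRNA.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_seed_index(patterns, max_mismatch):
    """Preprocess equal-length patterns into a seed -> [(pattern_id, offset)] map.

    Each pattern is cut into max_mismatch + 1 disjoint seeds. By the
    pigeonhole principle, any occurrence with <= max_mismatch mismatches
    matches at least one seed exactly.
    """
    width = len(patterns[0])
    if any(len(p) != width for p in patterns):
        raise ValueError("all patterns must have the same length")
    n_seeds = max_mismatch + 1
    seed_len = width // n_seeds
    if seed_len == 0:
        # Assumption of this sketch: without a usable seed we give up.
        raise ValueError("patterns too short for this max_mismatch")
    index = defaultdict(list)
    for pid, pat in enumerate(patterns):
        for s in range(n_seeds):
            off = s * seed_len
            index[pat[off:off + seed_len]].append((pid, off))
    return index, width, seed_len


def search_all(patterns, subject, max_mismatch):
    """Scan the subject once, verifying only candidates proposed by the index."""
    index, width, seed_len = build_seed_index(patterns, max_mismatch)
    hits = set()
    for pos in range(len(subject) - seed_len + 1):
        for pid, off in index.get(subject[pos:pos + seed_len], ()):
            start = pos - off
            if start < 0 or start + width > len(subject):
                continue
            window = subject[start:start + width]
            if hamming(patterns[pid], window) <= max_mismatch:
                hits.add((pid, start))
    return sorted(hits)


def search_naive(patterns, subject, max_mismatch):
    """Baseline: one full pass over the subject per pattern."""
    hits = []
    for pid, pat in enumerate(patterns):
        for start in range(len(subject) - len(pat) + 1):
            if hamming(pat, subject[start:start + len(pat)]) <= max_mismatch:
                hits.append((pid, start))
    return sorted(hits)
```

Both functions return the same (pattern index, start) pairs; only the
amount of work differs. In this sketch the seed length is the gRNA
length divided by max_mismatch + 1, so very short gRNAs or a high
max_mismatch leave seeds too short to be selective, which is one
plausible reading of the caveat in the quoted message.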