Question: CRISPRseek help to generate gRNAs for CRISPRi library
15 days ago by
Lucka3870
Lucka3870 wrote:

I am currently in the process of designing a gRNA library for a CRISPRi screen. I would like to use the CRISPRseek script to identify the top 10 most efficient gRNAs per gene for my library. I ran a test on a small file (128 KB) using the script for Scenario #7: Quick gRNA finding with gRNA efficacy prediction:

## Scenario 7. Quick gRNA finding with gRNA efficacy prediction

results <- offTargetAnalysis(inputFilePath,
    findgRNAsWithREcutOnly = FALSE,
    enable.multicore = TRUE, n.cores.max = 60,
    annotateExon = FALSE,
    findPairedgRNAOnly = TRUE,
    chromToSearch = "all", max.mismatch = 0,
    BSgenomeName = Hsapiens,
    outputDir = outputDir, overwrite = TRUE)

This took ~36 hours to run on one core and used a lot of memory. I’ll have a minimum of 24.9 MB of input to run, with more data on the way.

Is there a way to edit the script to reduce the time and memory needed to generate the information I need? I do not need any information on restriction cut sites, because CRISPRi does not cut the DNA, and I do not necessarily need paired gRNAs unless that is the fastest way for the script to run. The output from my test file contains many gRNAs, but I only need the 10 most efficient per gene. Is there a way to select only these so that the script runs faster and uses less memory?

Any help or advice would be much appreciated!

Thank you, Kathleen

Answer: CRISPRseek help to generate gRNAs for CRISPRi library
15 days ago by
Julie Zhu4.1k
United States
Julie Zhu4.1k wrote:

Hi Kathleen,

I suggest setting exportAllgRNAs = "fasta" and annotatePaired = FALSE in addition to the parameters you have already set, such as

findPairedgRNAOnly = FALSE,

findgRNAsWithREcutOnly = FALSE, enable.multicore=TRUE, n.cores.max=60, annotateExon = FALSE,

If you are not interested in identifying offTargets for each gRNA, you can set chromToSearch = "" to make it run much faster.
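As a rough illustration, the settings mentioned so far can be collected in one place. The sketch below only gathers the parameters named in this thread into a list and calls offTargetAnalysis if CRISPRseek is installed and the input file exists. The file name genes.fa and the output directory name are hypothetical placeholders, and max.mismatch and overwrite are simply carried over from the original call.

```r
# Parameters named in this thread for a quick gRNA search without off-target analysis.
# "genes.fa" and "quick_gRNA_output" are hypothetical placeholders.
inputFilePath <- "genes.fa"
outputDir <- "quick_gRNA_output"

quick_args <- list(
  inputFilePath = inputFilePath,
  findgRNAsWithREcutOnly = FALSE,  # restriction sites are not needed for CRISPRi
  findPairedgRNAOnly = FALSE,
  annotatePaired = FALSE,
  annotateExon = FALSE,
  exportAllgRNAs = "fasta",
  chromToSearch = "",              # skip the off-target search
  max.mismatch = 0,
  outputDir = outputDir,
  overwrite = TRUE
)

# Only run when the package and the input file are actually available.
if (requireNamespace("CRISPRseek", quietly = TRUE) && file.exists(inputFilePath)) {
  results <- do.call(CRISPRseek::offTargetAnalysis, quick_args)
}
```

Keeping the arguments in a list makes it easy to reuse the same settings later with a different chromToSearch value when the off-target search is run on the selected gRNAs.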

If you need to search for offTargets, I suggest you first run the analysis without searching for offTargets using the above settings, then select gRNAs with reasonable efficiency and run the offTarget analysis for the selected gRNAs only (sections 2.5, 2.9 and 2.10). If you have access to high performance computing clusters (HPCC), I can share my scripts for you to run the searches on multiple nodes.
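The selection step between the two runs could look something like the following base R sketch, which keeps the n highest-scoring gRNAs per gene. The data frame and its column names (gene, gRNA, efficacy) are made up for illustration and are not the column names of any CRISPRseek output file, so they would need to be mapped to the real output.

```r
# Hypothetical table of gRNAs with efficacy scores; the column names are
# assumptions, not the column names of a CRISPRseek output file.
gRNA_table <- data.frame(
  gene = c("A", "A", "A", "B", "B"),
  gRNA = c("g1", "g2", "g3", "g4", "g5"),
  efficacy = c(0.2, 0.9, 0.5, 0.7, 0.1),
  stringsAsFactors = FALSE
)

# Keep the n highest-scoring gRNAs for each gene.
top_n_per_gene <- function(tbl, n = 10) {
  # Sort by gene, then by decreasing efficacy within each gene
  ordered <- tbl[order(tbl$gene, -tbl$efficacy), ]
  # Rank of each row within its gene (1 = most efficient)
  rank_in_gene <- ave(seq_len(nrow(ordered)), ordered$gene, FUN = seq_along)
  ordered[rank_in_gene <= n, ]
}

top2 <- top_n_per_gene(gRNA_table, n = 2)
```

With n = 10 this gives the top 10 per gene; only those gRNAs would then be passed to the slower off-target search.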

FYI, the most recent version of CRISPRseek implements three different algorithms for calculating gRNA efficiency. Please read section 2.7 for details. Thanks!

http://bioconductor.org/packages/devel/bioc/vignettes/CRISPRseek/inst/doc/CRISPRseek.pdf#page8

Best regards,

Julie

On Nov 28, 2019, at 4:22 AM, Lucka387 [bioc] <noreply@bioconductor.org> wrote:

[original question, quoted in full above]

Thank you very much for the prompt reply. I will rerun with the changed parameters as suggested.

The other issue is that it is only using one core, which is odd considering that enable.multicore is set to TRUE. Our system (just one server, not a cluster) has 144 available threads and 1 TB RAM. I did find a post elsewhere saying there might be an issue with processing on multiple cores if there are more than 128 connections (support.bioconductor.org/p/72994, 9th answer down), so I am wondering if this is what is preventing it from running on more than one core. Thoughts? Thank you for your time.

You are welcome, Kathleen!

I suggest setting n.cores.max = 6 to test whether it works as expected.

If you have a file with lots of gRNAs to search for offTargets, it will be more effective to run several searches each with a subset of the gRNAs.
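One way to form such subsets is sketched below in base R. The gRNA names are placeholders standing in for real sequences, and the batch size of 1000 is an arbitrary assumption to be tuned to the available memory and run time.

```r
# Hypothetical vector standing in for a large set of gRNAs.
gRNAs <- sprintf("gRNA_%04d", 1:2500)
batch_size <- 1000  # assumed batch size; adjust as needed

# Assign each gRNA to a batch number, then split into a list of subsets.
batch_id <- ceiling(seq_along(gRNAs) / batch_size)
batches <- split(gRNAs, batch_id)

# Number of gRNAs in each batch
lengths(batches)
```

Each element of batches can then be searched in its own run, so that a failed or slow batch does not hold up the others.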

Best regards,

Julie

Hi Julie,

I adjusted my script based on your suggestions, and it worked well for my output, but I am still unable to run on multiple cores. This is what I ran:

## Scenario 5: Target and off-target analysis for user specified gRNAs

results <- offTargetAnalysis(inputFilePath = gRNAFilePath,
    enable.multicore = TRUE, n.cores.max = 6,
    annotateExon = FALSE,
    findgRNAsWithREcutOnly = FALSE,
    findPairedgRNAOnly = FALSE,
    findgRNAs = FALSE,
    BSgenomeName = Hsapiens,
    chromToSearch = "all",
    txdb = TxDb.Hsapiens.UCSC.hg38.knownGene,
    orgAnn = org.Hs.egSYMBOL,
    max.mismatch = 0,
    outputDir = outputDir, overwrite = TRUE)

You previously mentioned "If you have access to high performance computing clusters (HPCC), I can share my scripts for you to run the searches in multiple nodes." I do have access to HPCC. Is there an additional script I should run for multiple core use, other than setting "enable.multicore = TRUE, n.cores.max = 6"?

Thank you! Kathleen

Hi Kathleen,

Here are the scripts offTargetSearchBatch.R and offTargetSearchBatch.bsub, which I used for batch analysis in a high-performance computing environment.

After modifying the parameters to fit your own needs, you can run the script in the cluster by typing the following command.

./offTargetSearchBatch.bsub

Hope it helps.

Best,

Julie

#### Please change R_LIBS, workingDir, R path and offTargetSearchBatch.R path accordingly

for FILE in {1..51}; do
    #BASENAME=basename $FILE
    BASENAME=$FILE
    SHF=$BASENAME.bsub
    DIR=$BASENAME.output
    mkdir -p $DIR
    echo "Processing $FILE ..."
    # Write one LSF job script per batch
    echo "#!/bin/bash" > $SHF
    #echo "module load R/3.1.0" >> $SHF
    echo "export R_LIBS=/project/umwmccb/R/R-3.4.0/lib64/R/library:/share/pkg/R/3.4.0/lib64/R/library:/home/jz57w/R/x86_64-pc-linux-gnu-library/3.4" >> $SHF
    echo "#BSUB -J $BASENAME" >> $SHF
    echo "workingDir=~/mccb/Zhu/CRISPR" >> $SHF
    workingDir=~/mccb/Zhu/CRISPR
    echo "cd $workingDir" >> $SHF
    echo "#BSUB -q long" >> $SHF
    echo "#BSUB -R rusage[mem=20000]" >> $SHF
    echo "#BSUB -W 48:00" >> $SHF
    echo "#BSUB -o out.$BASENAME.log" >> $SHF
    echo "#BSUB -e err.$BASENAME.log" >> $SHF
    # The batch number is passed to the R script via --args
    echo "~/mccb/bin/R CMD BATCH --no-save --no-restore '--args $BASENAME' ~/mccb/Zhu/CRISPR/offTargetSearchBatch.R $SHF.log" >> $SHF
    # Submit the job, then pause before generating the next one
    bsub < $SHF
    sleep 20
done

##### please type help(offTargetAnalysis) to set rule.set and other parameters accordingly

library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)

# NOTE: gRNAs is not defined in the script as posted. One assumption is that it
# holds all gRNA sequences as a DNAStringSet, e.g. read from a FASTA file
# ("gRNAs.fa" is a hypothetical file name, not from the original post).
gRNAs <- Biostrings::readDNAStringSet("gRNAs.fa")

# The batch number (1, 2, ...) is passed on the command line via --args
args <- commandArgs(trailingOnly = TRUE)
batch.ind <- as.numeric(args[1]) - 1
batch.ind

# Each batch covers up to 1000 gRNAs
batch.start <- max(batch.ind * 1000 + 1, 1)
batch.end <- min((batch.ind + 1) * 1000, length(gRNAs))

if (batch.end >= batch.start)
{
    inputFilePath <- gRNAs[batch.start:batch.end]
    setwd("~/mccb/Zhu/CRISPR/")
    outputDir <- paste("~/mccb/Zhu/CRISPR/output", batch.ind, sep = "")
    results <- offTargetAnalysis(inputFilePath,
        findgRNAsWithREcutOnly = FALSE,
        findPairedgRNAOnly = FALSE,
        gRNAoutputName = paste("gRNAs", batch.ind, sep = ""),
        PAM = "NNN",
        annotatePaired = FALSE,
        findgRNAs = FALSE,
        BSgenomeName = Hsapiens,
        annotateExon = FALSE,
        exportAllgRNAs = "fasta",
        scoring.method = "CFDscore",
        fetchSequence = FALSE,
        txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
        orgAnn = org.Hs.egSYMBOL, max.mismatch = 3,
        outputDir = outputDir, overwrite = TRUE)
}