Question: CRISPRseek help to generate gRNAs for CRISPRi library
Lucka387 wrote (15 days ago):

I am currently in the process of designing a gRNA library for a CRISPRi screen. I would like to use the CRISPRseek script to identify the top 10 most efficient gRNAs per gene for my library. I ran a test on a small file (128 KB) using the script for Scenario #7: Quick gRNA finding with gRNA efficacy prediction:

    # Scenario 7. Quick gRNA finding with gRNA efficacy prediction
    results <- offTargetAnalysis(inputFilePath,
        findgRNAsWithREcutOnly = FALSE,
        enable.multicore = TRUE, n.cores.max = 60,
        annotateExon = FALSE,
        findPairedgRNAOnly = TRUE,
        chromToSearch = "all", max.mismatch = 0,
        BSgenomeName = Hsapiens,
        outputDir = outputDir, overwrite = TRUE)

This took ~36 hours to run on one core and used a lot of memory. I'll have at least 24.9 MB of input to run, with more data on the way.

Is there a way to edit the script to reduce the time and memory needed to generate the information I need? I do not need any information on restriction enzyme cut sites because CRISPRi does not cut the DNA, and I do not necessarily need paired gRNAs unless that is the fastest way for the script to run. The output of my test file contains many gRNAs, but I only need the 10 most efficient per gene. Is there a way to select just these so that the script runs faster and uses less memory?

Any help or advice would be much appreciated!

Thank you, Kathleen

Answer: CRISPRseek help to generate gRNAs for CRISPRi library
Julie Zhu (United States) wrote (15 days ago):

Hi Kathleen,

I suggest setting exportAllgRNAs = "fasta" and annotatePaired = FALSE in addition to the parameters you already set, and changing findPairedgRNAOnly to FALSE, for example:

    findPairedgRNAOnly = FALSE,
    findgRNAsWithREcutOnly = FALSE, enable.multicore = TRUE, n.cores.max = 60, annotateExon = FALSE,

If you are not interested in identifying offTargets for each gRNA, you can set chromToSearch = "" to make it run much faster.

If you do need to search for offTargets, I suggest first running the analysis without the offTarget search using the settings above, then selecting gRNAs with reasonable efficiency and running the offTarget analysis only for the selected gRNAs (sections 2.5, 2.9 and 2.10). If you have access to a high-performance computing cluster (HPCC), I can share my scripts for running the searches on multiple nodes.

FYI, the most recent version of CRISPRseek implements three different algorithms for calculating gRNA efficiency. Please read section 2.7 for details. Thanks!

http://bioconductor.org/packages/devel/bioc/vignettes/CRISPRseek/inst/doc/CRISPRseek.pdf#page8

Best regards,

Julie
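The selection step in the two-pass approach above (score gRNAs first, then keep only the best ones per gene for the offTarget search) could look roughly like the sketch below. It uses base R only. The data frame and its column names (`gene`, `gRNA`, `efficacy`) are hypothetical stand-ins, not the actual column names of the CRISPRseek efficacy output, so they would need to be mapped to whatever the real output file contains.

```r
# Hypothetical efficacy table; column names are illustrative only and
# are NOT taken from the CRISPRseek output format.
scores <- data.frame(
  gene     = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
  gRNA     = c("g1", "g2", "g3", "g4", "g5"),
  efficacy = c(0.20, 0.90, 0.55, 0.70, 0.10),
  stringsAsFactors = FALSE
)

# Keep the n highest-scoring gRNAs for each gene.
top_n_per_gene <- function(df, n = 10) {
  # Sort by gene, then by descending efficacy within each gene
  df <- df[order(df$gene, -df$efficacy), , drop = FALSE]
  # Rank rows within each gene (1 = most efficient)
  rank_in_gene <- ave(seq_len(nrow(df)), df$gene, FUN = seq_along)
  df[rank_in_gene <= n, , drop = FALSE]
}

# n = 2 here only because the toy table is small; use n = 10 for the library
top2 <- top_n_per_gene(scores, n = 2)
print(top2)
```

Only the gRNAs that survive this filter would then be written to a FASTA file and passed to the offTarget search, which keeps the expensive step small.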

On Nov 28, 2019, at 4:22 AM, Lucka387 wrote the question "CRISPRseek help to generate gRNAs for CRISPRi library" (https://support.bioconductor.org/p/126760/).


Thank you very much for the prompt reply. I will rerun with the changed parameters as suggested.

The other issue, which I mentioned, is that it is only using one core, which is odd considering that enable.multicore is set to TRUE. Our system (just one server, not a cluster) has 144 available threads and 1 TB of RAM. I did find a post elsewhere saying there might be an issue with multicore processing if there are more than 128 connections (support.bioconductor.org/p/72994, 9th answer down), so I am wondering whether this is what prevents it from running on more than one core. Thoughts? Thank you for your time.

Reply written 15 days ago by Lucka387
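To rule out the connection limit mentioned in this reply, one option is to cap the requested worker count before passing it as n.cores.max. The sketch below is a hedged illustration only: `choose_workers` is a hypothetical helper, the default of 128 connections with 3 reserved for stdin, stdout and stderr reflects R's documented default connection table, and the thread does not establish that this limit is what keeps CRISPRseek on a single core.

```r
library(parallel)

# Hypothetical helper: pick a worker count that does not exceed the
# available cores or the assumed connection ceiling.
choose_workers <- function(requested,
                           available = parallel::detectCores(),
                           max_connections = 128L,
                           reserved = 3L) {
  # detectCores() can return NA on some platforms; fall back to 1
  if (is.na(available)) available <- 1L
  max(1L, min(requested, available, max_connections - reserved))
}

# On a 144-thread server, a request for 60 workers is left unchanged,
# while a request for all 144 threads is capped below the ceiling.
n_small <- choose_workers(60, available = 144)
n_large <- choose_workers(144, available = 144)
print(c(n_small, n_large))
```

The returned value could then be supplied as `n.cores.max` in the offTargetAnalysis call.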

You are welcome, Kathleen!

I suggest setting n.cores.max = 6 to test whether multicore processing works as expected.

If you have a file with lots of gRNAs to search for offTargets, it will be more effective to run several searches each with a subset of the gRNAs.

Best regards,

Julie

Reply written 14 days ago by Julie Zhu
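The suggestion to run several searches, each on a subset of the gRNAs, amounts to splitting the gRNA set into fixed-size chunks. A minimal base R sketch follows; `split_into_batches` is a hypothetical helper name, and a plain character vector stands in for the gRNA set (the same indexing idea would apply to the sequences read from a FASTA file).

```r
# Hypothetical helper: split a vector into consecutive batches of
# at most batch_size elements each.
split_into_batches <- function(x, batch_size = 1000) {
  split(x, ceiling(seq_along(x) / batch_size))
}

# Toy stand-in for a gRNA set with 2500 entries
gRNA_names <- paste0("gRNA", 1:2500)
batches <- split_into_batches(gRNA_names, batch_size = 1000)

# Number of gRNAs in each batch
print(lengths(batches))
```

Each element of `batches` could then be handled by its own offTargetAnalysis run, either one after another or as separate jobs.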

Hi Julie,

I adjusted my script based on your suggestions, and it worked well for my output but I am still unable to run on multiple cores. This is what I ran:

    # Scenario 5: Target and off-target analysis for user specified gRNAs
    results <- offTargetAnalysis(inputFilePath = gRNAFilePath,
        enable.multicore = TRUE, n.cores.max = 6,
        annotateExon = FALSE,
        findgRNAsWithREcutOnly = FALSE,
        findPairedgRNAOnly = FALSE,
        findgRNAs = FALSE,
        BSgenomeName = Hsapiens,
        chromToSearch = "all",
        txdb = TxDb.Hsapiens.UCSC.hg38.knownGene,
        orgAnn = org.Hs.egSYMBOL,
        max.mismatch = 0,
        outputDir = outputDir, overwrite = TRUE)

You previously mentioned "If you have access to high performance computing clusters (HPCC), I can share my scripts for you to run the searches in multiple nodes." I do have access to HPCC, is there an additional script I should run for multiple core use other than "enable.multicore = TRUE, n.cores.max = 6"?

Thank you! Kathleen

Reply written 12 days ago by Lucka387

Hi Kathleen,

Here are the scripts offTargetSearchBatch.R and offTargetSearchBatch.bsub, which I used for batch analysis in a high-performance computing environment.

After modifying the parameters to fit your own needs, you can run the script on the cluster by typing the following command.

./offTargetSearchBatch.bsub

Hope it helps.

Best,

Julie

#offTargetSearchBatch.bsub

    #!/bin/bash
    # This example submits 51 jobs to the cluster; please change 51 to a larger
    # number if you have more than 51000 gRNAs to search for offTargets.
    # Please change R_LIBS, workingDir, the R path and the
    # offTargetSearchBatch.R path accordingly.

    for FILE in {1..51}; do
        #BASENAME=$(basename $FILE)
        BASENAME=$FILE
        SHF=$BASENAME.bsub
        DIR=$BASENAME.output
        mkdir -p $DIR
        echo "Processing $FILE ..."
        # Write the per-batch job script
        echo "#!/bin/bash" > $SHF
        #echo "module load R/3.1.0" >>$SHF
        echo "export R_LIBS=/project/umwmccb/R/R-3.4.0/lib64/R/library:/share/pkg/R/3.4.0/lib64/R/library:/home/jz57w/R/x86_64-pc-linux-gnu-library/3.4" >> $SHF
        echo "#BSUB -J $BASENAME" >>$SHF
        echo "workingDir=~/mccb/Zhu/CRISPR" >>$SHF
        workingDir=~/mccb/Zhu/CRISPR
        echo "cd $workingDir" >>$SHF
        echo "#BSUB -q long" >> $SHF
        echo "#BSUB -R rusage[mem=20000]" >> $SHF
        echo "#BSUB -W 48:00" >>$SHF
        echo "#BSUB -o out.$BASENAME.log" >>$SHF
        echo "#BSUB -e err.$BASENAME.log" >>$SHF
        # The batch number is passed to the R script as its first argument
        echo "~/mccb/bin/R CMD BATCH --no-save --no-restore '--args $BASENAME' ~/mccb/Zhu/CRISPR/offTargetSearchBatch.R $SHF.log" >> $SHF
        # Submit the job, then pause before submitting the next one
        bsub <$SHF
        sleep 20
    done

## offTargetSearchBatch.R

    # Search for offTargets for 1000 gRNAs at a time
    # Allow maximum 3 mismatches, please change it accordingly
    # Please change the BSgenome, Txdb, org, PAM sequence accordingly
    # Rule set 2 and CRISPRscan have been implemented since this implementation,
    # please type help(offTargetAnalysis) to set rule.set and other parameters accordingly

    library("CRISPRseek")
    library("BSgenome.Hsapiens.UCSC.hg19")
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    library(org.Hs.eg.db)

    # The batch number (1, 2, ...) is passed on the command line by the .bsub script
    args <- commandArgs(trailingOnly = TRUE)

    gRNAs <- readDNAStringSet("~/mccb/Zhu/CRISPR/inputSeqallgRNAs.fa")

    # Convert to a zero-based batch index and compute this batch's gRNA range
    batch.ind <- as.numeric(args[1]) - 1
    batch.ind
    batch.start <- max(batch.ind * 1000 + 1, 1)
    batch.end <- min((batch.ind + 1) * 1000, length(gRNAs))

    # Skip batches that fall past the end of the gRNA set
    if (batch.end >= batch.start) {
        inputFilePath <- gRNAs[batch.start:batch.end]

        setwd("~/mccb/Zhu/CRISPR/")
        outputDir <- paste("~/mccb/Zhu/CRISPR/output", batch.ind, sep = "")

        results <- offTargetAnalysis(inputFilePath,
            findgRNAsWithREcutOnly = FALSE,
            findPairedgRNAOnly = FALSE,
            gRNAoutputName = paste("gRNAs", batch.ind, sep = ""),
            PAM = "NNN",
            annotatePaired = FALSE,
            findgRNAs = FALSE,
            BSgenomeName = Hsapiens,
            annotateExon = FALSE,
            exportAllgRNAs = "fasta",
            scoring.method = "CFDscore",
            fetchSequence = FALSE,
            txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
            orgAnn = org.Hs.egSYMBOL, max.mismatch = 3,
            outputDir = outputDir, overwrite = TRUE)
    }

Reply written 11 days ago by Julie Zhu
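The batch arithmetic in the scripts above (1000 gRNAs per job, with the job number passed as the first command-line argument) can be checked in isolation before submitting anything to the cluster. The sketch below is base R only; `batch_range` and `n_jobs_needed` are hypothetical helper names that mirror the index calculation in offTargetSearchBatch.R, and the total of 50500 gRNAs is an made-up figure for illustration.

```r
# Hypothetical helper: gRNA index range handled by a given job
# (job_id starts at 1, as in the {1..51} loop of the .bsub script).
batch_range <- function(job_id, n_total, batch_size = 1000) {
  batch_ind <- job_id - 1                      # zero-based batch index
  start <- batch_ind * batch_size + 1
  end <- min((batch_ind + 1) * batch_size, n_total)
  if (end < start) return(NULL)                # job has nothing to do
  c(start = start, end = end)
}

# Hypothetical helper: how many jobs cover n_total gRNAs
n_jobs_needed <- function(n_total, batch_size = 1000) {
  ceiling(n_total / batch_size)
}

n_total <- 50500                               # illustrative gRNA count
print(n_jobs_needed(n_total))
print(batch_range(1, n_total))
print(batch_range(51, n_total))                # last, partially filled batch
print(batch_range(52, n_total))                # past the end of the set
```

The value from `n_jobs_needed` is the number that would replace 51 in the `for FILE in {1..51}` loop when the gRNA count differs.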