CRISPRseek scoring clarification
@elspethransom-12442

I am working with the CRISPRseek R package and I am unsure about the meaning of the off-target scores.

Could you please clarify how the off-target score in the summary output (top 5 off-target total score) is calculated, and whether a higher or a lower score is better, in the sense of a lower chance of off-target effects?

Many Thanks!

Julie Zhu ★ 4.3k
@julie-zhu-3596

Dawid,

FYI, I have updated the CRISPRseek package (version 1.17.5) to handle the situation where the input gRNAs have no perfect match in the genome being searched. I tested the package with your example code and gRNAs. Please let me know how it works out for you. Thanks for the feedback!

Best regards,

Julie


Julie,

Thank you for your help. Is 1.17.5 version available as a devel version on Bioconductor? I can only see 1.17.3 (from 2017-09-05).

Thanks,
Dawid


Hi Julie,

Thanks it works!

I noticed that OffTarget file doesn't have columns: inExon, inIntron, entrez_id, gene. Is there any particular reason to skip them? Information coming from these columns is still interesting when you design non-targeting guide (negative/scramble control) to see if potential OffTargets are in exon, intron etc. What do you think?

Best regards,

Dawid

Dawid,

Glad that it works for you. If you set annotateExon = TRUE and supply txdb and orgAnn, you should get the annotation information. I removed them to speed up the test.

BTW, I added your example as one of the integration tests in CRISPRseek. I hope that is all right with you.

FYI, I changed the value of top1Hit.onTarget.MMdistance2PAM to "perfect match not found" in the summary output when no on-target is found. I am running the integration tests and will commit the changes later as version 1.17.6.

Best, Julie


Hi,

I am still a little confused about the scoring. You say that the lower the topN.OfftargetTotalScore, the better.

I have seen people use scores from the Zhang lab website (https://zlab.bio/guide-design-resources), which shut down recently. They exclude sgRNAs with a score < 0.2. In that practice, a score of 1 means no off-targets, whereas you suggest the opposite.

So does this mean that the score in the OfftargetAnalysis.xls file runs in the opposite direction to the score from the Zhang website?

I am currently trying to find a cutoff to filter my libraries. Is there any reasonable advice on a cutoff in terms of the topN.OfftargetTotalScore?


Nan, A great question! Please see my response at https://support.bioconductor.org/p/61007 and upvote it if it is helpful.

In short, the score from MIT is calculated as 100 / (100 + [CRISPRseek top100OfftargetTotalScore]). If the CRISPRseek top100OfftargetTotalScore is 10, the MIT score would be 100 / (100 + 10) = 0.909, i.e. 90.9 on a 0-100 scale. So a lower CRISPRseek total score corresponds to a higher MIT score.

Best regards, Julie
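The conversion quoted above can be sketched in a few lines of R. This is only an illustration of the formula as stated in the reply: `mit_style_score` is a hypothetical helper, not a CRISPRseek function, and the `percent` switch is an assumption added to show both the 0-1 and the 0-100 scale.

```r
# Hypothetical helper (not part of CRISPRseek): applies the formula
# 100 / (100 + total off-target score) quoted in the reply above.
mit_style_score <- function(total_offtarget_score, percent = FALSE) {
  score <- 100 / (100 + total_offtarget_score)
  # Optionally report on a 0-100 scale instead of 0-1.
  if (percent) 100 * score else score
}

mit_style_score(10)                  # about 0.909
mit_style_score(10, percent = TRUE)  # about 90.9
mit_style_score(c(0, 100, 400))      # 1.0 0.5 0.2
```

Because the total score sits in the denominator, a larger CRISPRseek total gives a smaller MIT-style score, which is why the two scales point in opposite directions.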


Thank you, Julie. I actually read that before. How about the CFD score? Is it the same formula?

Best, Nan


You are welcome, Nan!

Yes, CFD score is the same!

Best regards,

Julie


Hi Julie,

I am a little confused about the score where you mentioned "NA" means no off target found, but I still get a 0 score in some cases. What is the difference between score 0 and "NA"?


Nan,

A great question!

Could you please look at one of the output files offTargets.xls to compare the two gRNAs and their offTargets to see if there are any differences in terms of their offTargets? If it is still hard to distinguish these two cases, could you please post the two gRNA sequences and the code snippets to run offTargets analysis including loading the required libraries and sessionInfo()?

Thanks!

Best regards, Julie


Hi Julie,

Thanks for such a quick response. In the offTargets.xls file, I can see that both sgRNAs have an OffTargetSequence and the same score of 1.

name gRNAPlusPAM OffTargetSequence inExon inIntron entrez_id gene score
sgRNA-1 GTTCTCTTTTGCCTGATTCCNGG GTTCTCTTTTGCCTGATTCCAGG TRUE 387249 Mirlet7g 1
sgRNA-15906 TTCCTGGCCGGCTAAGGAGCNGG TTCCTGGCCGGCTAAGGAGCAGG TRUE 102465905 Mir8120 1

My R code is here:

##### libraries (not shown in the original post; mm10 packages assumed from the txdb used below)

library(CRISPRseek)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

##### parameters

# restriction enzyme site patterns shipped with CRISPRseek
REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek")
scoring_method <- "CFDscore"  # scoring method
core <- 20                    # number of cores to run the job

##### Code_mouse

mouseoutputDir <- "offtargetscoring/output/mouseCFDoutput"
mouseinput <- "offtargetscoring/input/mousesgRNACRISPRseek.fa"
mouseresults <- offTargetAnalysis(inputFilePath = mouseinput,
    findgRNAs = FALSE,               # score the supplied gRNAs only
    findgRNAsWithREcutOnly = FALSE,
    REpatternFile = REpatternFile,
    findPairedgRNAOnly = FALSE,
    BSgenomeName = Mmusculus,
    txdb = TxDb.Mmusculus.UCSC.mm10.knownGene,
    orgAnn = org.Mm.egSYMBOL,
    max.mismatch = 1,
    topN.OfftargetTotalScore = 100,
    enable.multicore = TRUE, n.cores.max = core,
    scoring.method = scoring_method, # same variable name as defined above
    outputDir = mouseoutputDir,
    overwrite = TRUE)

Here is R sessionInfo:

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.1


Nan,

Thanks for posting the code and gRNAs!

FYI, a CFD score of 1 means a perfect match. Is it correct that the following two are the on-targets rather than off-targets? Do you find any off-targets for these two gRNAs? Thanks!

I just ran your test code. No off-target is found for either of the two gRNAs when allowing at most 1 mismatch, and the topN.OfftargetTotalScore is NA for both gRNAs in the summary.xls, as detailed below.

names forViewInUCSC extendedSequence gRNAefficacy gRNAsPlusPAM top5OfftargetTotalScore top100OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM topOfftarget1MMdistance2PAM topOfftarget2MMdistance2PAM topOfftarget3MMdistance2PAM topOfftarget4MMdistance2PAM topOfftarget5MMdistance2PAM topOfftarget6MMdistance2PAM topOfftarget7MMdistance2PAM topOfftarget8MMdistance2PAM topOfftarget9MMdistance2PAM topOfftarget10MMdistance2PAM REname uniqREin200 uniqREin100

g2TTCCTGGCCGGCTAAGGAGC chr3:65659356-65659378 AAACTTCCTGGCCGGCTAAGGAGCAGGGCA 0.023752557 TTCCTGGCCGGCTAAGGAGCNGG NA NA NMM

g1GTTCTCTTTTGCCTGATTCC chr9:106178822-106178844 CTCCGTTCTCTTTTGCCTGATTCCAGGCTG 0.092043552 GTTCTCTTTTGCCTGATTCCNGG NA NA NMM HinfI TfiI HinfI TfiI HinfI TfiI

Best, Julie
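When filtering a guide library on the summary table, the NA case described above needs explicit handling, because a comparison against NA is itself NA in R. The following is a minimal sketch under stated assumptions: the data frame is made up and stands in for the contents of the summary file, the column name follows the header shown above, NA is taken to mean that no off-target was found (as in this run), and the cutoff of 5 is an arbitrary placeholder, not a recommended threshold.

```r
# Made-up stand-in for the summary table; real values come from summary.xls.
summary_tbl <- data.frame(
  names = c("g1", "g2", "g3"),
  top100OfftargetTotalScore = c(NA, 0.2, 12.5),
  stringsAsFactors = FALSE
)

cutoff <- 5  # arbitrary placeholder, not a recommendation

score <- summary_tbl$top100OfftargetTotalScore
# Keep guides with no off-target found (NA) or a total score at or below the cutoff.
keep <- is.na(score) | score <= cutoff
kept_guides <- summary_tbl$names[keep]
kept_guides  # "g1" "g2"
```

Without the `is.na()` term, the NA row would produce NA in the logical index rather than being kept or dropped cleanly.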


Hi Julie,

Thanks for your time and help.

Best, Nan


Hi Julie,

I have a question about "CFDscore". My understanding is that top50.OfftargetTotalScore is calculated by adding the topN scores together.

I can see that my top50.OfftargetTotalScore/top100.OfftargetTotalScore is NA (which I assume means no off-targets?). When I look up the off-targets for a guide whose top10 score is NA, I can see a "CFD score" calculated for only one site, with a value of e.g. 0.2. I would have assumed that the top10 score would still be calculated and would simply equal 0.2, even if there are no other values?

### my code below; I am testing a set of my own guides

offTargetAnalysis(inputFilePath,

REpatternFile = REpatternFile,
scoring.method = "CFDscore",
format = "fasta",
findgRNAs = FALSE, # important for testing to set FALSE
findgRNAsWithREcutOnly = FALSE, # if FALSE not restr. enzymes
findPairedgRNAOnly = FALSE,
gRNA.name.prefix = "sg.",
orgAnn = orgAnn,
BSgenomeName = BSgenomeName,
txdb = txdb,
chromToSearch= "all", # change here for all to look at all chromosomes
min.gap = 0, max.gap = 20,
max.mismatch = 3,
min.score = 0.1,
topN = 100,
topN.OfftargetTotalScore= 10, # 10 top Offtarget will be calculated
annotateExon = TRUE,
fetchSequence = TRUE, upstream = 250, downstream = 250,
overlap.gRNA.positions = c(17, 18),
PAM.size = 3,
PAM = "NGG",
gRNA.size = 20,
outputDir = outputDir,
overwrite = TRUE)

Hi Dawid,

Is this a unique problem with scoring.method = "CFDscore"?

Thanks!

Best regards, Julie

Hi,

I just tested both scoring methods and I see this situation with NA in both cases. I made some comments below, I started to notice them when I test target and off-target analysis for my specified gRNAs. I use CRISPRseek 1.16.0

1) When I assign x <- offTargetAnalysis(), I can see x$summary, but I cannot see anything in the Summary.xls created by the package.
2) I also noticed a discrepancy between the topN score in Summary and the scores in OfftargetAnalysis: if I take the top 5 hits from OfftargetAnalysis and sum them, I get a different number from the one in Summary.

Thanks, Dawid

# Below is an example of a guide that gave me NA scores even though scores were still calculated:
>sg.test2
GACCGGAACGATCTCGCGTANGG

Dawid,

Thanks for testing both scoring methods! It might be an issue with data type. Could you please send me the testing input file, the code and the output? My email is Julie.zhu@umassmed.edu.

Thanks!

Best regards, Julie
Dawid,

Thanks for the test script and input! When no on-target is found for any of the input gRNAs, the summary.xls file will be empty. Also, the topN score is calculated on the assumption that the on-target is present for the gRNA, i.e. as the sum of the 2nd through the Nth hits.

I am updating the dev code to handle this exception and will post an update on the support site. Thanks again for reporting the issue!

Best, Julie
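The effect of skipping the first hit can be illustrated with a small sketch. This is not CRISPRseek's actual code: `top_n_total` is a hypothetical function, the scores are made up, and it only mirrors the "sum of the 2nd through the Nth" description above to show why a manual sum of the top hits can differ from the summary value.

```r
# Hypothetical illustration, not the package implementation.
# Assumes n >= 2 and that the top-ranked hit is the on-target when skip_first = TRUE.
top_n_total <- function(hit_scores, n, skip_first = TRUE) {
  ranked <- sort(hit_scores, decreasing = TRUE)
  idx <- if (skip_first) 2:n else 1:n
  # Indexing past the end of the vector yields NA, so the sum becomes NA too.
  sum(ranked[idx])
}

scores <- c(0.9, 0.5, 0.3, 0.2, 0.1, 0.05)   # made-up hit scores for one gRNA

top_n_total(scores, n = 5)                      # 2nd to 5th hit: about 1.1
top_n_total(scores, n = 5, skip_first = FALSE)  # manual top-5 sum: about 2.0
top_n_total(0.2, n = 5)                         # single hit, no on-target: NA
```

In this sketch, a guide with a single off-target hit and no perfect match ends up with NA, which is one way the behaviour reported in this thread could arise.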

Hi Julie,

I recently noticed in a couple of tests that I am getting "NA" for top5OfftargetTotalScore or top10OfftargetTotalScore. I tested the guides with NA values for top5OfftargetTotalScore in different online tools, and they showed very low off-target risk. When no off-target is found for an input gRNA, is the score in the summary.xls file NA because the sum runs over missing values, i.e. sum(2nd = NA, ..., Nth = NA)?

Thanks,
Dawid


Hi Dawid,

Could you please let me know your session info? If possible, could you please post your code and an example input file? Thanks!

Best regards,

Julie