Question: CFDscore calculations used in CRISPRseek
1
18 months ago by
United States
Dawid G. Nowak10 wrote:

Hi,

I have a question about “CFDscore" calculations used in CRISPRseek. I have a guide that I am testing against target DNA sequence using compare2Sequences(). I see some differences when I compare a table manually to the calculated score using compare2Sequences(). Just to be sure how “CFDscore" is implemented and before I go to details with my sequence, I would appreciate answers to the below questions

Questions:
1) Is “CFDscore” in CRISPRseek is calculated based on the table “NatureBiot2016SuppTable19DoenchRoot" (Doench et al., 2016) and Percent-Active column?

2) I have a guide:DNA mismatch (only one mismatch) in position 1, rA:dG, Percent-Active columns gives a score: 0.857. Does it mean this number is used as a score or there is another parameter using this score and recalculating the final CFDscore?

2) If there is a guide:DNA mismatch that doesn’t exist in the table i.e. rU:dA how the score is calculated?

Thanks,
Dawid

crisprseek cfdscore • 366 views
modified 18 months ago by Julie Zhu4.1k • written 18 months ago by Dawid G. Nowak10
Answer: CFDscore calculations used in CRISPRseek
1
18 months ago by
Julie Zhu4.1k
United States
Julie Zhu4.1k wrote:
Dawid, 1. Yes, “CFDscore” in CRISPRseek is calculated based on the table “NatureBiot2016SuppTable19DoenchRoot" (Doench et al., 2016) and Percent-Active column. 2. CFDscore is calculated using mismatch type and mismatch position although the mismatch activity is given as pos 20, 19, 18,....1 distance from PAM. In addition, PAM sequence affects the CFDscore specified as subPAM.activity = hash( AA =0, AC = 0, AG = 0.259259259, AT = 0, CA = 0, CC = 0, CG = 0.107142857, CT = 0, GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032, TA = 0, TC = 0, TG = 0.038961039, TT = 0) 3 All possible mismatches have been captured in the table. For each base, there are 3 possible type of mismatches. FYI, rU:dA, rG:dC, rC:dG and rA:dT are all considered matches. Best regards, Julie

Hi Julie,

I was testing non-targeting guides for human. I'm interested in to compare different scoring methods and how they score with guides that shouldn't target i.e. human genome, example here:

# guide
>neg01
GTAGCGAACGTGTCCGGCGT
# code
offTargetAnalysis(inputFilePath, REpatternFile = REpatternFile, scoring.method = "CFDscore", min.score = 0,format = "fasta", findgRNAs = FALSE,findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, gRNA.name.prefix = "g.", orgAnn = orgAnn, BSgenomeName = BSgenomeName,txdb = txdb,annotateExon = TRUE,chromToSearch= "all", min.gap = 0, max.gap = 20,max.mismatch = 3,topN = 100,topN.OfftargetTotalScore= 10,fetchSequence = TRUE, upstream = 250, downstream = 250,overlap.gRNA.positions = c(17, 18), PAM.size = 3,PAM = "NGG",gRNA.size = 20, outputDir = outputDir,overwrite = TRUE)

When I used max.mismatch = 3, I got a below error (it works with 4 mismatches). Wouldn't be better to have this result in Summary.xls file and include below (i.e. NA or 0), instead as it is now not showing those guides and giving an error?

 top5OfftargetTotalScore top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM NA or 0 NA or 0 perfect match not found
Error in buildFeatureVectorForScoring(hits = hits, canonical.PAM = PAM,  :
Empty hits!
In searchHits2(gRNAs = gRNAs, PAM = PAM, PAM.pattern = PAM.pattern,  :
sure you are using the right genome. You can also alter your
search criteria such as increasing max.mismatch!



Thanks,
Dawid



> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.36.1                           Hmisc_4.1-1                              ggplot2_3.1.0                            Formula_1.2-3                            survival_2.43-1
[6] lattice_0.20-38                          org.Hs.eg.db_3.6.0                       TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.6.0
[11] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0 GenomicFeatures_1.32.3                   AnnotationDbi_1.42.1                     Biobase_2.40.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0
[16] BSgenome_1.48.0                          rtracklayer_1.40.6                       CRISPRseek_1.20.0                        seqinr_3.4-5                             msa_1.12.0
[21] BiocInstaller_1.30.0                     GenomicRanges_1.32.7                     GenomeInfoDb_1.16.0                      Biostrings_2.48.0                        XVector_0.20.0
[26] IRanges_2.14.12                          S4Vectors_0.18.3                         BiocGenerics_0.26.0                      seqRFLP_1.0.1                           

1
Dear Dawid, Thanks for the great suggestion! I will add this feature in the dev version and keep you posted. Best, Julie

Julie,

Thanks for your reply, I think that could be also applied when you have a targeting guide but it doesn't have off-targets (at least based on the scoring system).

Dawid

Dawid, FYI, the current version works as long as there is at least one match with mismatch <= max.mismatch including perfect match. I have updated the package so that summary file will be generated even when there is no match found in the genome. Here is the command to download the most recent version. git clone git.bioconductor.org:packages/CRISPRseek Please let me know if it works as intended. Thanks! Best regards, Julie

Hi Julie,

I used max.mismatch = 3 and in Summary.xls guide (>neg01) is skipped, even after using CRISPRseek_1.23.1.

Dawid

Dawid, I used the sequence you posted and it worked. I just realized that it would not work if you have a mixture of mappable and non-mappable guides. Could you please post the list of guides and code snippets you used for testing? Thanks! Best regards, Julie
1

Dawid,

The summary file generated from running the new version 1.23.2 (git clone git.bioconductor.org:packages/CRISPRseek) should include both types of guides mappable or not mappable to a given genome.  Could you please try the following code snippets with the test input sequences? Thanks!

library(CRISPRseek)

library("BSgenome.Hsapiens.UCSC.hg19")

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

library(org.Hs.eg.db)

outputDir <- getwd()

inputFilePath <- "~/DropboxUmass/Bioconductor/Trunk/test.fa"

results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,

findPairedgRNAOnly = FALSE,

annotatePaired = FALSE,

BSgenomeName = Hsapiens, chromToSearch = "chrX",

txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,

orgAnn = org.Hs.egSYMBOL, max.mismatch = 0,

outputDir = outputDir, overwrite = TRUE)

#### test.fa

>Hsap_GATA1_ex2

ccagtttgtggatcctgctctggtgtcctccacaccagaatcaggg

>test

GTAGCGAACGTGTCCGGCGTAGG

Thanks Julie, I tested quickly, interestingly when I have 1x mappable guide and 1x unmappable guide it seems to work (summary file has both), but when I have 2x mappable guides and 1x unmappable it seems to skip unmappable again in summary file. I run these guides below:

>neg01_unmappable

GTAGCGAACGTGTCCGGCGT

>mappable_1

CAGAGTCTCCTATGCCACAC

>mappable_2

GAAGATGGGCGGGAGTCTTC

Hi Julie,

I tested again, it seems to give me this error below when I test 2x mappable guides and 1x unmappable guide (Everything works when there is 1x mappable guide and 1x unmappable)

Error in discover_repository(path) :
Error in 'git2r_repository_discover': 'path' must be a character vector of length one with non NA value
1: In normalizePath(path) :
path[1]="mappable_1": No such file or directory
2: In normalizePath(path) :
path[2]="chr11:116998270-116998292": No such file or directory
3: In normalizePath(path) :
path[3]="TGTTCAGAGTCTCCTATGCCACACAGGAAA": No such file or directory
4: In normalizePath(path) :
path[4]="0.0660718322174466": No such file or directory

# Tested guides
>neg01_unmappable
GTAGCGAACGTGTCCGGCGT
>mappable_1
CAGAGTCTCCTATGCCACAC
>mappable_2
GAAGATGGGCGGGAGTCTTC
# code
offTargetAnalysis(inputFilePath, REpatternFile = REpatternFile, scoring.method = "CFDscore", min.score = 0,format = "fasta",
findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, gRNA.name.prefix = "g.", orgAnn = orgAnn,
BSgenomeName = BSgenomeName,txdb = txdb,annotateExon = TRUE,chromToSearch= "all", min.gap = 0, max.gap = 20,max.mismatch = 3,topN = 100,
topN.OfftargetTotalScore= 10,fetchSequence = TRUE, upstream = 250, downstream = 250, overlap.gRNA.positions = c(17, 18), PAM.size = 3,PAM = "NGG",
gRNA.size = 20, outputDir = outputDir,overwrite = TRUE)

# d> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] CRISPRseek_1.23.2                        BiocInstaller_1.32.1                     usethis_1.4.0                            devtools_2.0.1                           Hmisc_4.1-1
[6] ggplot2_3.1.0                            Formula_1.2-3                            survival_2.43-3                          lattice_0.20-38                          org.Hs.eg.db_3.7.0
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.7.0                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4 GenomicFeatures_1.34.1
[16] AnnotationDbi_1.44.0                     Biobase_2.42.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0       seqinr_3.4-5                             BSgenome_1.50.0
[21] rtracklayer_1.42.1                       Biostrings_2.50.1                        XVector_0.22.0                           GenomicRanges_1.34.0                     GenomeInfoDb_1.18.1
[26] IRanges_2.16.0                           S4Vectors_0.20.1                         BiocGenerics_0.28.0