Question

CFDscore calculations used in CRISPRseek

1

Entering edit mode

Dawid G. Nowak ▴ 40

@dawid-g-nowak-6790

Last seen 2.9 years ago

United States

Hi,

I have a question about “CFDscore" calculations used in CRISPRseek. I have a guide that I am testing against target DNA sequence using compare2Sequences(). I see some differences when I compare a table manually to the calculated score using compare2Sequences(). Just to be sure how “CFDscore" is implemented and before I go to details with my sequence, I would appreciate answers to the below questions

Questions:
1) Is “CFDscore” in CRISPRseek is calculated based on the table “NatureBiot2016SuppTable19DoenchRoot" (Doench et al., 2016) and Percent-Active column?

2) I have a guide:DNA mismatch (only one mismatch) in position 1, rA:dG, Percent-Active columns gives a score: 0.857. Does it mean this number is used as a score or there is another parameter using this score and recalculating the final CFDscore?

2) If there is a guide:DNA mismatch that doesn’t exist in the table i.e. rU:dA how the score is calculated?

Thanks,
Dawid

crisprseek CFDscore • 2.7k views

ADD COMMENT • link updated 7.7 years ago by Julie Zhu ★ 4.3k • written 7.7 years ago by Dawid G. Nowak ▴ 40

score 1 · Answer 1 · 2018-06-07

1

Entering edit mode

Julie Zhu ★ 4.3k

@julie-zhu-3596

Last seen 2.2 years ago

United States

Dawid, 1. Yes, “CFDscore” in CRISPRseek is calculated based on the table “NatureBiot2016SuppTable19DoenchRoot" (Doench et al., 2016) and Percent-Active column. 2. CFDscore is calculated using mismatch type and mismatch position although the mismatch activity is given as pos 20, 19, 18,....1 distance from PAM. In addition, PAM sequence affects the CFDscore specified as subPAM.activity = hash( AA =0, AC = 0, AG = 0.259259259, AT = 0, CA = 0, CC = 0, CG = 0.107142857, CT = 0, GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032, TA = 0, TC = 0, TG = 0.038961039, TT = 0) 3 All possible mismatches have been captured in the table. For each base, there are 3 possible type of mismatches. FYI, rU:dA, rG:dC, rC:dG and rA:dT are all considered matches. Best regards, Julie

ADD COMMENT • link 7.7 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Hi Julie,

I was testing non-targeting guides for human. I'm interested in to compare different scoring methods and how they score with guides that shouldn't target i.e. human genome, example here:

# guide
>neg01
GTAGCGAACGTGTCCGGCGT

# code
offTargetAnalysis(inputFilePath, REpatternFile = REpatternFile, scoring.method = "CFDscore", min.score = 0,format = "fasta", findgRNAs = FALSE,findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, gRNA.name.prefix = "g.", orgAnn = orgAnn, BSgenomeName = BSgenomeName,txdb = txdb,annotateExon = TRUE,chromToSearch= "all", min.gap = 0, max.gap = 20,max.mismatch = 3,topN = 100,topN.OfftargetTotalScore= 10,fetchSequence = TRUE, upstream = 250, downstream = 250,overlap.gRNA.positions = c(17, 18), PAM.size = 3,PAM = "NGG",gRNA.size = 20, outputDir = outputDir,overwrite = TRUE)

When I used max.mismatch = 3, I got a below error (it works with 4 mismatches). Wouldn't be better to have this result in Summary.xls file and include below (i.e. NA or 0), instead as it is now not showing those guides and giving an error?

top5OfftargetTotalScore	top10OfftargetTotalScore	top1Hit.onTarget.MMdistance2PAM
NA or 0	NA or 0	perfect match not found

Error in buildFeatureVectorForScoring(hits = hits, canonical.PAM = PAM,  : 
  Empty hits!
In addition: Warning message:
In searchHits2(gRNAs = gRNAs, PAM = PAM, PAM.pattern = PAM.pattern,  :
  No matching found, please check your input sequence, and make
            sure you are using the right genome. You can also alter your 
            search criteria such as increasing max.mismatch!

Thanks,
Dawid

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.36.1                           Hmisc_4.1-1                              ggplot2_3.1.0                            Formula_1.2-3                            survival_2.43-1                         
 [6] lattice_0.20-38                          org.Hs.eg.db_3.6.0                       TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.6.0                      
[11] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0 GenomicFeatures_1.32.3                   AnnotationDbi_1.42.1                     Biobase_2.40.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0      
[16] BSgenome_1.48.0                          rtracklayer_1.40.6                       CRISPRseek_1.20.0                        seqinr_3.4-5                             msa_1.12.0                              
[21] BiocInstaller_1.30.0                     GenomicRanges_1.32.7                     GenomeInfoDb_1.16.0                      Biostrings_2.48.0                        XVector_0.20.0                          
[26] IRanges_2.14.12                          S4Vectors_0.18.3                         BiocGenerics_0.26.0                      seqRFLP_1.0.1

ADD REPLY • link 7.2 years ago Dawid G. Nowak ▴ 40

1

Entering edit mode

Dear Dawid, Thanks for the great suggestion! I will add this feature in the dev version and keep you posted. Best, Julie

ADD REPLY • link 7.2 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Julie,

Thanks for your reply, I think that could be also applied when you have a targeting guide but it doesn't have off-targets (at least based on the scoring system).

Dawid

ADD REPLY • link 7.2 years ago Dawid G. Nowak ▴ 40

0

Entering edit mode

Dawid, FYI, the current version works as long as there is at least one match with mismatch <= max.mismatch including perfect match. I have updated the package so that summary file will be generated even when there is no match found in the genome. Here is the command to download the most recent version. git clone git.bioconductor.org:packages/CRISPRseek Please let me know if it works as intended. Thanks! Best regards, Julie

ADD REPLY • link 7.2 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Hi Julie,

I used max.mismatch = 3 and in Summary.xls guide (>neg01) is skipped, even after using CRISPRseek_1.23.1.

Dawid

ADD REPLY • link 7.2 years ago Dawid G. Nowak ▴ 40

0

Entering edit mode

Dawid, I used the sequence you posted and it worked. I just realized that it would not work if you have a mixture of mappable and non-mappable guides. Could you please post the list of guides and code snippets you used for testing? Thanks! Best regards, Julie

ADD REPLY • link 7.2 years ago Julie Zhu ★ 4.3k

1

Entering edit mode

Dawid,

The summary file generated from running the new version 1.23.2 (git clone git.bioconductor.org:packages/CRISPRseek) should include both types of guides mappable or not mappable to a given genome. Could you please try the following code snippets with the test input sequences? Thanks!

library(CRISPRseek)

library("BSgenome.Hsapiens.UCSC.hg19")

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

library(org.Hs.eg.db)

outputDir <- getwd()

inputFilePath <- "~/DropboxUmass/Bioconductor/Trunk/test.fa"

results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,

findPairedgRNAOnly = FALSE,

annotatePaired = FALSE,

BSgenomeName = Hsapiens, chromToSearch = "chrX",

txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,

orgAnn = org.Hs.egSYMBOL, max.mismatch = 0,

outputDir = outputDir, overwrite = TRUE)

#### test.fa

>Hsap_GATA1_ex2

ccagtttgtggatcctgctctggtgtcctccacaccagaatcaggg

>test

GTAGCGAACGTGTCCGGCGTAGG

ADD REPLY • link 7.2 years ago Julie Zhu ★ 4.3k

0

Entering edit mode

Thanks Julie, I tested quickly, interestingly when I have 1x mappable guide and 1x unmappable guide it seems to work (summary file has both), but when I have 2x mappable guides and 1x unmappable it seems to skip unmappable again in summary file. I run these guides below:

>neg01_unmappable

GTAGCGAACGTGTCCGGCGT

>mappable_1

CAGAGTCTCCTATGCCACAC

>mappable_2

GAAGATGGGCGGGAGTCTTC

ADD REPLY • link 7.2 years ago Dawid G. Nowak ▴ 40

0

Entering edit mode

Hi Julie,

I tested again, it seems to give me this error below when I test 2x mappable guides and 1x unmappable guide (Everything works when there is 1x mappable guide and 1x unmappable)

Error in discover_repository(path) : 
  Error in 'git2r_repository_discover': 'path' must be a character vector of length one with non NA value
In addition: Warning messages:
1: In normalizePath(path) :
  path[1]="mappable_1": No such file or directory
2: In normalizePath(path) :
  path[2]="chr11:116998270-116998292": No such file or directory
3: In normalizePath(path) :
  path[3]="TGTTCAGAGTCTCCTATGCCACACAGGAAA": No such file or directory
4: In normalizePath(path) :
  path[4]="0.0660718322174466": No such file or directory
  
# Tested guides  
>neg01_unmappable
GTAGCGAACGTGTCCGGCGT
>mappable_1
CAGAGTCTCCTATGCCACAC
>mappable_2
GAAGATGGGCGGGAGTCTTC

# code
offTargetAnalysis(inputFilePath, REpatternFile = REpatternFile, scoring.method = "CFDscore", min.score = 0,format = "fasta", 
findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, gRNA.name.prefix = "g.", orgAnn = orgAnn, 
BSgenomeName = BSgenomeName,txdb = txdb,annotateExon = TRUE,chromToSearch= "all", min.gap = 0, max.gap = 20,max.mismatch = 3,topN = 100,
topN.OfftargetTotalScore= 10,fetchSequence = TRUE, upstream = 250, downstream = 250, overlap.gRNA.positions = c(17, 18), PAM.size = 3,PAM = "NGG",
gRNA.size = 20, outputDir = outputDir,overwrite = TRUE)

# d> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CRISPRseek_1.23.2                        BiocInstaller_1.32.1                     usethis_1.4.0                            devtools_2.0.1                           Hmisc_4.1-1                             
 [6] ggplot2_3.1.0                            Formula_1.2-3                            survival_2.43-3                          lattice_0.20-38                          org.Hs.eg.db_3.7.0                      
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.7.0                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4 GenomicFeatures_1.34.1                  
[16] AnnotationDbi_1.44.0                     Biobase_2.42.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0       seqinr_3.4-5                             BSgenome_1.50.0                         
[21] rtracklayer_1.42.1                       Biostrings_2.50.1                        XVector_0.22.0                           GenomicRanges_1.34.0                     GenomeInfoDb_1.18.1                     
[26] IRanges_2.16.0                           S4Vectors_0.20.1                         BiocGenerics_0.28.0

ADD REPLY • link 7.2 years ago Dawid G. Nowak ▴ 40

0

Entering edit mode

Hi Dawid, I tested the function with your input sequences and it ran fine at my end. I am not sure if the following post is helpful to you. https://github.com/cpsievert/pitchRx/issues/65 Best regards, Julie

ADD REPLY • link 7.2 years ago Julie Zhu ★ 4.3k