Question: CFDscore calculations used in CRISPRseek
1
gravatar for Dawid G. Nowak
10 months ago by
United States
Dawid G. Nowak10 wrote:

Hi,

I have a question about “CFDscore" calculations used in CRISPRseek. I have a guide that I am testing against target DNA sequence using compare2Sequences(). I see some differences when I compare a table manually to the calculated score using compare2Sequences(). Just to be sure how “CFDscore" is implemented and before I go to details with my sequence, I would appreciate answers to the below questions

Questions:
1) Is “CFDscore” in CRISPRseek is calculated based on the table “NatureBiot2016SuppTable19DoenchRoot" (Doench et al., 2016) and Percent-Active column? 

2) I have a guide:DNA mismatch (only one mismatch) in position 1, rA:dG, Percent-Active columns gives a score: 0.857. Does it mean this number is used as a score or there is another parameter using this score and recalculating the final CFDscore?

2) If there is a guide:DNA mismatch that doesn’t exist in the table i.e. rU:dA how the score is calculated?

Thanks,
Dawid

crisprseek cfdscore • 267 views
ADD COMMENTlink modified 10 months ago by Julie Zhu4.0k • written 10 months ago by Dawid G. Nowak10
Answer: CFDscore calculations used in CRISPRseek
1
gravatar for Julie Zhu
10 months ago by
Julie Zhu4.0k
United States
Julie Zhu4.0k wrote:
Dawid, 1. Yes, “CFDscore” in CRISPRseek is calculated based on the table “NatureBiot2016SuppTable19DoenchRoot" (Doench et al., 2016) and Percent-Active column. 2. CFDscore is calculated using mismatch type and mismatch position although the mismatch activity is given as pos 20, 19, 18,....1 distance from PAM. In addition, PAM sequence affects the CFDscore specified as subPAM.activity = hash( AA =0, AC = 0, AG = 0.259259259, AT = 0, CA = 0, CC = 0, CG = 0.107142857, CT = 0, GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032, TA = 0, TC = 0, TG = 0.038961039, TT = 0) 3 All possible mismatches have been captured in the table. For each base, there are 3 possible type of mismatches. FYI, rU:dA, rG:dC, rC:dG and rA:dT are all considered matches. Best regards, Julie
ADD COMMENTlink modified 10 months ago • written 10 months ago by Julie Zhu4.0k

Hi Julie,

I was testing non-targeting guides for human. I'm interested in to compare different scoring methods and how they score with guides that shouldn't target i.e. human genome, example here:


# guide
>neg01
GTAGCGAACGTGTCCGGCGT
# code
offTargetAnalysis(inputFilePath, REpatternFile = REpatternFile, scoring.method = "CFDscore", min.score = 0,format = "fasta", findgRNAs = FALSE,findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, gRNA.name.prefix = "g.", orgAnn = orgAnn, BSgenomeName = BSgenomeName,txdb = txdb,annotateExon = TRUE,chromToSearch= "all", min.gap = 0, max.gap = 20,max.mismatch = 3,topN = 100,topN.OfftargetTotalScore= 10,fetchSequence = TRUE, upstream = 250, downstream = 250,overlap.gRNA.positions = c(17, 18), PAM.size = 3,PAM = "NGG",gRNA.size = 20, outputDir = outputDir,overwrite = TRUE)

When I used max.mismatch = 3, I got a below error (it works with 4 mismatches). Wouldn't be better to have this result in Summary.xls file and include below (i.e. NA or 0), instead as it is now not showing those guides and giving an error?  

top5OfftargetTotalScore top10OfftargetTotalScore top1Hit.onTarget.MMdistance2PAM
NA or 0 NA or 0 perfect match not found
Error in buildFeatureVectorForScoring(hits = hits, canonical.PAM = PAM,  : 
  Empty hits!
In addition: Warning message:
In searchHits2(gRNAs = gRNAs, PAM = PAM, PAM.pattern = PAM.pattern,  :
  No matching found, please check your input sequence, and make
            sure you are using the right genome. You can also alter your 
            search criteria such as increasing max.mismatch!


Thanks,
Dawid




> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.36.1                           Hmisc_4.1-1                              ggplot2_3.1.0                            Formula_1.2-3                            survival_2.43-1                         
 [6] lattice_0.20-38                          org.Hs.eg.db_3.6.0                       TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.6.0                      
[11] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0 GenomicFeatures_1.32.3                   AnnotationDbi_1.42.1                     Biobase_2.40.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0      
[16] BSgenome_1.48.0                          rtracklayer_1.40.6                       CRISPRseek_1.20.0                        seqinr_3.4-5                             msa_1.12.0                              
[21] BiocInstaller_1.30.0                     GenomicRanges_1.32.7                     GenomeInfoDb_1.16.0                      Biostrings_2.48.0                        XVector_0.20.0                          
[26] IRanges_2.14.12                          S4Vectors_0.18.3                         BiocGenerics_0.26.0                      seqRFLP_1.0.1                           

 

 

ADD REPLYlink written 4 months ago by Dawid G. Nowak10
1
Dear Dawid, Thanks for the great suggestion! I will add this feature in the dev version and keep you posted. Best, Julie
ADD REPLYlink written 4 months ago by Julie Zhu4.0k

Julie,

Thanks for your reply, I think that could be also applied when you have a targeting guide but it doesn't have off-targets (at least based on the scoring system). 

Dawid

ADD REPLYlink written 4 months ago by Dawid G. Nowak10
Dawid, FYI, the current version works as long as there is at least one match with mismatch <= max.mismatch including perfect match. I have updated the package so that summary file will be generated even when there is no match found in the genome. Here is the command to download the most recent version. git clone git.bioconductor.org:packages/CRISPRseek Please let me know if it works as intended. Thanks! Best regards, Julie
ADD REPLYlink modified 4 months ago • written 4 months ago by Julie Zhu4.0k

Hi Julie,

I used max.mismatch = 3 and in Summary.xls guide (>neg01) is skipped, even after using CRISPRseek_1.23.1.

Dawid

 

ADD REPLYlink written 4 months ago by Dawid G. Nowak10
Dawid, I used the sequence you posted and it worked. I just realized that it would not work if you have a mixture of mappable and non-mappable guides. Could you please post the list of guides and code snippets you used for testing? Thanks! Best regards, Julie
ADD REPLYlink written 4 months ago by Julie Zhu4.0k
1

Dawid,

The summary file generated from running the new version 1.23.2 (git clone git.bioconductor.org:packages/CRISPRseek) should include both types of guides mappable or not mappable to a given genome.  Could you please try the following code snippets with the test input sequences? Thanks!

library(CRISPRseek)

library("BSgenome.Hsapiens.UCSC.hg19")

library(TxDb.Hsapiens.UCSC.hg19.knownGene)

library(org.Hs.eg.db)

outputDir <- getwd()

 

inputFilePath <- "~/DropboxUmass/Bioconductor/Trunk/test.fa"

             

results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,

                       findPairedgRNAOnly = FALSE,

                  annotatePaired = FALSE,

                  BSgenomeName = Hsapiens, chromToSearch = "chrX",

                  txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,

                  orgAnn = org.Hs.egSYMBOL, max.mismatch = 0,

                  outputDir = outputDir, overwrite = TRUE)

#### test.fa

>Hsap_GATA1_ex2

ccagtttgtggatcctgctctggtgtcctccacaccagaatcaggg

>test

GTAGCGAACGTGTCCGGCGTAGG

ADD REPLYlink modified 4 months ago • written 4 months ago by Julie Zhu4.0k

Thanks Julie, I tested quickly, interestingly when I have 1x mappable guide and 1x unmappable guide it seems to work (summary file has both), but when I have 2x mappable guides and 1x unmappable it seems to skip unmappable again in summary file. I run these guides below:

 

>neg01_unmappable

GTAGCGAACGTGTCCGGCGT

>mappable_1

CAGAGTCTCCTATGCCACAC

>mappable_2

GAAGATGGGCGGGAGTCTTC

 

ADD REPLYlink written 4 months ago by Dawid G. Nowak10

Hi Julie, 

I tested again, it seems to give me this error below when I test 2x mappable guides and 1x unmappable guide (Everything works when there is 1x mappable guide and 1x unmappable)

Error in discover_repository(path) : 
  Error in 'git2r_repository_discover': 'path' must be a character vector of length one with non NA value
In addition: Warning messages:
1: In normalizePath(path) :
  path[1]="mappable_1": No such file or directory
2: In normalizePath(path) :
  path[2]="chr11:116998270-116998292": No such file or directory
3: In normalizePath(path) :
  path[3]="TGTTCAGAGTCTCCTATGCCACACAGGAAA": No such file or directory
4: In normalizePath(path) :
  path[4]="0.0660718322174466": No such file or directory
  
# Tested guides  
>neg01_unmappable
GTAGCGAACGTGTCCGGCGT
>mappable_1
CAGAGTCTCCTATGCCACAC
>mappable_2
GAAGATGGGCGGGAGTCTTC
# code
offTargetAnalysis(inputFilePath, REpatternFile = REpatternFile, scoring.method = "CFDscore", min.score = 0,format = "fasta", 
findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, gRNA.name.prefix = "g.", orgAnn = orgAnn, 
BSgenomeName = BSgenomeName,txdb = txdb,annotateExon = TRUE,chromToSearch= "all", min.gap = 0, max.gap = 20,max.mismatch = 3,topN = 100,
topN.OfftargetTotalScore= 10,fetchSequence = TRUE, upstream = 250, downstream = 250, overlap.gRNA.positions = c(17, 18), PAM.size = 3,PAM = "NGG",
gRNA.size = 20, outputDir = outputDir,overwrite = TRUE)

# d> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.1

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CRISPRseek_1.23.2                        BiocInstaller_1.32.1                     usethis_1.4.0                            devtools_2.0.1                           Hmisc_4.1-1                             
 [6] ggplot2_3.1.0                            Formula_1.2-3                            survival_2.43-3                          lattice_0.20-38                          org.Hs.eg.db_3.7.0                      
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.7.0                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4 GenomicFeatures_1.34.1                  
[16] AnnotationDbi_1.44.0                     Biobase_2.42.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0       seqinr_3.4-5                             BSgenome_1.50.0                         
[21] rtracklayer_1.42.1                       Biostrings_2.50.1                        XVector_0.22.0                           GenomicRanges_1.34.0                     GenomeInfoDb_1.18.1                     
[26] IRanges_2.16.0                           S4Vectors_0.20.1                         BiocGenerics_0.28.0                     
ADD REPLYlink written 4 months ago by Dawid G. Nowak10
Hi Dawid, I tested the function with your input sequences and it ran fine at my end. I am not sure if the following post is helpful to you. https://github.com/cpsievert/pitchRx/issues/65 Best regards, Julie
ADD REPLYlink modified 4 months ago • written 4 months ago by Julie Zhu4.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 271 users visited in the last hour