CRISPRseek Unusual Error
3
0
Entering edit mode
@dawid-g-nowak-6790
Last seen 21 months ago
United States

Hi,

I am getting this error below and Summary and Off-targets file are not created. I would appreciate any suggestion.

Thanks, Dawid

Building feature vectors for scoring ...
Calculating scores ...
Error in `$<-.data.frame`(`*tmp*`, "score", value = c(0.0112619406241216,  : 
  replacement has 443126 rows, data has 443130

This the code below:

offTargetAnalysis(inputFilePath,
                      REpatternFile = REpatternFile,
                      scoring.method = "CFDscore",
                      min.score = 0.000001,
                      format = "fasta",
                      findgRNAsWithREcutOnly = FALSE, # if FALSE not restr. enzymes
                      findPairedgRNAOnly = FALSE, # paired only turn to FALSE if only WT Cas9 not Nickase
                      gRNA.name.prefix = "g.",                 
                      orgAnn = orgAnn,
                      BSgenomeName = BSgenomeName,
                      txdb = txdb,
                      chromToSearch= "all", # change here for all to look at all chromosomes
                      min.gap = 0, max.gap = 20,
                      max.mismatch = 4, # changed on 11-26-18 to 4 from 3
                      topN = 1000,
                      topN.OfftargetTotalScore= 10, # 10 top Offtarget will be calculated 
                      annotateExon = TRUE,
                      fetchSequence = TRUE, upstream = 250, downstream = 250,
                      overlap.gRNA.positions = c(17, 18), 
                      PAM.size = 3,
                      PAM = "NGG",
                      foldgRNAs = TRUE,
                      gRNA.size = 20, 
                      outputDir = outputDir,
                      overwrite = TRUE)

>     attached base packages: [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     
>     
>     other attached packages:  [1] Hmisc_4.2-0                              ggplot2_3.1.0                            Formula_1.2-3                
> survival_2.43-3                          lattice_0.20-38              
> [6] org.Hs.eg.db_3.7.0                      
> TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
> BSgenome.Hsapiens.UCSC.hg19_1.4.0        org.Mm.eg.db_3.7.0           
> TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.4 [11] GenomicFeatures_1.34.3  
> AnnotationDbi_1.44.0                     Biobase_2.42.0               
> BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome_1.50.0              
> [16] rtracklayer_1.42.1                       GenomicRanges_1.34.0    
> GenomeInfoDb_1.18.1                      seqRFLP_1.0.1                
> CRISPRseek_1.22.1                        [21] seqinr_3.4-5            
> Biostrings_2.50.2                        XVector_0.22.0               
> IRanges_2.16.0                           S4Vectors_0.20.1             
> [26] BiocGenerics_0.28.0
crisprseek • 1.7k views
ADD COMMENT
2
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 14 months ago
United States

Dawid,

I have modified the CFD scoring matrix in v1.23.4 to assign mismatch N score as (1 + sum(score of the 3 types of mismatch))/4 for each guide position. The updated changes can be downloaded using git. git clone git@git.bioconductor.org:packages/CRISPRseek

Please let me know if the following code snippet works for you. Thanks!

Best regards,

Julie

library(CRISPRseek)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

orgAnn <- get("org.Mm.egSYMBOL")
BSgenomeName <- get("Mmusculus")
txdb <- get("TxDb.Mmusculus.UCSC.mm10.knownGene")

outputDir <- getwd()
inputFilePath <-  "~/testseq.fa"

results <- offTargetAnalysis(inputFilePath,
                  #scoring.method = "CFDscore",
                      min.score = 0.000001,
                      format = "fasta",
                      findgRNAsWithREcutOnly = FALSE,
                      findPairedgRNAOnly = FALSE, 
                      gRNA.name.prefix = "g",                 
                      orgAnn = orgAnn,
                      BSgenomeName = BSgenomeName,
                      txdb = txdb,
                      chromToSearch=  "all", 
                      chromToExclude = setdiff(seqlevels(BSgenomeName), paste("chr", c(1:20, "X", "Y", "M"), sep ="")),
                     max.mismatch = 4, 
                      topN = 1000,
                      topN.OfftargetTotalScore= 10,
                      annotateExon = TRUE,
                      fetchSequence = TRUE, upstream = 250, downstream = 250,
                      outputDir = outputDir,
                      overwrite = TRUE)
ADD COMMENT
1
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 14 months ago
United States

Dawid,

Could you please try to remove foldgRNA and see if the error is still there? Could you please also send me your input file? Thanks!

Best regards,

Julie

ADD COMMENT
0
Entering edit mode

Julie,

Thank for your reply. Removing foldgRNA didn't work. When I run only i.e. chromToSearch= "chr19" everything works fine.

Below sequence that tested:

> rightseq
gcggcaggccctccgagcgtggtggagccgttctgtgagacagccgggtacgagtcgtgacgctggaaggggcaagcgggtggtgggcaggaatgcggtccgccctgcagcaaccggagggggagggagaagggagcggaaaagtctccaccggacgcggccatggctcgggggggggggggcagcggaggagcgcttccggccgacgtctcgtcgctgattggcttcttttcctcccgccgtgtgtgaaaacacaaatggcgtgttttggttggcgtaaggcgcctgtcagttaacggcagccggagtgcgcagccgccggcagcctcgctctgcccactgggtggggcgggaggtaggtggggtgaggcgagctggacgtgcgggcgcggtcggcctctggcggggcgggggaggggagggagggtcagcgaaagtagctcgcgcgcgagcggccgcccaccctccccttcctctgggggagtcgttttacccgccgccggccgggcctcgtcgtctgattggctctcggggcccagaaaactggcccttgccattggctcgtgttcgtgcaagttgagtccatccgccggccagcgggggcggcgaggaggcgctcccaggttccggccctcccctcggccccgcgccgcagagtctggccgcgcgcccctgcgcaacgtggcaggaagcgcgcgctgggggcggggacgggcagtagggctgagcggctgcggggcgggtgcaagcacgtttccgacttgagttgcctcaagaggggcgtgctgagccagacctccatcgcgcactccggggagtggagggaaggagcgagggctcagttgggctgttttggaggcaggaagcacttgctctcccaaagtcgctctgagttgttatcagtaagggagctgcagtggagtaggcggggagaaggccgcacccttctccggaggggggaggggagtgttgcaatacctttctgggagttctctgctgcctcctggcttctgaggaccgccctgggcctgggagaatcccttccccctcttccctcgtgatctgcaactccagtctttctag
ADD REPLY
0
Entering edit mode

Dawid, thanks for the sequence!

I just tested your code with your sequence chromosome by chromosome and it ran successfully for chr1-20, X and Y. I also ran the analysis with chromToSearch= "all" successfully. If needed, I can share the results with you via DropBox.

Could you please try the following code and see if it runs for you? Thanks!

              library(CRISPRseek)
             library("BSgenome.Hsapiens.UCSC.hg19")
             library(TxDb.Hsapiens.UCSC.hg19.knownGene)
             library(org.Hs.eg.db)
             outputDir <- getwd()
             inputFilePath <-  "~/testseq.fa"
             BSgenomeName = Hsapiens

              txdb = TxDb.Hsapiens.UCSC.hg19.knownGene

              orgAnn = org.Hs.egSYMBOL

results <- offTargetAnalysis(inputFilePath,
                  scoring.method = "CFDscore",
                      min.score = 0.000001,
                      format = "fasta",
                      findgRNAsWithREcutOnly = FALSE,
                      findPairedgRNAOnly = FALSE, 
                      gRNA.name.prefix = "g",                 
                      orgAnn = orgAnn,
                      BSgenomeName = BSgenomeName,
                      txdb = txdb,
                      chromToSearch=  "all", 
                      **chromToExclude = setdiff(seqlevels(BSgenomeName), paste("chr", c(1:20, "X", "Y", "M"), sep ="")),**
                     max.mismatch = 4, 
                      topN = 1000,
                      topN.OfftargetTotalScore= 10,
                      annotateExon = TRUE,
                      fetchSequence = TRUE, upstream = 250, downstream = 250,
                      outputDir = outputDir,
                      overwrite = TRUE)

FYI, I updated all the packages with the following commands, which might be helpful to you. if (!requireNamespace("BiocManager")) install.packages("BiocManager") BiocManager::install()

Here is my sessionInfo. sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.6

Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] enUS.UTF-8/enUS.UTF-8/enUS.UTF-8/C/enUS.UTF-8/en_US.UTF-8

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] org.Hs.eg.db3.7.0
[2] TxDb.Hsapiens.UCSC.hg19.knownGene
3.2.2 [3] GenomicFeatures1.34.3
[4] AnnotationDbi
1.44.0
[5] Biobase2.42.0
[6] BSgenome.Hsapiens.UCSC.hg19
1.4.0
[7] BSgenome1.50.0
[8] rtracklayer
1.42.1
[9] GenomicRanges1.34.0
[10] GenomeInfoDb
1.18.2
[11] CRISPRseek1.22.1
[12] Biostrings
2.50.2
[13] XVector0.22.0
[14] IRanges
2.16.0
[15] S4Vectors0.20.1
[16] BiocGenerics
0.28.0

loaded via a namespace (and not attached): [1] Rcpp1.0.0 compiler3.5.0
[3] prettyunits1.0.2 progress1.2.0
[5] bitops1.0-6 tools3.5.0
[7] zlibbioc1.28.0 biomaRt2.38.0
[9] digest0.6.18 bit1.1-14
[11] RSQLite2.1.1 memoise1.1.0
[13] lattice0.20-38 pkgconfig2.0.2
[15] rlang0.3.1 Matrix1.2-15
[17] DelayedArray0.8.0 DBI1.0.0
[19] GenomeInfoDbData1.2.0 httr1.4.0
[21] stringr1.4.0 hms0.4.2
[23] ade41.7-13 bit640.9-7
[25] grid3.5.0 data.table1.12.0
[27] R62.3.0 hash2.2.6
[29] XML3.98-1.17 BiocParallel1.16.6
[31] magrittr1.5 seqinr3.4-5
[33] blob1.1.1 Rsamtools1.34.1
[35] matrixStats0.54.0 GenomicAlignments1.18.1
[37] MASS7.3-51.1 assertthat0.2.0
[39] SummarizedExperiment1.12.0 stringi1.3.1
[41] RCurl1.95-4.11 crayon1.3.4

Best regards,

Julie

ADD REPLY
0
Entering edit mode

Julie,

I was not clear that my test was in mouse. I see you tested for human.

Thanks, Dawid

ADD REPLY
0
Entering edit mode

Dawid, do you want me test it on in mouse? If yes, could you please post the library loading code? Thanks! Best, Julie

ADD REPLY
0
Entering edit mode
# mouse db
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

orgAnn <- get("org.Mm.egSYMBOL")
BSgenomeName <- get("Mmusculus")
txdb <- get("TxDb.Mmusculus.UCSC.mm10.knownGene")

# code
offTargetAnalysis(inputFilePath,
                      REpatternFile = REpatternFile,
                      scoring.method = "CFDscore",
                      min.score = 0.000001,
                      format = "fasta",
                      findgRNAsWithREcutOnly = FALSE, # if FALSE not restr. enzymes
                      findPairedgRNAOnly = FALSE, # paired only turn to FALSE if only WT Cas9 not Nickase
                      gRNA.name.prefix = "g.",                 
                      orgAnn = orgAnn,
                      BSgenomeName = BSgenomeName,
                      txdb = txdb,
                      chromToSearch= "all", # change here for all to look at all chromosomes
                      min.gap = 0, max.gap = 20,
                      max.mismatch = 4, # changed on 11-26-18 to 4 from 3
                      topN = 1000,
                      topN.OfftargetTotalScore= 10, # 10 top Offtarget will be calculated 
                      annotateExon = TRUE,
                      fetchSequence = TRUE, upstream = 250, downstream = 250,
                      overlap.gRNA.positions = c(17, 18), 
                      PAM.size = 3,
                      PAM = "NGG",
                      foldgRNAs = TRUE,
                      gRNA.size = 20, 
                      outputDir = outputDir,
                      overwrite = TRUE)

sessionInfo() R version 3.5.1 (2018-07-02) Platform: x8664-apple-darwin15.6.0 (64-bit) Running under: macOS 10.14.3 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib locale: [1] enUS.UTF-8/enUS.UTF-8/enUS.UTF-8/C/enUS.UTF-8/enUS.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] Hmisc4.2-0 ggplot23.1.0 Formula1.2-3 survival2.43-3 lattice0.20-38 [6] org.Hs.eg.db3.7.0 TxDb.Hsapiens.UCSC.hg19.knownGene3.2.2 BSgenome.Hsapiens.UCSC.hg191.4.0 org.Mm.eg.db3.7.0 TxDb.Mmusculus.UCSC.mm10.knownGene3.4.4 [11] GenomicFeatures1.34.3 AnnotationDbi1.44.0 Biobase2.42.0 BSgenome.Mmusculus.UCSC.mm101.4.0 BSgenome1.50.0 [16] rtracklayer1.42.1 GenomicRanges1.34.0 GenomeInfoDb1.18.1 seqRFLP1.0.1 CRISPRseek1.22.1 [21] seqinr3.4-5 Biostrings2.50.2 XVector0.22.0 IRanges2.16.0 S4Vectors0.20.1 [26] BiocGenerics0.28.0 loaded via a namespace (and not attached): [1] bitops1.0-6 matrixStats0.54.0 bit640.9-7 RColorBrewer1.1-2 progress1.2.0 httr1.4.0 backports1.1.3 [8] tools3.5.1 R62.3.0 rpart4.1-13 DBI1.0.0 lazyeval0.2.1 colorspace1.4-0 nnet7.3-12 [15] ade41.7-13 GetoptLong0.1.7 withr2.1.2 gridExtra2.3 tidyselect0.2.5 prettyunits1.0.2 bit1.1-14 [22] compiler3.5.1 htmlTable1.13.1 DelayedArray0.8.0 checkmate1.9.1 scales1.0.0 stringr1.4.0 digest0.6.18 [29] Rsamtools1.34.1 foreign0.8-71 htmltools0.3.6 base64enc0.1-3 pkgconfig2.0.2 htmlwidgets1.3 rlang0.3.1 [36] GlobalOptions0.1.0 rstudioapi0.9.0 RSQLite2.1.1 shape1.4.4 bindr0.1.1 BiocParallel1.16.2 acepack1.4.1 [43] dplyr0.7.8 RCurl1.95-4.11 magrittr1.5 GenomeInfoDbData1.2.0 Matrix1.2-15 Rcpp1.0.0 munsell0.5.0 [50] stringi1.2.4 yaml2.2.0 MASS7.3-51.1 SummarizedExperiment1.12.0 zlibbioc1.28.0 plyr1.8.4 grid3.5.1 [57] blob1.1.1 crayon1.3.4 splines3.5.1 hash2.2.6 circlize0.4.5 hms0.4.2 knitr1.21 [64] ComplexHeatmap1.20.0 pillar1.3.1 rjson0.2.20 biomaRt2.38.0 XML3.98-1.17 glue1.3.0 latticeExtra0.6-28 [71] data.table1.12.0 BiocManager1.30.4 gtable0.2.0 purrr0.3.0 assertthat0.2.0 xfun0.4 tibble2.0.1 [78] GenomicAlignments1.18.1 memoise1.1.0 cluster2.0.7-1 bindrcpp_0.2.2

ADD REPLY
0
Entering edit mode

Dawid, do you want me test it on in mouse? If yes, could you please post the library loading code? Thanks! Best, Julie

ADD REPLY
1
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 14 months ago
United States

Dawid,

It turns out that one of the offtargets located at chr1:171228616-171228638 in mm10 has sequence NAGNGCAGGGGAGGGGAGGGGAG that contains two Ns. This type of mismatch does not have penalty score defined in the CFD score system.

I will update the code to simply remove offtargets with at least one N. Do you think this is a proper workaround? Thanks!

Best regards,

Julie

ADD COMMENT
0
Entering edit mode

Julie,

Thank you for finding this exception! I think your solution sounds good.

As a side questions 1) Can "Hsu-Zhang" scoring handle this type of mismatches? 2) Can "chromToExclude =" exclude specific sequence on the chromosome i.e. chr1:171,228,616-171,228,638

Thanks, Dawid

ADD REPLY
1
Entering edit mode

Dawid,

Thank you for your input! I am also thinking of another way which is to create N mismatch penalty scores by averaging different type of mismatches/matches for each nucleotide. I will see if this is not too cumbersome to implement.

Yes, "Hsu-Zhang" scoring method handles this type of mismatches fine

"chromToExclude =" can exclude specific chromosomes not specific sequences on the chromosome at this stage.

Best regards,

Julie

ADD REPLY

Login before adding your answer.

Traffic: 629 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6