Question

offTargetAnalysis takes super long

Dawid G. Nowak ▴ 40

Hi,

I have recently been having problems with offTargetAnalysis(): the step "Building feature vectors for scoring ..." takes much longer than in previous runs, and when I stop it I see this error:

Error in unlist(lapply(1:dim(mismatch.pos)[1], function(i) { : 
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in subseq(subject[i], start = j, width = 1) : 
  error in evaluating the argument 'x' in selecting a method for function 'subseq': Error in get(seqname, envir = seqs_cache, inherits = FALSE) : 
  object 'chrUn_JH584304' not found

Then I set chromToExclude = "chrUn_JH584304" and got a similar error, this time for "chrUn_GL456396". The error always seems to occur on the last entry in the list (....., "chrUn_JH584304", "chrUn_GL456396") while the score is being built.

I would appreciate any help!

Thanks,
Dawid

My code:
offTargetAnalysis(inputFilePath,
                  REpatternFile = REpatternFile,
                  format = "fasta",
                  findgRNAsWithREcutOnly = FALSE, # if FALSE not restr. enzymes
                  findPairedgRNAOnly = FALSE,
                  gRNA.name.prefix = "g.",                 
                  orgAnn = org.Mm.egSYMBOL,
                  BSgenomeName = Mmusculus,
                  txdb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                  chromToSearch="all", # change here for all to look at all chromosomes
                  chromToExclude = "chrUn_JH584304",
                  min.gap = 0, max.gap = 20,
                  max.mismatch = 3,
                  min.score = 0.1, 
                  topN = 100,
                  topN.OfftargetTotalScore= 10, # 10 top Offtarget will be calculated 
                  annotateExon = TRUE,
                  fetchSequence = TRUE, upstream = 250, downstream = 250,
                  overlap.gRNA.positions = c(17, 18), 
                  gRNA.pattern = "^G",
                  PAM.size = 3,
                  PAM = "NGG",
                  gRNA.size = 20, 
                  outputDir = outputDir,
                  overwrite = TRUE)

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] C

attached base packages:
[1] stats4    parallel  graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] org.Mm.eg.db_3.2.1                       RSQLite_1.0.0                            DBI_0.3.1                               
 [4] TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.1 GenomicFeatures_1.21.30                  AnnotationDbi_1.31.18                   
 [7] Biobase_2.29.1                           BSgenome.Mmusculus.UCSC.mm10_1.4.0       CRISPRseek_1.9.9                        
[10] BSgenome_1.37.5                          rtracklayer_1.29.28                      seqinr_3.1-3                            
[13] ade4_1.7-2                               BiocInstaller_1.19.14                    Biostrings_2.37.8                       
[16] XVector_0.9.4                            GenomicRanges_1.21.29                    GenomeInfoDb_1.5.16                     
[19] IRanges_2.3.22                           S4Vectors_0.7.18                         BiocGenerics_0.15.6                     
[22] plyr_1.8.3                               ggplot2_1.0.1                           

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1                futile.logger_1.4.1        bitops_1.0-6               futile.options_1.0.0       tools_3.2.2               
 [6] zlibbioc_1.15.0            biomaRt_2.25.3             digest_0.6.8               gtable_0.1.2               proto_0.3-10              
[11] stringr_1.0.0              grid_3.2.2                 XML_3.98-1.3               BiocParallel_1.3.52        lambda.r_1.1.7            
[16] reshape2_1.4.1             magrittr_1.5               GenomicAlignments_1.5.18   scales_0.3.0               Rsamtools_1.21.18         
[21] MASS_7.3-44                SummarizedExperiment_0.3.9 colorspace_1.2-6           stringi_0.5-5              RCurl_1.95-4.7            
[26] munsell_0.4.2

CRISPRseek • 1.1k views

score 1 · Answer 1 · 2015-10-05

Dawid,

The following code works. The change I made is to set chromToSearch = seqnames(Mmusculus)[1:22], which restricts the search to the regular chromosomes chr1, chr2, …, chrX, chrY and chrM.

I suggest you first run offTargetAnalysis with max.mismatch = 0 and then evaluate the summary.xls file. As you will see, 195 of the 220 gRNAs found in the input sequence have multiple perfect matches in the genome. It is more efficient to select the gRNAs without multiple perfect matches in the genome and then search only those for off-targets with a larger number of mismatches.
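The selection step described above can be sketched in base R. This is only an illustration: the data frame `hits` and its columns `name` and `n.mismatch` are hypothetical stand-ins for whatever the max.mismatch = 0 run reports, not the actual layout of the CRISPRseek output files.

```r
# Hypothetical table of genome hits per gRNA (made-up values, assumed columns).
hits <- data.frame(
  name = c("g.1", "g.1", "g.2", "g.3", "g.3", "g.3"),
  n.mismatch = c(0, 0, 0, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Count perfect matches (zero mismatches) for each gRNA.
perfect <- table(hits$name[hits$n.mismatch == 0])

# Keep only gRNAs with exactly one perfect match, i.e. the intended target.
unique_gRNAs <- names(perfect)[as.vector(perfect) == 1]
print(unique_gRNAs)
```

Only the gRNAs kept in `unique_gRNAs` would then be passed to the slower search with max.mismatch = 3.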

Please let me know if it works for you. Thanks!

Best regards,

Julie

library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
library(CRISPRseek)

results <- offTargetAnalysis(inputFilePath,
                  findgRNAsWithREcutOnly = FALSE, # if FALSE not restr. enzymes
                  findPairedgRNAOnly = FALSE,
                  gRNA.name.prefix = "g.",
                  BSgenomeName = Mmusculus,
                  orgAnn = org.Mm.egSYMBOL,
                  txdb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                  chromToSearch = seqnames(Mmusculus)[1:22],
                  chromToExclude = "",
                  min.gap = 0, max.gap = 20,
                  max.mismatch = 3,
                  min.score = 0.1,
                  topN = 100,
                  topN.OfftargetTotalScore = 10, # 10 top Offtarget will be calculated
                  annotateExon = TRUE,
                  fetchSequence = TRUE, upstream = 250, downstream = 250,
                  overlap.gRNA.positions = c(17, 18),
                  gRNA.pattern = "^G",
                  PAM.size = 3,
                  PAM = "NGG",
                  gRNA.size = 20,
                  outputDir = outputDir,
                  overwrite = TRUE)
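As a possible alternative to indexing with [1:22], the regular chromosomes can be picked by name. The sketch below assumes that the unplaced and random scaffolds are exactly the sequence names containing an underscore (as in chrUn_JH584304), and it uses a small hand-written vector in place of seqnames(Mmusculus) so that it runs without the BSgenome package loaded.

```r
# Stand-in for seqnames(Mmusculus): regular chromosomes plus a few scaffolds.
all_seqnames <- c(
  paste0("chr", c(1:19, "X", "Y", "M")),
  "chr1_GL456210_random", "chrUn_JH584304", "chrUn_GL456396"
)

# Drop every name that contains an underscore (assumed to mark scaffolds).
standard <- grep("_", all_seqnames, value = TRUE, invert = TRUE, fixed = TRUE)
print(standard)
```

With the real genome object, the same filter would be applied to seqnames(Mmusculus) and the result passed as chromToSearch.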

score 1 · Answer 2 · 2015-10-05

Dawid,

Two of the gRNAs, PhLpp2_gR251r and gR254r, share the same gRNA sequence, which happens to be a low-complexity repeat. There are therefore many perfect matches to this gRNA, and many more regions align to it when up to 3 mismatches are allowed. This causes the score matrix to explode. To work around this problem, I suggest using the RepeatMasker program (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker) to mask repeats before performing the gRNA search and off-target analysis.

I tested the masked sequence below and it works. Please let me know if it works for you after setting chromToSearch = seqnames(Mmusculus)[1:22] and using the repeat-masked input sequence. Thanks!

Best regards,

Julie

##### here is the masked input

>Phlpp2

ATGGGGGAGGTGGAGCCCGTGCCCGCGGGCCCGCTGGAGCCCCCGGAGCC
ACCTGAAGCGGCGGCGCCTCGCCGGCCCGGAGGGATTCGGGTCCTAAAGA
GAAATATGAAACACAATGGGAGCAGAACTTGTTTGAATAGAAGAAGTAGG
TTTGGTTCCCGAGAAAGAGACTGGCTAAGAGAAGATGTGAAGAGAGGCTG
TGTTTACCTTTATGGAGCAGACACGACCACTGCCACTACAACCANNNNNN
NNNNNNNNNNNNNNNNNNNNNNTTCTGATTTACATCTTGTCCTTTGCACA
GTAGAGACACCAGCGTCAGAAATATGTGCTGGAGAGGGAAGAGAAAGCCT
CTATCTACAGCTTCATGGAGATCTGGTCAGGAGACTGGAGCCCTCTGAAC
GGCCTCTCCAGATTGTTTACGATTACTTATCCAGGCTGGGGTTTGAAGAT
CCCGTGCGCATACAGGAGGAGGCTACGAACCCTGACCTCAGCTGTATGAT
TCGATTTTATGGTGAAAAACCATGCCAGATGGATCATCTGGATCGAATCC
TACTGTCTGGCATCTATAATGTACGCAAAGGAAAAACCCAGCTGCACAAA
TGGGCTGAGCGCCTCGTTGTTCTCTGTGGTACCTGCCTTATTGTTTCCTC
AGTGAAGGATTGTCAAACTGGAAAGATGCACATTTTGCCGCTGGTTGGGG
GAAAGATAGAAGAAGTGAAGCGCCGGCAGCACTCCCTTGCTTTCAGCTCA
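To check where masking replaced bases with N, as in the masked input above, the runs of N can be located with base R. This is a minimal sketch on a short made-up sequence, not on the Phlpp2 sequence itself.

```r
# Toy masked sequence: 7 bases, a run of 8 masked bases, then 6 bases.
masked <- paste0("ATGGCCA", strrep("N", 8), "TTCTGA")

# Find every run of one or more N characters.
m <- gregexpr("N+", masked)[[1]]

# Report the start position and length of each masked run.
runs <- data.frame(start = as.integer(m), width = attr(m, "match.length"))
print(runs)
```

Any candidate gRNA overlapping one of these runs falls in a masked repeat region.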

score 0 · Answer 3 · 2015-10-02

Dawid,

Could you please quit R and restart the analysis? I am wondering whether the workspace has been contaminated. Also, did you try setting chromToExclude = ""? If it still does not work, could you please send me the gRNA or input sequence used as inputFilePath? Thanks!

Best regards,

Julie