CRISPRseek: annotate gene
Julie Zhu ★ 4.3k
@julie-zhu-3596

Peter,

Could you please post future questions on the official support site at https://support.bioconductor.org/ so that others can benefit and contribute? Thanks!

It would be great if you could download the dev version of CRISPRseek and check whether you still encounter the same problem. Thanks!

 http://www.bioconductor.org/packages/devel/bioc/html/CRISPRseek.html

Best,

Julie


On 10/27/14 1:02 PM, "Peter Waltman" <pwaltman@mail.rockefeller.edu> wrote:
 

 Hi Lihua -
 
 Thanks for putting together this package!  While testing it, I've found that an error is thrown in the filterOffTarget function.
 
 Specifically, when using the fasta file that I've attached, and calling the offTargetAnalysis function below, I get the following error message:
 
 call to offTargetAnalysis:
 

unpaired.no_re.results <- offTargetAnalysis("../test.fa", format = "fasta", findgRNAs = TRUE,
        exportAllgRNAs = c("all", "fasta", "genbank", "no"), findgRNAsWithREcutOnly = FALSE,
        minREpatternSize = 6, overlap.gRNA.positions = c(17, 18),
        findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA",
        PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens,
        chromToSearch = "all", max.mismatch = 3, PAM.pattern = "N[A|G]G$",
        gRNA.pattern = "", min.score = 0.5, topN = 100, topN.OfftargetTotalScore = 10,
        annotateExon = TRUE, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
        outputDir = "./unpaired.grnas.no_re.max_3", fetchSequence = TRUE,
        upstream = 200, downstream = 200, orgAnn = org.Hs.egSYMBOL)
 


 output:
 

...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000241 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000242 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000243 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000244 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000245 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000246 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000247 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000248 ...
 >>> DONE searching
 >>> Finding all hits in sequence chrUn_gl000249 ...
 >>> DONE searching
 Building feature vectors for scoring ...
 Calculating scores ...
 Annotating, filtering and generating reports ...
  [1] "55870" "79136" "79136" "7920"  "79136" "79136" "7920"  "79136" "79136" "7920"  "79136" "79136" "7920"  "79136"
 [15] "79136" "7920"  "79136" "79136" "7920"  "79136" "79136" "7920"
 Error in this.score$symbol[query.ind] = overlapGenes.symbol :
   NAs are not allowed in subscripted assignments
 >
 > traceback()
 2: filterOffTarget(scores = scores, outputDir = outputDir, BSgenomeName = BSgenomeName,
        fetchSequence = fetchSequence, txdb = txdb, orgAnn = orgAnn,
        min.score = min.score, topN = topN, topN.OfftargetTotalScore = topN.OfftargetTotalScore,
        upstream = upstream, downstream = downstream, annotateExon = annotateExon,
        baseBeforegRNA = baseBeforegRNA, baseAfterPAM = baseAfterPAM,
        featureWeightMatrixFile = featureWeightMatrixFile)
 1: offTargetAnalysis("../test.fa", format = "fasta", findgRNAs = TRUE,
        exportAllgRNAs = c("all", "fasta", "genbank", "no"), findgRNAsWithREcutOnly = FALSE,
        minREpatternSize = 6, overlap.gRNA.positions = c(17, 18),
        findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA",
        PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens,
        chromToSearch = "all", max.mismatch = 3, PAM.pattern = "N[A|G]G$",
        gRNA.pattern = "", min.score = 0.5, topN = 100, topN.OfftargetTotalScore = 10,
        annotateExon = TRUE, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
        outputDir = "./unpaired.grnas.no_re.max_3", fetchSequence = TRUE,
        upstream = 200, downstream = 200, orgAnn = org.Hs.egSYMBOL)
 

I've traced the error to the following line (including my debug code):
             tryres <- try( this.score$symbol[query.ind] <- overlapGenes.symbol )
             if (inherits(tryres, "try-error")) browser()
 
 I'm not sure I have the expertise to determine why query.ind would be set to NA in that case, though.
 
 Thanks,
 
 Peter Waltman
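
For anyone reading along, here is a minimal base R sketch of how this error message can arise. It does not use CRISPRseek at all; the data frame, the IDs and the symbols are hypothetical, and the idea that the NA comes from a failed match() lookup is only an assumption about one common cause, not a diagnosis of the package code.

```r
# Hypothetical data, not taken from CRISPRseek internals.
score <- data.frame(id = c("g1", "g2", "g3"), symbol = "",
                    stringsAsFactors = FALSE)

hits <- c("g2", "g9")                 # "g9" has no row in score
query.ind <- match(hits, score$id)    # 2 NA: the failed lookup becomes NA
symbols <- c("SYM2", "SYM9")          # made-up symbols, one per hit

# Assigning several values through an index that contains NA is an error.
res <- try(score$symbol[query.ind] <- symbols, silent = TRUE)
failed <- inherits(res, "try-error")

# One possible guard: drop the NA positions from both sides before assigning.
keep <- !is.na(query.ind)
score$symbol[query.ind[keep]] <- symbols[keep]
```

With the guard in place only the matched row receives a symbol; whether silently skipping unmatched entries is the right behaviour depends on why the lookup failed in the first place.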


Peter,

Could you please send me the input sequence file so that I can try to replicate the error? Also could you please let me know the version of CRISPRseek you used? Thanks!  

FYI, the most recent version is 1.7.10.  http://www.bioconductor.org/packages/devel/bioc/html/CRISPRseek.html

Best regards,

Julie

@dawid-g-nowak-6790

Hi Julie,

I also posted this in the thread about the Summary file not being generated. I get the same error. I tried the dev version and other fixes, but nothing helped.

I would appreciate your help!

Best,
Dawid

Error in this.score$symbol[query.ind] = overlapGenes.symbol : 
  NAs are not allowed in subscripted assignments
Julie Zhu ★ 4.3k
@julie-zhu-3596

Dawid,

Could you please send me the input sequence file so that I can try to replicate the error? Also could you please let me know the version of CRISPRseek you used? Thanks!  

FYI, the most recent version is 1.7.10. 

Best regards,

Julie

@dawid-g-nowak-6790

Hi Julie,

I tested a couple of different sequences and see that Summary.xls is created but doesn't contain any data. I see this with version 1.7.10 and with the most recent one, 1.8.10.

Best,

Dawid

Julie Zhu ★ 4.3k
@julie-zhu-3596

Dawid,

Could you please send me the test sequences so that I can replicate the issue? Thanks for the feedback!

Best regards,

Julie

@dawid-g-nowak-6790

Julie,

Just to sum up: I cannot see any data in Summary.xls, but the other files, including OfftargetAnalysis.xls, are OK.

Thanks,
Dawid

The details of my analysis are below.
### sequence tested
>Test
ATGGGGACGGCGCTGGTCCAGCGCGGGGGCTGCTGTCTCCTCTGCCTGTCGCTGCTGCTGCTGGGCTGCTGGGCAGAGCTGGGCAGCGGGCTGGAGTTCCCGGGCGCCGAGGGCCAGTGGACGCGCTTCCCCAAGTGGAACGCGTGCTGCGAGAGCGAGATGAGCTTCCAGCTGAAGACGCGCAGTGCCCGCGGCCTCGTGCTCTACTTCGACGACGAGGGCTTCTGCGACTTCCTCGAGCTCATCCTGA

### code
library(CRISPRseek)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

# inputFilePath, REpatternFile and outputDir must be defined before this call
offTargetAnalysis(inputFilePath,
                  REpatternFile = REpatternFile,
                  format = "fasta",
                  findgRNAsWithREcutOnly = FALSE, # FALSE: do not restrict to gRNAs with a restriction enzyme cut site
                  findPairedgRNAOnly = FALSE,     # FALSE for wild-type Cas9; TRUE only for paired gRNAs (nickase)
                  gRNA.name.prefix = "g.",
                  orgAnn = org.Mm.egSYMBOL,
                  BSgenomeName = Mmusculus,
                  txdb = TxDb.Mmusculus.UCSC.mm10.knownGene,
                  chromToSearch = "chr19",        # search chr19 only
                  min.gap = 0, max.gap = 20,
                  max.mismatch = 5,
                  min.score = 0.1,
                  topN = 100,
                  topN.OfftargetTotalScore = 10,  # top 10 off-targets for the total score
                  annotateExon = TRUE,
                  fetchSequence = TRUE, upstream = 250, downstream = 250,
                  overlap.gRNA.positions = c(17, 18),
                  outputDir = outputDir,
                  overwrite = TRUE)

### system info
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] C

attached base packages:
[1] stats4    parallel  graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] org.Mm.eg.db_3.1.2                       RSQLite_1.0.0                            DBI_0.3.1                               
 [4] TxDb.Mmusculus.UCSC.mm10.knownGene_3.1.2 GenomicFeatures_1.20.1                   AnnotationDbi_1.30.1                    
 [7] Biobase_2.28.0                           BSgenome.Mmusculus.UCSC.mm10_1.4.0       CRISPRseek_1.8.1                        
[10] BSgenome_1.36.0                          rtracklayer_1.28.4                       seqinr_3.1-3                            
[13] ade4_1.7-2                               Biostrings_2.36.1                        XVector_0.8.0                           
[16] GenomicRanges_1.20.4                     GenomeInfoDb_1.4.0                       IRanges_2.2.3                           
[19] S4Vectors_0.6.0                          BiocGenerics_0.14.0                      biomaRt_2.24.0                          
[22] plyr_1.8.2                               ggplot2_1.0.1                           

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6             futile.logger_1.4.1     bitops_1.0-6            futile.options_1.0.0    tools_3.2.0             zlibbioc_1.14.0        
 [7] digest_0.6.8            gtable_0.1.2            proto_0.3-10            stringr_1.0.0           grid_3.2.0              XML_3.98-1.2           
[13] BiocParallel_1.2.2      lambda.r_1.1.7          reshape2_1.4.1          magrittr_1.5            scales_0.2.4            Rsamtools_1.20.4       
[19] MASS_7.3-40             GenomicAlignments_1.4.1 colorspace_1.2-6        stringi_0.4-1           RCurl_1.95-4.6          munsell_0.4.2    

@dawid-g-nowak-6790

Julie,

I just assigned the result of my offTargetAnalysis run to a variable, and I was then able to access the data in the returned list and save it as a data frame (Summary.xls style). Previously, Summary.xls was always written directly to the output directory without this additional step. I'm not sure whether I am doing something wrong; it would be great if you could look at this!

Thanks,
Dawid 
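
As a rough illustration of the workaround described above, the sketch below saves one element of a results list as a tab-delimited file. It uses a hypothetical stand-in list rather than a real offTargetAnalysis result; the element name `summary`, the column names and the output path are all assumptions made for the example.

```r
# Hypothetical stand-in for the list returned by offTargetAnalysis();
# the element name "summary" and its columns are assumptions for illustration.
results <- list(
  summary = data.frame(
    gRNA = c("g.1", "g.2"),
    score = c(1.2, 0.4),
    stringsAsFactors = FALSE
  )
)

# Write the element as a tab-delimited file, similar in spirit to Summary.xls.
out_file <- file.path(tempdir(), "Summary.xls")
write.table(results$summary, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm the table round-trips.
check <- read.delim(out_file, stringsAsFactors = FALSE)
```

With a real run, inspecting `names()` on the returned object first would show which elements are actually available before picking one to write out.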

Julie Zhu ★ 4.3k
@julie-zhu-3596

Dawid,

I ran your code. For me, summary.xls was created, but it is an empty file. It turns out that you searched only chr19, and there is no perfect target on chr19 for the gRNAs in your input test file. The summary file only lists gRNAs that have at least one perfect target in the genome or chromosome you search. If you set chromToSearch = "all" (the default), or set it to the chromosome where the test sequence is located, you should see the correct summary.xls. Does that make sense?

Best regards,

Julie
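
To make the rule above concrete, here is a toy base R sketch of the filtering logic. It is not CRISPRseek's implementation: the function `summarize_perfect`, the hit table, its column names and the chromosome "chr7" are all hypothetical, chosen only to show why restricting the search to a chromosome without a perfect match leaves the summary empty.

```r
# Hypothetical hit table; column names are illustrative, not CRISPRseek's schema.
hits <- data.frame(
  gRNA = c("g.1", "g.1", "g.2", "g.2"),
  chrom = c("chr7", "chr19", "chr7", "chr19"),
  n.mismatch = c(0, 3, 0, 4),
  stringsAsFactors = FALSE
)

# Toy version of the rule: keep only gRNAs with at least one
# perfect (0-mismatch) hit among the chromosomes that were searched.
summarize_perfect <- function(hits, chromToSearch = "all") {
  searched <- if (identical(chromToSearch, "all")) {
    hits
  } else {
    hits[hits$chrom %in% chromToSearch, ]
  }
  unique(searched$gRNA[searched$n.mismatch == 0])
}

only_chr19 <- summarize_perfect(hits, "chr19")  # no perfect hit on chr19
whole_genome <- summarize_perfect(hits, "all")  # perfect hits found on chr7
```

In this toy data both gRNAs have their perfect match on "chr7", so searching "chr19" alone returns nothing while searching everything returns both.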
