Question: CRISPRseek ERROR, can't read featureWeightMatrix in, wrong data format
19 days ago by
nils.hassel0 wrote:

Hello everyone,

I am working with LbCpf1 for my bachelor thesis and am trying to find gRNAs using offTargetAnalysis in the CRISPRseek package. I get some results, but I don't get a score or a validation for the gRNAs.

My Code:

library(CRISPRseek)
library(myBSgenome)

outputDir <- myWd
inputFilePath <- my.fasta
REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package = 'CRISPRseek')

results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE, findgRNAs = TRUE,
REpatternFile = REpatternFile,findPairedgRNAOnly = FALSE,
annotatePaired = TRUE, enable.multicore = FALSE, n.cores.max = 1, min.gap = 0,
max.gap = 20,
gRNA.name.prefix = "", PAM.size = 4, gRNA.size = 23, PAM = "TTTN",
BSgenomeName = myBSgenome, chromToSearch = "all",
max.mismatch = 3, PAM.pattern = "^TTTV", allowed.mismatch.PAM = 1,
annotateExon = FALSE,
weights = c(0, 0, 0, 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
0.508, 0.613,
0.851, 0.732, 0.828, 0.615,
0.804, 0.685, 0.583),
baseBeforegRNA = 22, baseAfterPAM = 28, subPAM.position = c(1, 2),
PAM.location = "5prime",
outputDir = outputDir,
overwrite = TRUE)


With this code I get a summary of the gRNAs, the paired gRNAs and the RECutDetails. The following warning occurred:

Warning message:

In searchHits2(gRNAs = gRNAs, PAM = PAM, PAM.pattern = PAM.pattern,  :
sure you are using the right genome. You can also alter your
search criteria such as increasing max.mismatch!


I don't have the target sequence in my BSgenome file for now, so I think I can ignore this warning.

Because I want a score for the gRNAs, I tried the following code:

rm(list = ls())
setwd("C:/Users/Nils/Documents/Uni/Unterlagen/6.Semester/CRISPRseek")

library(CRISPRseek)
library(myBSgenome)

outputDir <- myWD
inputFilePath <- my.fasta
REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package = 'CRISPRseek')
featureWeightMatrixFile <- system.file("extdata", "DoenchNBT2014.csv", package = "CRISPRseek")

results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE, findgRNAs = TRUE,
REpatternFile = REpatternFile,findPairedgRNAOnly = FALSE,
annotatePaired = TRUE, enable.multicore = FALSE, n.cores.max = 1, min.gap = 0,
max.gap = 20,
gRNA.name.prefix = "", PAM.size = 4, gRNA.size = 23, PAM = "TTTN",
BSgenomeName = myBSgenome, chromToSearch = "all",
max.mismatch = 3, PAM.pattern = "^TTTV", allowed.mismatch.PAM = 1,
annotateExon = FALSE,
weights = c(0, 0, 0, 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389,
0.079, 0.445, 0.508,
0.613, 0.851, 0.732, 0.828, 0.615,
0.804, 0.685, 0.583),
baseBeforegRNA = 22, baseAfterPAM = 28, subPAM.position = c(1, 2),
PAM.location = "5prime",
featureWeightMatrix, useScore = TRUE, useEfficacyFromInputSeq = FALSE,
outputUniqueREs = TRUE,
foldgRNAs = FALSE,
gRNA.backbone="GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUU", temperature = 37,
scoring.method = c("Hsu-Zhang", "CFDscore"),
subPAM.activity = hash(
AA =0,
AC = 0,
AG = 0.259259259,
AT = 0,
CA = 0,
CC = 0,
CG = 0.107142857,
CT = 0,
GA = 0.069444444,
GC = 0.022222222,
GG = 1,
GT = 0.016129032,
TA = 0,
TC = 0,
TG = 0.038961039,
TT = 0),
outputDir = outputDir, overwrite = TRUE)


I get the following error:

Error in findgRNAs(inputFilePath, findPairedgRNAOnly = findPairedgRNAOnly,  :
format needs to be either fasta or fastq !
In if (format == "bed") { :
the condition has length > 1 and only the first element will be used


I checked the code, and it works except for featureWeightMatrix.

I'm sorry for my poor English and my limited knowledge of bioinformatics. I would appreciate your help in getting a score or validation for my gRNAs.

Best regards,

Nils

crisprseek software error • 131 views
modified 13 days ago • written 19 days ago by nils.hassel0
19 days ago, Julie Zhu (United States) wrote:

Nils,

Please try setting baseBeforegRNA = 8 and baseAfterPAM = 23 for Cpf1. If your input sequence is not expected to be present in the genome, please set useEfficacyFromInputSeq = TRUE. In addition, the function expects featureWeightMatrix to be a file path, passed by name. To use the DoenchNBT2014.csv file included in the package, please set the parameter as follows: featureWeightMatrix = system.file("extdata", "DoenchNBT2014.csv", package = "CRISPRseek").
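One plausible reading of the error in the question is that the bare featureWeightMatrix, passed without a name, was matched by position to an earlier argument such as format. The sketch below illustrates that R behavior with a toy function. toy_analysis and its three-argument signature are made up for illustration and are not the real offTargetAnalysis signature.

```r
# Toy stand-in with a made-up signature; NOT the real offTargetAnalysis.
toy_analysis <- function(inputFilePath, format = "fasta",
                         featureWeightMatrix = NULL) {
  if (!format %in% c("fasta", "fastq")) {
    stop("format needs to be either fasta or fastq !")
  }
  list(format = format, featureWeightMatrix = featureWeightMatrix)
}

# Placeholder file name, used here only as a string.
matrix_file <- "DoenchNBT2014.csv"

# Unnamed: the value is matched by position to the next free formal, `format`.
unnamed <- tryCatch(
  toy_analysis("my.fasta", matrix_file),
  error = function(e) conditionMessage(e)
)

# Named: the value reaches the intended argument.
named <- toy_analysis("my.fasta", featureWeightMatrix = matrix_file)

print(unnamed)
print(named$featureWeightMatrix)
```

Naming every argument after the first one avoids this kind of silent mismatch in calls with many parameters.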

The scores will be more accurate if you supply a Cpf1-specific feature weight matrix. I am currently trying to incorporate the prediction algorithm from Kim et al., Nature Biotechnology 36, 239–241 (2018).

Best,

Julie

Thank you very much for your fast answer. The code works now, and I get the gRNAefficacy.xls file. I get the same warning message from searchHits2 again, but now it should be fine not to have any off-targets, right?

My code:

library(CRISPRseek)
library(myBSgenome)

outputDir <- myDir
inputFilePath <- my.fasta
REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package = 'CRISPRseek')

results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE, findgRNAs = TRUE,
REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
annotatePaired = TRUE, enable.multicore = FALSE, n.cores.max = 1, min.gap = 0, max.gap = 20,
gRNA.name.prefix = "", PAM.size = 4, gRNA.size = 23, PAM = "TTTV",
BSgenomeName = myBSgenome, chromToSearch = "all",
max.mismatch = 3, PAM.pattern = "^TTTV", allowed.mismatch.PAM = 1, annotateExon = FALSE,
weights = c(0, 0, 0, 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583),
baseBeforegRNA = 8, baseAfterPAM = 23, subPAM.position = c(1, 2), PAM.location = "5prime",
featureWeightMatrix = system.file("extdata", "DoenchNBT2014.csv", package = "CRISPRseek"),
useScore = TRUE, useEfficacyFromInputSeq = TRUE, outputUniqueREs = TRUE,
foldgRNAs = FALSE,
gRNA.backbone="GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUU", temperature = 37,
scoring.method = c("Hsu-Zhang", "CFDscore"),
subPAM.activity = hash(
AA =0,
AC = 0,
AG = 0.259259259,
AT = 0,
CA = 0,
CC = 0,
CG = 0.107142857,
CT = 0,
GA = 0.069444444,
GC = 0.022222222,
GG = 1,
GT = 0.016129032,
TA = 0,
TC = 0,
TG = 0.038961039,
TT = 0),
outputDir = outputDir, overwrite = TRUE)
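A quick sanity check before a long run, as a minimal sketch: it assumes the weights vector is meant to hold one mismatch penalty per gRNA position, so its length should equal gRNA.size. The values are copied from the call above.

```r
# Values copied from the offTargetAnalysis call above.
gRNA.size <- 23
weights <- c(0, 0, 0, 0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
             0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583)

# Assumption: one weight per gRNA position, each within [0, 1].
stopifnot(length(weights) == gRNA.size)
stopifnot(all(weights >= 0 & weights <= 1))

print(length(weights))
```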


Thank you very much for your help and work on the Cpf1 feature weight matrix. I think it is accurate enough for me at this moment.

Best regards and greetings from Germany,

Nils


Hi Nils, great to know that you have successfully obtained the efficacy prediction score. Yes, the warning message from searchHits2 means that no off-target was found for your search criteria.

You could try setting allowed.mismatch.PAM = 2 and PAM.pattern = "^TNNN" if you would like to find off-targets with a less stringent PAM pattern.
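To see why "^TNNN" is less stringent than "^TTTV", the IUPAC codes can be expanded into an ordinary regular expression and tried on a few 4-mers. This is only an illustration of the two patterns. pam_to_regex is a hypothetical helper, and the sketch does not reproduce how CRISPRseek combines PAM.pattern with allowed.mismatch.PAM internally.

```r
# IUPAC nucleotide codes mapped to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Hypothetical helper: expand a PAM pattern such as "^TTTV" into a plain regex.
# Characters that are not IUPAC codes (for example "^") are kept unchanged.
pam_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  expanded <- ifelse(chars %in% names(iupac), iupac[chars], chars)
  paste(expanded, collapse = "")
}

# Made-up candidate PAMs for illustration.
candidates <- c("TTTA", "TTTT", "TCGA", "GTTA")

strict  <- grepl(pam_to_regex("^TTTV"), candidates)
relaxed <- grepl(pam_to_regex("^TNNN"), candidates)

print(data.frame(candidates, strict, relaxed))
```

Only TTTA passes the strict pattern, while the relaxed pattern accepts every candidate that starts with T.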

Best regards,

Julie

Nils

Thank you very much for your fast answer. I am going to look for more off-targets to stay safe and have a good time in the lab. I am sorry, but I have one last question: is it possible to use the foldgRNAs option? The description says that this function needs the GeneRfold package, but GeneRfold is only available for R 2.15.0, while CRISPRseek requires R (>= 3.0.1).

Best regards,

Nils


Hi Nils,

Unfortunately, GeneRfold has been deprecated. Please let me know if you really need to fold the gRNAs, and I will share a workaround.

BTW, please set subPAM.activity for Cpf1. For example, set TT = 1, GG = 0, CC = 0, AA = 0, AC = 0.
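A minimal sketch of a complete 16-entry table for this suggestion. Only TT = 1 and the zeros for GG, CC, AA and AC come from the advice above. Setting every other dinucleotide to 0 as well is an assumption made here for illustration. The conversion to a hash object uses hash() from the hash package and is skipped if that package is not installed.

```r
# All 16 dinucleotides in the order AA, AC, AG, AT, CA, ...
bases <- c("A", "C", "G", "T")
grid <- expand.grid(second = bases, first = bases, stringsAsFactors = FALSE)
dinucs <- paste0(grid$first, grid$second)

# Assumption: every dinucleotide except TT gets activity 0.
cpf1_activity <- setNames(rep(0, length(dinucs)), dinucs)
cpf1_activity["TT"] <- 1

# Convert to a hash object for the subPAM.activity argument, if available.
if (requireNamespace("hash", quietly = TRUE)) {
  subPAM.activity <- hash::hash(keys = names(cpf1_activity),
                                values = unname(cpf1_activity))
}

print(cpf1_activity)
```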

Best regards,

Julie

Hi Julie,

It is not that important for me to fold the gRNAs. I just thought it would be nice to have if I only needed to set an option to TRUE. I checked all the crRNAs with the OligoCalc tool (http://biotools.nubic.northwestern.edu/OligoCalc.html) for unwanted hairpins, and my supervisor is fine with it. If you are going to share the workaround anyway, I would be glad to have it. For now, I am going to order the templates and start in the lab. I will tell you about my progress. Thank you for the hint to set subPAM.activity.

Best regards,

Nils


Hi Nils,

If you would like to fold the gRNAs, you could try this:

Step 1. Install the most recent version of the ViennaRNA package.
Step 2. Install geneR from Bioconductor: first download the source code (e.g., geneR_2.22.0.tar.gz), then install it in R with install.packages.
Step 3. Install GeneRfold: download the source code and install it in R using install.packages('GeneRfold_1.10.0.tar.gz', repos = NULL, type = 'source').
Step 4. Download the dev version of CRISPRseek (1.27.3) and set foldgRNAs = TRUE. It might take a couple of days for the updated version to be available at http://bioconductor.org/packages/devel/bioc/html/CRISPRseek.html

FYI, both GeneRfold and geneR are available at http://www.bioconductor.org/packages/2.8/BiocViews.html#_Genetics
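Steps 2 and 3 could be scripted roughly as follows. This is a sketch under assumptions: install_local_source is a hypothetical helper, and the two tarballs are assumed to have been downloaded into the current working directory under these file names. Nothing is installed if a file is missing.

```r
# Hypothetical helper: install a source tarball only if the file is present.
install_local_source <- function(tarball) {
  if (!file.exists(tarball)) {
    message("Not found, skipping: ", tarball)
    return(invisible(FALSE))
  }
  # repos = NULL tells install.packages to treat the argument as a local file.
  install.packages(tarball, repos = NULL, type = "source")
  invisible(TRUE)
}

# Same order as the steps above: geneR first, then GeneRfold.
# Assumed file names and location (current working directory).
tarballs <- c("geneR_2.22.0.tar.gz", "GeneRfold_1.10.0.tar.gz")
attempted <- vapply(tarballs, install_local_source, logical(1))
print(attempted)
```

A TRUE in the result only means that install.packages was called, so the console output still needs to be checked for compilation errors.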

Best regards,

Julie

Hi Julie, I have been very busy in the lab lately, so I am very sorry for my late response. I followed your instructions and installed ViennaRNA and geneR on my computer. Unfortunately, I get an error when I try to install GeneRfold from source.

install.packages("mywayto/GeneRfold_1.10.0.tar.gz", repos = NULL, type = 'source')


ERROR:

* installing *source* package 'GeneRfold' ...
** using staged installation

**********************************************
WARNING: this package has a configure script
It probably needs manual configuration
**********************************************

** libs

*** arch - i386
C:/Rtools/mingw_32/bin/gcc  -I"C:/Users/Nils/DOCUME~1/PROGRA~1/R/R-36~1.1/include" -DNDEBUG          -O3 -Wall  -std=gnu99 -mtune=generic -c viennaR.c -o viennaR.o
In file included from C:/Users/Nils/DOCUME~1/PROGRA~1/R/R-36~1.1/include/ViennaRNA/fold.h:6:0,
from viennaR.c:11:
C:/Users/Nils/DOCUME~1/PROGRA~1/R/R-36~1.1/include/ViennaRNA/datastructures/basic.h:42:41: fatal error: ViennaRNA/params/constants.h: No such file or directory
#include <ViennaRNA/params/constants.h>
^
compilation terminated.
make: *** [C:/Users/Nils/DOCUME~1/PROGRA~1/R/R-36~1.1/etc/i386/Makeconf:208: viennaR.o] Error 1
ERROR: compilation failed for package 'GeneRfold'
* removing 'C:/Users/Nils/Documents/Programme/R/R-3.6.1/library/GeneRfold'
Warning in install.packages :
installation of package ‘C:/Users/Nils/Desktop/GeneRfold_1.10.0.tar.gz’ had non-zero exit status


Maybe I can download the needed source files (https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/fold8hsource.html ) and add them to the directory manually, but I think there could be a smarter way.
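Before copying files by hand, it may help to list which of the headers named in the compiler log are actually present. A small sketch: check_vienna_headers is a hypothetical helper, and it assumes the compiler looks under R's own include directory, as the paths in the log above suggest.

```r
# Hypothetical helper: report which ViennaRNA headers exist under include_dir.
# The default header list is taken from the compiler log above.
check_vienna_headers <- function(include_dir,
                                 headers = c("ViennaRNA/fold.h",
                                             "ViennaRNA/datastructures/basic.h",
                                             "ViennaRNA/params/constants.h")) {
  paths <- file.path(include_dir, headers)
  setNames(file.exists(paths), headers)
}

# Assumption: headers are searched for in R's include directory.
print(check_vienna_headers(R.home("include")))
```

A FALSE for ViennaRNA/params/constants.h would match the fatal error in the log and would suggest that the header tree was only partly copied or installed.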

Thank you for updating the package and making everything possible.

Best regards,

Nils

I am using Windows 10 as my operating system and installed ViennaRNA as a precompiled binary package. Should I try to install it from source?

Nils,

I am sorry to let you know that the new Bioconductor release may no longer support installing GeneRfold on Windows.

Best regards,

Julie

Hi Julie,

I set up a Linux subsystem on my computer and tried to make things work there, but without much success. I used the online tool from Vienna to get secondary structures for my gRNAs, and at this point I am fine with that. Thank you very much for your great support.

Best regards,

Nils