On Jun 11, 2015, at 7:36 AM, Gao, Xin (Daniel) <Xin.Gao@umassmed.edu> wrote:
Hi Julie,
Thank you very much for your great help! After running the code in R, there are a few points I still couldn't figure out by myself.
1) I ran the Results1 code to find gRNAs. I noticed the code only works when I delete "allowed.mismatch.PAM = 4,". I attach the call and the error below.
Results1 <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
findPairedgRNAOnly = FALSE,
BSgenomeName = Hsapiens, chromToSearch = "",PAM = "NNNNGATT", PAM.size=8, PAM.pattern="NNNNGATT$",
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
orgAnn = org.Hs.egSYMBOL, max.mismatch = 3,
outputDir = outputDir, allowed.mismatch.PAM = 4,overwrite = TRUE)
Error in offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE, :
unused argument (allowed.mismatch.PAM = 4)
2) I tried PAM = "NNNNGHTT" (H = [A|C|T], as you told us before) and it worked, but it worked by changing only PAM = "NNNNGHTT", even with PAM.pattern = "NNNNGATT$" left unchanged. I originally thought I should change PAM.pattern instead of PAM?
3) If I change max.mismatch to 0, 1, or 2, I get the same gRNA results every time. I am thinking max.mismatch only matters in the offtarget() function and has nothing to do with finding gRNAs, right?
4) I am curious whether CRISPRseek will work faster with gRNA.size = 24, PAM = "GATT". I tried to modify the code a little, but unfortunately it didn't work. Maybe there is a problem with an internal search parameter? If it is too complicated to modify the code, we can still stick to the NNNNGATT pattern.
5) I haven't worked on the off-target analysis carefully, since it takes time to get the updated 1.9.1 version. But it seems I encounter the same error if I keep "allowed.mismatch.PAM = 4,".
Thank you again, and I would appreciate it if you could answer these questions.
Sincerely,
Daniel
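[Editor's sketch] Question 2 above relies on the IUPAC code H standing for A, C, or T. The base R snippet below is only an illustration of how an ambiguous PAM such as "NNNNGHTT" can be expanded into an end-anchored regular expression. It is not CRISPRseek's internal implementation; the helper name pam_to_regex and the candidate sequences are made up for this example.

```r
# Illustrative only: expand IUPAC ambiguity codes in a PAM into a regular
# expression. Plain base R; pam_to_regex is a hypothetical helper, not part
# of CRISPRseek.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

pam_to_regex <- function(pam) {
  codes <- strsplit(toupper(pam), "")[[1]]
  stopifnot(all(codes %in% names(iupac)))
  # Anchor at the end, mirroring the trailing "$" in PAM.pattern.
  paste0(paste(iupac[codes], collapse = ""), "$")
}

pattern <- pam_to_regex("NNNNGHTT")

# Made-up 8-nt candidates: GATT, GCTT and GTTT satisfy GHTT; GGTT does not.
candidates <- c("ACGTGATT", "ACGTGCTT", "ACGTGTTT", "ACGTGGTT")
matches <- grepl(pattern, candidates)
print(data.frame(candidates, matches))
```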
From: Zhu, Lihua (Julie)
Sent: Wednesday, June 10, 2015 7:49 PM
To: Gao, Xin (Daniel)
Cc: Sontheimer, Erik; Amrani, Nadia
Subject: Re: CRISPRseek to analyze NmCas9
Daniel,
Please see my answers to your questions and the code examples below, given that you are interested in searching the human genome. Attached are the analysis results for your sequences.
Best regards,
Julie
From: <Gao>, "Xin (Daniel)" <Xin.Gao@umassmed.edu>
Date: Wednesday, June 10, 2015 2:08 PM
To: Lihua Julie Zhu <julie.zhu@umassmed.edu>
Cc: "Sontheimer, Erik" <Erik.Sontheimer@umassmed.edu>, "Amrani, Nadia" <Nadia.Amrani@umassmed.edu>
Subject: RE: CRISPRseek to analyze NmCas9
Hi Julie,
Thank you for your quick reply and clear explanation!
1) The first question I have now is how to input our own sequence, instead of the example sequence, by writing the proper code. We are now interested in searching for gRNAs on chromosomes 6 and 22. Please see the four attached sites we are interested in. I am also wondering whether there is any limit on the length of the input sequence; for example, can we ask CRISPRseek to find all good target sites throughout chromosome 6 or even the whole genome?
You first create a fasta file (plain text; see the attached file as an example) and save it as inputSeq.fa in a directory, e.g., ~/CRISPRseek, where ~ means your home directory.
Then set the working directory to ~/CRISPRseek in R, and set inputFilePath and outputDir as follows.
setwd("~/CRISPRseek")
inputFilePath <- "~/CRISPRseek/inputSeq.fa"
outputDir <- getwd()
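[Editor's sketch] For readers unsure what the fasta file should look like, here is a minimal base R example that writes a one-record fasta file and reads it back. The sequence name and bases are placeholders, not the sites discussed in this thread, and the file goes to a temporary directory rather than ~/CRISPRseek so that nothing is overwritten.

```r
# Hypothetical example: the record name and bases below are placeholders.
seq_name  <- "example_site_1"
seq_bases <- "ACGTACGTACGTACGTACGTACGTGATTACGT"

# A fasta record is a header line starting with ">" followed by the sequence.
fasta_path <- file.path(tempdir(), "inputSeq.fa")
writeLines(c(paste0(">", seq_name), seq_bases), fasta_path)

# Read the file back to confirm the layout.
fasta_lines <- readLines(fasta_path)
print(fasta_lines)
```

To use such a file in the calls in this thread, inputFilePath would point at it instead of at the temporary path used here.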
There is no limit on the length of the input sequence, as long as you supply it as a fasta file. Finding gRNAs alone, without off-target analysis, is feasible even as a whole-genome scan.
To find gRNAs without an off-target search, please set chromToSearch = "".
For example,
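library(CRISPRseek)
library(BSgenome.Hsapiens.UCSC.hg19)        # provides Hsapiens
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # provides the txdb object
library(org.Hs.eg.db)                       # provides org.Hs.egSYMBOL
Results1 <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
    findPairedgRNAOnly = FALSE,
    BSgenomeName = Hsapiens, chromToSearch = "",
    PAM = "NNNNGATT", PAM.size = 8, PAM.pattern = "NNNNGATT$",
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    orgAnn = org.Hs.egSYMBOL, max.mismatch = 3,
    outputDir = outputDir, allowed.mismatch.PAM = 4, overwrite = TRUE)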
2) After finding the best target site, we plan to predict the off-target effects of this gRNA. We want to know the possible off-target sites across the whole genome. At this point, should I modify the code so that offtarget() searches the whole genome and not only the input sequence?
To perform genome-wide off-target search for gRNAs in your input sequence, please set chromToSearch = "all", max.mismatch = 3 or a number you prefer.
Please note that I have customized the code to search for gRNAs with NNNNGATT as PAM sequence.
Results2 <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
findPairedgRNAOnly = FALSE,
BSgenomeName = Hsapiens, chromToSearch = "all",PAM = "NNNNGATT", PAM.size=8, PAM.pattern="NNNNGATT$",
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
orgAnn = org.Hs.egSYMBOL, max.mismatch = 3,
outputDir = outputDir, allowed.mismatch.PAM = 4,overwrite = TRUE)
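[Editor's sketch] To make the idea behind max.mismatch concrete, the base R snippet below counts position-by-position mismatches between a gRNA and candidate sites of the same length and keeps the sites at or below a threshold. This is only a toy illustration with made-up sequences; CRISPRseek's actual genome search and scoring are more involved, and count_mismatches is a hypothetical helper.

```r
# Count positions at which two equal-length sequences differ.
count_mismatches <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Made-up 20-nt gRNA and candidate sites.
gRNA  <- "ACGTACGTACGTACGTACGT"
sites <- c(perfect = "ACGTACGTACGTACGTACGT",
           two_mm  = "ACGAACGTACGTACGTACGA",
           four_mm = "TCGAACGTTCGTACGTACGA")

mm <- vapply(sites, count_mismatches, integer(1), a = gRNA)

# Keep only the sites within the allowed number of mismatches.
max_mismatch <- 3
keep <- names(mm)[mm <= max_mismatch]
print(mm)
print(keep)
```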
Please find attached the analysis results allowing max.mismatch = 3. Please let me know if you spot any error. Thanks!
FYI, the above code for off-target analysis only works with the development version of CRISPRseek. I have deposited the updated package (version 1.9.1) to the Bioconductor site for you to download at http://bioconductor.org/packages/devel/bioc/html/CRISPRseek.html. It will take a couple of days for the updated package to become available.
Please do not hesitate to contact me if you need any clarification or help.
I would be very grateful if you could work out the code for us and answer my questions.
Daniel
From: Zhu, Lihua (Julie)
Sent: Wednesday, June 10, 2015 6:55 AM
To: Baehrecke, Eric
Cc: Gao, Xin (Daniel); Sontheimer, Erik
Subject: Re: CRISPRseek to analyze NmCas9
Whoops. Thank you very much, Eric!
Best,
Julie
On Jun 9, 2015, at 10:52 PM, Zhu, Lihua (Julie) <Julie.Zhu@umassmed.edu> wrote:
Daniel,
No gRNAs that meet your PAM requirement were found in the example sequence inputseq.fa provided with the software. Please remember to use your own sequence, not the example sequence from the package, for a real search.
findgRNAs(inputFilePath = system.file("extdata","inputseq.fa", package = "CRISPRseek"),pairOutputFile = "testpairedgRNAs.xls",findPairedgRNAOnly = FALSE, PAM="NNNNGATT", PAM.size=8, gRNA.size = 20)
A DNAStringSet instance of length 0
Warning message:
In FUN(1L[[1L]], ...) : No gRNAs found in the input sequence Hsap_GATA1_ex2
To show that findgRNAs does find gRNAs with a different PAM and a different gRNA size, here is an example using the example sequence with PAM = "NNNNCAGG".
findgRNAs(inputFilePath = system.file("extdata","inputseq.fa", package = "CRISPRseek"),pairOutputFile = "testpairedgRNAs.xls",findPairedgRNAOnly = FALSE, PAM="NNNNCAGG", PAM.size=8, gRNA.size = 20)
A DNAStringSet instance of length 2
width seq names
[1] 28 CTCTGGTGTC...CAGAATCAGG Hsap_GATA1_ex2_gR34f
[2] 28 ATTCTGGTGT...CCAGAGCAGG Hsap_GATA1_ex2_gR25r
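[Editor's sketch] The 28-nt width in the output above is the 20-nt gRNA plus the 8-nt PAM. The base R snippet below shows the same idea on a toy sequence: slide a 28-nt window along the input and keep windows whose last 8 bases match NNNNGATT. It scans the forward strand only, whereas the f and r suffixes in the output above suggest findgRNAs reports both strands. The function find_sites and the toy sequence are made up for illustration and are not how CRISPRseek is implemented.

```r
# Toy forward-strand scan for gRNA + PAM sites (hypothetical helper).
find_sites <- function(dna, gRNA_size = 20, pam_size = 8,
                       pam_regex = "^[ACGT]{4}GATT$") {
  width <- gRNA_size + pam_size
  n <- nchar(dna) - width + 1
  if (n < 1) {
    return(data.frame(start = integer(0), site = character(0),
                      stringsAsFactors = FALSE))
  }
  starts  <- seq_len(n)
  windows <- substring(dna, starts, starts + width - 1)
  # The PAM is the last pam_size bases of each window.
  pams <- substring(windows, gRNA_size + 1, width)
  hit  <- grepl(pam_regex, pams)
  data.frame(start = starts[hit], site = windows[hit],
             stringsAsFactors = FALSE)
}

# Made-up 30-nt input: 20-nt protospacer, then CCCCGATT, then 2 extra bases.
toy  <- "ACGTACGTACGTACGTACGTCCCCGATTAA"
hits <- find_sites(toy)
print(hits)
```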
The hitsFile contains the results from an NGG search, which is why you see NGG there. You should not need to use buildFeatureVectorForScoring, since this function is called by the offTargetAnalysis function. The only function you need to use is offTargetAnalysis, which calls all the other functions automatically.
hitsFile <- system.file("extdata", "hits.txt", package = "CRISPRseek")
hits <- read.table(hitsFile, sep= "\t", header = TRUE, stringsAsFactors = FALSE)
buildFeatureVectorForScoring(hits,gRNA.size=28,canonical.PAM="GATT")
Could you please send me the chr6 sequence in which you want to search for gRNAs, and I will work out the code for you? Thanks!
Best regards,
Julie
From: <Gao>, "Xin (Daniel)" <Xin.Gao@umassmed.edu>
Date: Tuesday, June 9, 2015 6:44 PM
To: Lihua Julie Zhu <julie.zhu@umassmed.edu>
Subject: CRISPRseek to analyze NmCas9
Hi Julie,
I am a graduate student in Erik Sontheimer's lab. We met once in Erik's office to discuss how to use CRISPRseek to analyze NmCas9. I don't have much computational background, so my questions may be very naive. As you know, our PAM is "GATT" instead of "NGG". I tried to use the examples in the PDF files by modifying a few parameters, but unfortunately I couldn't make it work. My main problem now is that I can't change the internal criteria, i.e., PAM from NGG to GATT and gRNA.size from 20 to 28. So I couldn't search chromosome 6 for potential gRNAs with CRISPRseek (by doing this I could validate a few gRNAs we know on chr6 to make sure CRISPRseek works in our case). The example I used according to your PDF is in red below:
Usage:
findgRNAs(inputFilePath, format = "fasta", PAM = "GATT", PAM.size = 4, findPairedgRNAOnly = FALSE, gRNA.pattern = "", gRNA.size = 28, overlap.gRNA.positions = c(21,22), min.gap = 0, max.gap = 24, pairOutputFile, name.prefix = "", featureWeightMatrixFile = system.file("extdata", "DoenchNBT2014.csv", package = "CRISPRseek"), baseBeforegRNA = 4, baseAfterPAM = 3, calculategRNAEfficacy = FALSE, efficacyFile)
Example:
findgRNAs(inputFilePath = system.file("extdata","inputseq.fa", package = "CRISPRseek"),pairOutputFile = "testpairedgRNAs.xls",findPairedgRNAOnly = TRUE)
A DNAStringSet instance of length 2 (the example runs in R, but the result uses the "NGG" criteria)
A similar example is:
hitsFile <- system.file("extdata", "hits.txt", package = "CRISPRseek")
hits <- read.table(hitsFile, sep= "\t", header = TRUE, stringsAsFactors = FALSE)
buildFeatureVectorForScoring(hits,gRNA.size=28,canonical.PAM="GATT")
I hope you can give me some suggestions on how to write the code. If it's easier for you to answer face-to-face, I could bring my laptop and visit your office whenever you are available. Thank you very much!
Sincerely,
Xin Gao (Daniel)
PhD Student
Graduate School of Biomedical Science
University of Massachusetts Medical School