Manipulating random CRISPR sequences
I'm working on a small workflow for examining random CRISPR guide sequences. Essentially, I'm generating all of the putative CRISPR sites on a particular chromosome and would like to be able to manipulate them a bit. Ultimately, my goal is to find CRISPR guides that cut at multiple locations in the genome. I know these may not occur frequently, but they certainly occur in repetitive regions of the genome such as rDNA. I've managed to get R to give me the positions and sequences of all the sites on chromosome 1, but that's where I'm stuck.
1) How can I force the output "table" to be written in a tab-delimited format that could theoretically go into Excel if it weren't so huge? I've toyed around with the data.frame and write.table commands, but haven't had much success. These are a bit confusing for a beginner.

2) Can I take the output and have R find the sequences that are duplicated (i.e. in the far right column)? Can it bin them into groups depending on the number of times a particular sequence is repeated?

3) Since this should be a more manageable list, how do I send these duplicated sequences to a tab-delimited file? In other words, can I end up with a list of CRISPR guide sequences that are repeated two or more times on this particular chromosome?

4) Can I expand this to work on the whole genome? (I started with one chromosome to keep things simple.)

The Script:

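    # 21 wildcard bases followed by GG: a 23-nt site ending in an NGG PAM
    p1 <- "nnnnnnnnnnnnnnnnnnnnngg"

    library(BSgenome.Hsapiens.UCSC.hg38)

    # Chromosome 1 as a single DNAString
    chr1 <- Hsapiens[["chr1"]]

    # fixed = "subject": the n's in the pattern act as wildcards,
    # while N's in the chromosome are taken literally
    allsites <- matchPattern(p1, chr1, fixed = "subject")

    allsites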

The output:

      Views on a 248956422-letter DNAString subject
    subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    views:
                   start       end width
           [1]     10451     10473    23 [AACCCTAACCCTAACCCTCGCGG]
           [2]     10464     10486    23 [ACCCTCGCGGTACCCTCAGCCGG]
           [3]     10477     10499    23 [CCTCAGCCGGCCCGCCCGCCCGG]
           [4]     10478     10500    23 [CTCAGCCGGCCCGCCCGCCCGGG]
           [5]     10490     10512    23 [GCCCGCCCGGGTCTGACCTGAGG]
           ...       ...       ...   ... ...
    [12491308] 248946388 248946410    23 [AGGGTTAGGGTTAGGGTTAAGGG]
    [12491309] 248946393 248946415    23 [TAGGGTTAGGGTTAAGGGTTAGG]
    [12491310] 248946394 248946416    23 [AGGGTTAGGGTTAAGGGTTAGGG]
    [12491311] 248946399 248946421    23 [TAGGGTTAAGGGTTAGGGTTAGG]
    [12491312] 248946400 248946422    23 [AGGGTTAAGGGTTAGGGTTAGGG]

Shane,

Please see the CRISPRseek user's guide, section "2.6 Scenario 6. Quick gRNA finding without target or off-target analysis", and the CRISPRseek paper:

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0108424
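
For questions 1 to 3, a plain base-R route may also be enough. The following is only a minimal sketch, not taken from the CRISPRseek guide. It assumes the matches have been copied into an ordinary data frame; the toy rows, the invented positions of the repeated sequences, and the file names are all made up for illustration. With the real `allsites` object, the three columns could presumably be filled from `start(allsites)`, `end(allsites)` and `as.character(allsites)`.

```r
# Toy stand-in for the real matches. The first three rows come from the
# output in the question; the last three rows (repeats) are invented.
sites <- data.frame(
  start = c(10451, 10464, 10477, 20001, 30001, 40001),
  end   = c(10473, 10486, 10499, 20023, 30023, 40023),
  sequence = c("AACCCTAACCCTAACCCTCGCGG", "ACCCTCGCGGTACCCTCAGCCGG",
               "CCTCAGCCGGCCCGCCCGCCCGG", "AACCCTAACCCTAACCCTCGCGG",
               "AACCCTAACCCTAACCCTCGCGG", "ACCCTCGCGGTACCCTCAGCCGG"),
  stringsAsFactors = FALSE
)

# Question 1: write every site to a tab-delimited file.
all_file <- file.path(tempdir(), "chr1_all_sites.tsv")
write.table(sites, all_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Question 2: count how often each sequence occurs and attach that
# count to every row.
counts <- table(sites$sequence)
sites$n_occurrences <- as.integer(counts[sites$sequence])

# Keep sequences seen two or more times, most frequent first,
# then bin the rows by how many times the sequence is repeated.
repeated <- sites[sites$n_occurrences >= 2, ]
repeated <- repeated[order(-repeated$n_occurrences,
                           repeated$sequence, repeated$start), ]
by_count <- split(repeated, repeated$n_occurrences)

# Question 3: write only the repeated sequences to a second file.
rep_file <- file.path(tempdir(), "chr1_repeated_sites.tsv")
write.table(repeated, rep_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Here `by_count` is a list with one data frame per repeat count. Note that this groups on the full 23-nt site, PAM included; to group on the 20-nt guide alone, one option would be to count `substr(sites$sequence, 1, 20)` instead. The pattern in the question also only covers the forward strand.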

Best regards,

Julie
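
On question 4 (scaling up to the whole genome), one possible approach is to repeat the chromosome 1 search for each chromosome and append the hits to a single tab-delimited file, so that only one chromosome's matches sit in memory at a time. This is a rough sketch under assumptions, not something from the CRISPRseek guide: `write_sites_by_chrom` is a hypothetical helper name, and the output file name in the usage comment is made up.

```r
library(Biostrings)

# Hypothetical helper: scan each chromosome in turn and append its hits
# to one tab-delimited file. `genome` can be anything that returns a
# DNAString from genome[[chrom]], e.g. a BSgenome object or a named list.
write_sites_by_chrom <- function(genome, chroms, pattern, out_file) {
  if (file.exists(out_file)) file.remove(out_file)
  for (chrom in chroms) {
    hits <- matchPattern(pattern, genome[[chrom]], fixed = "subject")
    sites <- data.frame(
      chrom    = rep(chrom, length(hits)),
      start    = start(hits),
      end      = end(hits),
      sequence = as.character(hits),
      stringsAsFactors = FALSE
    )
    # Write the header only once, then append for later chromosomes.
    first <- !file.exists(out_file)
    write.table(sites, out_file, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = first, append = !first)
  }
  invisible(out_file)
}

# Assumed usage with the genome from the question (not run here):
# library(BSgenome.Hsapiens.UCSC.hg38)
# write_sites_by_chrom(Hsapiens, paste0("chr", c(1:22, "X", "Y")),
#                      "NNNNNNNNNNNNNNNNNNNNNGG", "hg38_all_sites.tsv")
```

Given that chromosome 1 alone yields about 12.5 million sites, the genome-wide file will be very large, and counting duplicates across all chromosomes in one R session may need a lot of memory.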

Hi Julie,

Just a quick question: the CRISPRseek user's guide mainly talked about how to use the package for gRNA. Could you please instruct on how to use it for Guide-seq as you've done at http://mccb.umassmed.edu/GUIDE-seq/ for the python package?

Thanks!

-- Mo

Hi Julie,

GUIDEseq installed successfully. However, the first step requires .bed and .bam files as input, while all we have are raw .fastq files from the HiSeq.

How should I proceed?

Thanks!

-- Mo

Please take a look at http://mccb.umassmed.edu/GUIDE-seq/readme.txt. BTW, please use a different tag for GUIDEseq questions. Thanks!
OK. Started a new thread with GUIDE-seq in the question title. Thanks.

