Question

Major on-target peak not appearing in GUIDEseq results

David Kuo • 0

Hi Bioconductor List and GUIDEseq maintainers,

Thanks for publishing and maintaining the GUIDEseq Bioconductor package. I’ve run into a few issues that I’ve put in bullet-point form below. Any guidance that you can offer would be greatly appreciated:

- The "on-target" sequence in the data I’m analyzing is not appearing as a hit.
  - This site has 100,000+ reads in the region and is the highest peak in the dataset, but it does not appear in the output data even though it has no mismatches.
  - When inspecting the BED outputs, I found that two flanking peaks were called on either side of the major peak, but not the major peak itself.
  - The IGV screenshot below shows what I mean: the two peaks called by GUIDEseq are in blue on the bottom track, but no peak was called between them, where there is a large pileup of reads (the on-target site).
  - I've set plus.strand.start.gt.minus.strand.end = FALSE and keepPeaksInBothStrandsOnly = FALSE in an attempt to recover this peak, without success.
  - What would be the next parameters to try tweaking to recover the expected on-target peak?
- While I'm testing a Cas9 dataset, I may be getting data from alternative nucleases. Is there support for alternative guide lengths, different PAM sequences, and position weights?
  - I noticed that within GUIDEseq there is a vector of weights coming from a different paper. If we're trying a nuclease that does not have these weights determined, what would be the recommended input?

[Image: ontarget_igv_peak]

Thanks in advance,

David

GUIDEseq • 652 views

link updated 3.2 years ago by Julie Zhu ★ 4.3k • written 3.2 years ago by David Kuo • 0

score 3 · Accepted Answer · 2021-02-16

Hi David,

Please see my response below each of your questions.

Q1. I've set plus.strand.start.gt.minus.strand.end = FALSE and keepPeaksInBothStrandsOnly = FALSE in attempts to recover this peak but was not successful. What would be the next parameters to try tweaking to recover the expected on-target peak?

Response:

You can try modifying the following parameters:

- distance.threshold
- max.overlap.plusSig.minusSig
- bg.window.size
- min.reads
- min.reads.per.lib
- min.peak.score.1strandOnly
- min.SNratio
- maxP = 0.01
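One way to keep track of such changes is to collect the overrides in a named list and merge them into the arguments of an existing call. The following is only a sketch: the numeric values are illustrative placeholders chosen to be permissive, not package defaults or recommendations, and `base_args` is a hypothetical stand-in for the required inputs of your own run. bg.window.size is left out because the direction in which to change it depends on the local background.

```r
# Illustrative overrides only: the values below are placeholders,
# not package defaults and not recommendations.
relaxed_args <- list(
  distance.threshold = 100,
  max.overlap.plusSig.minusSig = 60,
  min.reads = 1,
  min.reads.per.lib = 1,
  min.peak.score.1strandOnly = 1,
  min.SNratio = 1,
  maxP = 0.05,
  plus.strand.start.gt.minus.strand.end = FALSE,
  keepPeaksInBothStrandsOnly = FALSE
)

# base_args (hypothetical) would hold the required inputs of your existing
# call, such as alignment files, genome, and gRNA file. It is empty here.
base_args <- list()

# Entries in relaxed_args override any entries of the same name in base_args.
all_args <- modifyList(base_args, relaxed_args)

# With GUIDEseq loaded and base_args filled in, the call would look like:
# guideSeqRes <- do.call(GUIDEseqAnalysis, all_args)
```

Changing one parameter at a time and re-checking the region makes it easier to see which filter was removing the peak.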

If you use the workflow function GUIDEseqAnalysis and save the results as guideSeqRes, could you please take a look at the following objects and check whether your region of interest is captured in all of them or only in a subset?

guideSeqRes$peaks

guideSeqRes$merged.peaks

guideSeqRes$uniqueCleavages
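A minimal sketch of such a check is shown here. It assumes that each object can be turned into a table with seqnames, start, and end columns (for a GRanges object, as.data.frame() yields these); the actual structure of each object may differ, so inspect it with str() first. The helper `overlaps_region` and the toy tables are hypothetical and only illustrate the logic.

```r
# Hypothetical helper: TRUE if any row of `tbl` overlaps chr:start-end.
# Assumes `tbl` has columns seqnames, start, and end.
overlaps_region <- function(tbl, chr, start, end) {
  any(as.character(tbl$seqnames) == chr &
        tbl$start <= end &
        tbl$end >= start)
}

# Toy stand-ins for the real objects; in practice you might use, for example,
# as.data.frame(guideSeqRes$peaks), if that object is a GRanges.
toy_peaks <- data.frame(
  seqnames = c("chr2", "chr2"),
  start    = c(1000, 1300),
  end      = c(1050, 1350)
)
toy_cleavages <- data.frame(seqnames = "chr2", start = 1150, end = 1150)

# Made-up region of interest standing in for the on-target site.
region <- list(chr = "chr2", start = 1100, end = 1200)

in_peaks <- overlaps_region(toy_peaks, region$chr, region$start, region$end)
in_cleavages <- overlaps_region(toy_cleavages, region$chr, region$start, region$end)

print(c(peaks = in_peaks, uniqueCleavages = in_cleavages))
```

In the toy data the region is covered by a cleavage record but by no peak, which mirrors the situation described in the question: reads are present, yet the site is lost at a later step.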

If you still cannot get the peaks, could you please post your script and sessionInfo? Thanks!

Q2. While I'm testing a Cas9 dataset, I may be getting data from alternative nucleases. Is there support for alternative guide lengths, different PAM sequences and position weights? I noticed that within GUIDEseq there is a vector of weights coming from a different paper. If we're trying a nuclease that does not have these weights determined, what would be the recommendation as input?

Response:

For the SpCas9 system, I recommend setting scoring.method = "CFDscore". For alternative nucleases, you need to set the parameters related to gRNA length, PAM, and weights accordingly. For detailed information, please take a look at examples 2 and 3 at https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3746-y.

If you do not have the weights for your nuclease, you can pad the weights vector so that its length matches the gRNA length.
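As a rough illustration of the padding step, the sketch below extends a weights vector to a target gRNA length. Two things here are assumptions and not stated in this thread: that the padding value is 0, and that the padding goes at the front of the vector. The helper `pad_weights` and the placeholder weights are hypothetical; substitute the real vector you are using.

```r
# Hypothetical helper: left-pad `weights` with `fill` up to `gRNA.size`.
# Padding at the front and filling with 0 are assumptions, not package rules.
pad_weights <- function(weights, gRNA.size, fill = 0) {
  n_missing <- gRNA.size - length(weights)
  if (n_missing < 0) {
    stop("weights is longer than gRNA.size")
  }
  c(rep(fill, n_missing), weights)
}

# Placeholder 20-position weights, not the values shipped with the package.
placeholder_weights <- seq(0.05, 1, length.out = 20)

# Pad to a made-up 24-nt guide length.
padded <- pad_weights(placeholder_weights, gRNA.size = 24)
print(length(padded))
```

The padded vector would then be passed as the weights argument together with the matching gRNA length setting.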

FYI, scoring.method and weights will only affect the predicted cleavage score in the predicted_cleavage_score column in the final peak file. In other words, the settings of these parameters will not affect peak calling.

Best regards,

Julie