Hi, I'm working with a 4C-seq experiment. One of the issues with this technique is that you have to remove reads that originate from the region adjacent to the sequence of interest. I have the sequence corresponding to these reads and need to delete them. So far I have only found the reads that contain this sequence, using vmatchPattern. How can I delete these reads and create a new FASTQ file without them?
Thank you very much for your help!

Looks like this is a COMMENT rather than Answer; please use the 'Add comment' button for comments.
To get some example data, I entered the following into my R session
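One way to end up with such a variable, as a minimal sketch: it assumes the file is the `s_1_sequence.txt` example that ships with the ShortRead package (that file name appears in the filterFastq() report later in this answer), and reaching it through `SolexaPath()` and `analysisPath()` is an assumption, not necessarily the command originally used.

```r
library(ShortRead)

# Assumption: use the example data bundled with ShortRead. The file name
# s_1_sequence.txt matches the one reported by filterFastq() later on.
sp <- SolexaPath(system.file("extdata", package = "ShortRead"))

# fl is just a character string holding the path to the small fastq file.
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

# Quick sanity check that the path really points at a file.
file.exists(fl)
```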
This creates a variable that points to a small fastq file; I read it in and extracted the short read sequences, just to get a look
> sr = readFastq(fl)
> sread(sr)
  A DNAStringSet instance of length 256
      width seq
  [1]    36 GGACTTTGTAGGATACCCTCGCTTTCCTTCTCCTGT
  [2]    36 GATTTCTTACCTATTAGTGGTTGAACAGCATCGGAC
  [3]    36 GCGGTGGTCTATAGTGTTATTAATATCAATTTGGGT
  [4]    36 GTTACCATGATGTTATTTCTTCATTTGGAGGTAAAA
  [5]    36 GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT
  ...   ... ...
[252]    36 GTTTAGATATGAGTCACATTTTGTTCATGGTAGAGT
[253]    36 GTTTTACAGACACCTAAAGCTACATCGTCAACGTTA
[254]    36 GATGAACTAAGTCAACCTCAGCACTAACCTTGCGAG
[255]    36 GTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTA
[256]    36 GCAATCTGCCGACCACTCGCGATTCAATCATGACTT

Suppose I wanted to get rid of sequences with "TTACC". I could use the grepl() function to find the reads that do not contain this pattern, and subset the original reads to get those that I want to keep:
> sr[!grepl("TTACC", sread(sr))]
class: ShortReadQ
length: 244 reads; width: 36 cycles

So it looks like there are 244 reads that satisfy my criterion. Now write a function that does this, and test...
fun <- function(x) x[ !grepl("TTACC", sread(x)) ]

Verify that it works.
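The check can be as simple as calling the function on the reads already in memory and comparing with the manual subset. A minimal sketch, assuming `fl` points at the `s_1_sequence.txt` example file shipped with ShortRead (the way `fl` is built here is an assumption; `kept` is a name introduced only for illustration):

```r
library(ShortRead)

# Assumption: fl is the s_1_sequence.txt example file bundled with ShortRead.
sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")
sr <- readFastq(fl)

# Same filter as in the answer: keep reads that do NOT contain the pattern.
fun <- function(x) x[!grepl("TTACC", sread(x))]

kept <- fun(sr)

# The number of reads kept should agree with the manual subset of sr.
length(kept)
stopifnot(length(kept) == length(sr[!grepl("TTACC", sread(sr))]))

# None of the kept reads should still contain the pattern.
stopifnot(!any(grepl("TTACC", sread(kept))))
```

For a longer sequence where a few mismatches should be tolerated, a possible alternative to grepl() is Biostrings' vcountPattern(), for example `x[vcountPattern(pattern, sread(x), max.mismatch = 1) == 0]`, with `pattern` standing for the unwanted sequence.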
Then use it in the filterFastq function, creating a new fastq file with the filtered results
> out <- tempfile()
> filterFastq(fl, out, filter=fun)
[1] "/var/folders/yn/gmsh_22s2c55v816r6d51fx1tnyl61/T//Rtmpnfw9lM/file172496fd16a17"
attr(,"filter")
                 Reads KeptReads Nucl KeptNucl
s_1_sequence.txt   256       244 9216     8784

Verify that the output file contains the correct number of sequences.
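One way to do that last check is to read the filtered file back in and count the reads. This is a sketch under the same assumption as before, namely that `fl` is the `s_1_sequence.txt` example file from ShortRead; `n_out` is a name introduced only for illustration.

```r
library(ShortRead)

# Assumption: fl is the s_1_sequence.txt example file bundled with ShortRead.
sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

# Filter from the answer: drop reads containing the unwanted pattern.
fun <- function(x) x[!grepl("TTACC", sread(x))]

# Write the filtered reads to a new (temporary) fastq file.
out <- tempfile()
filterFastq(fl, out, filter = fun)

# Read the new file back and count its reads; this should match the
# KeptReads column reported by filterFastq().
n_out <- length(readFastq(out))
n_out
```

For a real experiment, `out` would be a permanent file name rather than a tempfile(), and the pattern would be the sequence you want to exclude.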
Thank you very much! This was super useful for me :D