mismatch & replacement
@danielbernerunibasch-4268
Hi list,

1. I have a large fastq file containing Solexa reads that start with a barcode (an identifier to separate individuals). I now want to filter that large data set according to the barcodes using ShortRead. I understand that this is easily done with grep() when one wants a perfect barcode match. However, I want to allow ONE single wrong nucleotide within the barcode, at any position. Is there an efficient way to filter by barcode while allowing a mismatch?

2. Is there a way to modify nucleotides in ShortRead objects? E.g., to replace a G by an A at position 3 for ALL sequences in the object?

Thanks!
Daniel
@martin-morgan-1513
In reply to Daniel.Berner at unibas.ch, 11/05/2010 01:54 PM:

Hi Daniel,

For part 1, one strategy is to narrow() the reads to the region of the barcode and then test

  countPDict(<narrowed seqs>, DNAString(<barcode>), max.mismatch=1L) != 0

or use vcountPDict.

I think part 2 is along the lines of

  idx = as.character(subseq(dna, 3, 3)) == "G"
  subseq(dna[idx], 3, 3) = "A"

though I suspect that the character conversion isn't necessary.

Martin
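As a rough illustration of the two ideas in this reply (restrict each read to its barcode region and allow one mismatch, then overwrite a base at a fixed position), here is a minimal sketch in base R. It assumes the reads are available as a plain character vector (the toy `reads` below) rather than a ShortRead object, and the helper `barcode_mismatches()` is hypothetical, not part of ShortRead or Biostrings. It counts substitutions only and makes no claim about how countPDict or vcountPDict behave.

```r
# Toy reads standing in for the sequences of a ShortRead object (assumption).
reads   <- c("ACGTAAGG", "ACTTAAGG", "TTTTGGCC", "AGGTCCAA")
barcode <- "ACGT"

# Hypothetical helper: number of substitutions between the barcode and the
# first nchar(barcode) bases of each read.
barcode_mismatches <- function(reads, barcode) {
  target <- strsplit(barcode, "")[[1]]
  prefix <- substr(reads, 1, nchar(barcode))
  vapply(strsplit(prefix, ""), function(bases) {
    # A read shorter than the barcode cannot carry it.
    if (length(bases) < length(target)) return(NA_integer_)
    sum(bases != target)
  }, integer(1))
}

# Part 1: keep reads whose barcode region has at most one wrong nucleotide.
n_mm <- barcode_mismatches(reads, barcode)
keep <- !is.na(n_mm) & n_mm <= 1L
barcoded <- reads[keep]

# Part 2: replace G by A at position 3, only in reads that have a G there.
idx <- substr(reads, 3, 3) == "G"
edited <- reads
substr(edited[idx], 3, 3) <- "A"
```

With the toy data, `n_mm` is 0, 1, 3, 1, so the third read is dropped. On real data the Biostrings functions named in the reply are probably the better choice for speed; this sketch is only meant to make the logic explicit.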
@harris-a-jaffee-3972
In reply to Daniel.Berner at unibas.ch, Nov 5, 2010, 4:54 PM:

This example illustrates another approach to the first question. If you need to delete or select the barcoded reads, you'll need to post-process using the width of the returned value.

  > trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"), max.Lmismatch=1)
  [1] "AA"     "TTTTGG"
  > trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"), max.Lmismatch=1, ranges=TRUE)
  IRanges of length 2
      start end width
  [1]     5   6     2
  [2]     1   6     6

You can also use agrep with max.distance=1, but you will need to narrow to the barcode region of each read first (you can't employ "^" as a metacharacter).

-Harris
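The agrep route mentioned here might look roughly like the sketch below, using base R only. The toy `reads` vector and the 4-base barcode are assumptions for illustration. Because agrep's default distance also counts insertions and deletions, the sketch passes `max.distance` as a list that sets those to zero, so only a single substitution is tolerated; that restriction is my reading of the original question, not something the post specifies.

```r
# Toy reads as a plain character vector (assumption, not a ShortRead object).
reads   <- c("ACTTAA", "ACGTCC", "TTTTGG")
barcode <- "ACGT"

# Narrow each read to its barcode region first, since agrep cannot anchor
# the pattern to the start of the string.
prefix <- substr(reads, 1, nchar(barcode))

# Allow one edit in total, but no insertions or deletions, so that the
# comparison amounts to "at most one substituted base".
keep <- agrepl(barcode, prefix,
               max.distance = list(all = 1, insertions = 0, deletions = 0))

barcoded <- reads[keep]
```

Here the first two reads are kept (one substitution and an exact match) and the third is dropped.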