mismatch & replacement

Daniel.Berner@unibas.ch ▴ 90

Hi list

1. I have a large fastq file containing Solexa reads that start with a barcode (an identifier to separate individuals). I now want to filter that large data set according to the barcodes using ShortRead. I understand that this is easily done with grep() when one wants a perfect barcode match. However, I want to allow ONE single wrong nucleotide within the barcode, at any position. Is there an efficient way to filter by barcode while allowing a mismatch?

2. Is there a way to modify nucleotides in ShortRead objects? E.g., to replace a G by an A at position 3 for ALL sequences in the object?

Thanks!
Daniel

Martin Morgan 25k

On 11/05/2010 01:54 PM, Daniel.Berner at unibas.ch wrote:

> 1. I have a large fastq file containing solexa reads that start with a
> barcode (identifier to separate individuals). [...] Is there an efficient
> way to filter by barcode while allowing a mismatch?
>
> 2. Is there a way to modify nucleotides in ShortRead objects? E.g., to
> replace a G by an A at position 3 for ALL sequences in the object?

Hi Daniel -- a strategy is to narrow() the reads to the region of the bar code, and then

    countPDict(<narrowed seqs>, DNAString(<barcode>), max.mismatch=1L) != 0

or vcountPDict. I think part 2 is along the lines of

    idx = as.character(subseq(dna, 3, 3)) == "G"
    subseq(dna[idx], 3, 3) = "A"

though I suspect that character conversion isn't necessary.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
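For readers who want to see the two ideas spelled out, here is a minimal base R sketch. It deliberately uses plain character vectors and a hand-rolled Hamming distance rather than the Biostrings calls named above, so it illustrates the logic and is not the efficient route for a large file. The toy reads, the barcode, and the helper name `barcode_mismatches` are all made up for illustration; with real data the character vector might come from something like as.character(sread(fq)).

```r
# Toy reads standing in for real data (assumption: reads as a character vector).
reads   <- c("ACGTTTGACC", "ACTTGGCATA", "TTTTGGCATA", "AC")
barcode <- "ACGT"  # hypothetical barcode

# Number of mismatching positions between the barcode and the first
# nchar(barcode) letters of each read (Hamming distance, no indels).
barcode_mismatches <- function(reads, barcode) {
  k  <- nchar(barcode)
  bc <- strsplit(barcode, "")[[1]]
  vapply(strsplit(substr(reads, 1, k), ""), function(p) {
    if (length(p) < k) return(k)  # read shorter than the barcode: treat as no match
    sum(p != bc)
  }, integer(1))
}

n_mm    <- barcode_mismatches(reads, barcode)
keep    <- n_mm <= 1L          # allow at most one wrong nucleotide
matched <- reads[keep]

# Part 2: replace G by A at position 3, only where position 3 is a G.
fixed <- matched
idx   <- substr(fixed, 3, 3) == "G"
substr(fixed[idx], 3, 3) <- "A"
```

Because the comparison is position by position, a read counts as matching only when at most one nucleotide is substituted; insertions or deletions inside the barcode are not tolerated.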

Harris A. Jaffee ▴ 590

This example illustrates another approach to the first question. You'll need to post-process using the width of the value if you need to delete or select the barcoded reads.

    > trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"), max.Lmismatch=1)
    [1] "AA"     "TTTTGG"
    > trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"), max.Lmismatch=1, ranges=TRUE)
    IRanges of length 2
        start end width
    [1]     5   6     2
    [2]     1   6     6

You can also use agrep with max.distance=1, but you will need to narrow to the barcode region of each read first (you can't employ "^" as a metacharacter).

-Harris
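The post-processing step mentioned above could look roughly like this. The sketch reuses the trimLRPatterns() call and toy subjects from the example and simply compares read lengths before and after trimming. The variable names are illustrative, and it assumes the call behaves as in the output shown above (a character vector in, a character vector out).

```r
library(Biostrings)

subject <- c("ACTTAA", "TTTTGG")   # toy reads from the example above
trimmed <- trimLRPatterns(Lpattern = "ACGT", subject = subject,
                          max.Lmismatch = 1)

# A read that got shorter had the barcode (with at most one mismatch) removed.
barcoded <- nchar(trimmed) < nchar(subject)

with_barcode    <- trimmed[barcoded]    # barcoded reads, barcode stripped
without_barcode <- subject[!barcoded]   # reads that did not carry this barcode
```

One caveat on the agrep route: agrep() measures a generalized Levenshtein distance, so max.distance=1 also admits a single insertion or deletion, which is looser than "one wrong nucleotide".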
