a possible bug of trimLRPatterns
1
0
Entering edit mode
wang peter ★ 2.0k
@wang-peter-4647
Last seen 9.6 years ago
reads <- readFastq(fastqfile); seqs <- sread(reads); max.mismatchs <- mismatch_rate*1:nchar(DNAString(PCR2rc)) trimmedCoords <- trimLRPatterns(Rpattern = PCR2rc, subject = seqs, max.Rmismatch= max.mismatchs, with.Rindels=T,ranges=T) > end(trimmedCoords)[1:20] [1] 22 18 20 33 14 22 22 20 22 22 22 15 20 37 19 13 20 22 0 34 > start(trimmedCoords)[1:20] [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 there is a "0" in the end of trimmedCoords so i cannot get the trimmed sequences trimmed3End <- narrow(reads, start=end(trimmedCoords), end=width(reads)) R version 2.14.1 (2011-12-22) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 [4] Rsamtools_1.6.3 lattice_0.20-0 Biostrings_2.22.0 [7] GenomicRanges_1.6.7 IRanges_1.12.6 loaded via a namespace (and not attached): [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.1 [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.1 [9] XML_3.9-4 zlibbioc_1.0.0 -- shan gao Room 231(Dr.Fei lab) Boyce Thompson Institute Cornell University Tower Road, Ithaca, NY 14853-1801 Office phone: 1-607-254-1267(day) Official email:sg839 at cornell.edu Facebook:http://www.facebook.com/profile.php?id=100001986532253
• 837 views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 6 weeks ago
United States
On 03/08/2012 04:40 PM, wang peter wrote: > reads<- readFastq(fastqfile); > seqs<- sread(reads); > max.mismatchs<- mismatch_rate*1:nchar(DNAString(PCR2rc)) > trimmedCoords<- trimLRPatterns(Rpattern = PCR2rc, subject = seqs, > max.Rmismatch= max.mismatchs, with.Rindels=T,ranges=T) > >> end(trimmedCoords)[1:20] > [1] 22 18 20 33 14 22 22 20 22 22 22 15 20 37 19 13 20 22 0 34 >> start(trimmedCoords)[1:20] > [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > > there is a "0" in the end of trimmedCoords so i cannot get the trimmed sequences The sequence has been trimmed entirely > as.character(Views(DNAString("AA"), IRanges(1, 1))) [1] "A" > as.character(Views(DNAString("AA"), IRanges(1, 0))) [1] "" Martin > > trimmed3End<- narrow(reads, start=end(trimmedCoords), end=width(reads)) > > R version 2.14.1 (2011-12-22) > Platform: x86_64-redhat-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 > [4] Rsamtools_1.6.3 lattice_0.20-0 Biostrings_2.22.0 > [7] GenomicRanges_1.6.7 IRanges_1.12.6 > > loaded via a namespace (and not attached): > [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.1 > [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.1 > [9] XML_3.9-4 zlibbioc_1.0.0 > -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD COMMENT
0
Entering edit mode
A reply to a similar post also described removal of zero width reads; "If you want to remove zero-width records before writing, then subset as xx[width(xx) != 0]" So in your case something like; nonZero <- (width(trimmedCoords)!=0) narrow(reads[nonZero], start=start(trimmedCoords)[nonZero], end=end(trimmedCoords)[nonZero]) Marcus On Sat, Mar 10, 2012 at 12:10 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 03/08/2012 04:40 PM, wang peter wrote: > >> reads<- readFastq(fastqfile); >> seqs<- sread(reads); >> max.mismatchs<- mismatch_rate*1:nchar(**DNAString(PCR2rc)) >> trimmedCoords<- trimLRPatterns(Rpattern = PCR2rc, subject = seqs, >> max.Rmismatch= max.mismatchs, with.Rindels=T,ranges=T) >> >> end(trimmedCoords)[1:20] >>> >> [1] 22 18 20 33 14 22 22 20 22 22 22 15 20 37 19 13 20 22 0 34 >> >>> start(trimmedCoords)[1:20] >>> >> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> >> there is a "0" in the end of trimmedCoords so i cannot get the trimmed >> sequences >> > > The sequence has been trimmed entirely > > > as.character(Views(DNAString("**AA"), IRanges(1, 1))) > [1] "A" > > as.character(Views(DNAString("**AA"), IRanges(1, 0))) > [1] "" > > Martin > > > >> trimmed3End<- narrow(reads, start=end(trimmedCoords), end=width(reads)) >> >> R version 2.14.1 (2011-12-22) >> Platform: x86_64-redhat-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 >> [4] Rsamtools_1.6.3 lattice_0.20-0 Biostrings_2.22.0 >> [7] GenomicRanges_1.6.7 IRanges_1.12.6 >> >> loaded via a namespace (and not attached): >> [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.1 >> [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 tools_2.14.1 >> [9] XML_3.9-4 zlibbioc_1.0.0 >> >> > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
An srFilter could be useful for filtering reads based on width using the subset operation, the advantage is that you can quickly retrieve statistics from the filter to put in a log file. The output is in the style of 'occurrenceFilter' which uses 'ShortRead:::.occurrenceName' to construct input arguments in the stats summary, other filters don't necessarily include input arguments depending on when they were written. widthFilter <- function (w, .name = .widthName(w)) { ShortRead:::.check_type_and_length(w, "numeric", 1L) if(w < 1) .throw(SRError("UserArgumentMismatch", "'w' must be >= 1'")) srFilter(function(x) { width(x)>=w }, name = .name) } .widthName <- function (w) { sprintf("%s w=%s", "widthFilter", w) } ## Example sp <- SolexaPath(system.file('extdata', package='ShortRead')) rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt") ## Create some variable width reads, note: escaping the backslash rfq.ex <- trimEnds(rfq, "\\") ## e.g. Filter reads at least 20 nucleotide cycles wFilter <- widthFilter(20L) rfq.ex[wFilter(rfq.ex)] ## Output for a log file stats(wFilter(rfq.ex)) stats(!wFilter(rfq.ex)) Name Input Passing Op 1 widthFilter w=20 256 251 <na> 2 !(widthFilter w=20) 256 5 ! Marcus On Sat, Mar 10, 2012 at 11:00 AM, Marcus Davy <mdavy86@gmail.com> wrote: > A reply to a similar post also described removal of zero width reads; > > "If you want to remove zero-width records before writing, then subset as > > xx[width(xx) != 0]" > > So in your case something like; > > nonZero <- (width(trimmedCoords)!=0) > narrow(reads[nonZero], start=start(trimmedCoords)[nonZero], > end=end(trimmedCoords)[nonZero]) > > Marcus > > > On Sat, Mar 10, 2012 at 12:10 AM, Martin Morgan <mtmorgan@fhcrc.org>wrote: > >> On 03/08/2012 04:40 PM, wang peter wrote: >> >>> reads<- readFastq(fastqfile); >>> seqs<- sread(reads); >>> max.mismatchs<- mismatch_rate*1:nchar(**DNAString(PCR2rc)) >>> trimmedCoords<- trimLRPatterns(Rpattern = PCR2rc, subject = seqs, >>> max.Rmismatch= max.mismatchs, with.Rindels=T,ranges=T) >>> >>> end(trimmedCoords)[1:20] >>>> >>> [1] 22 18 20 33 14 22 22 20 22 22 22 15 20 37 19 13 20 22 0 34 >>> >>>> start(trimmedCoords)[1:20] >>>> >>> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >>> >>> there is a "0" in the end of trimmedCoords so i cannot get the trimmed >>> sequences >>> >> >> The sequence has been trimmed entirely >> >> > as.character(Views(DNAString("**AA"), IRanges(1, 1))) >> [1] "A" >> > as.character(Views(DNAString("**AA"), IRanges(1, 0))) >> [1] "" >> >> Martin >> >> >> >>> trimmed3End<- narrow(reads, start=end(trimmedCoords), end=width(reads)) >>> >>> R version 2.14.1 (2011-12-22) >>> Platform: x86_64-redhat-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 >>> [4] Rsamtools_1.6.3 lattice_0.20-0 Biostrings_2.22.0 >>> [7] GenomicRanges_1.6.7 IRanges_1.12.6 >>> >>> loaded via a namespace (and not attached): >>> [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 grid_2.14.1 >>> [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 >>> tools_2.14.1 >>> [9] XML_3.9-4 zlibbioc_1.0.0 >>> >>> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 >> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Slight bug in width checking, .throw is in the ShortRead package. if(w < 1) ShortRead:::.throw(SRError("UserArgumentMismatch", "'w' must be >= 1'")) ## w=0 throws an error widthFilter(0L) Error: UserArgumentMismatch 'w' must be >= 1' wFilter <- widthFilter(1L) ## Negate the operator to check for 0 length reads zeroLength <- (!wFilter(rfq.ex)) rfq.ex[zeroLength] class: ShortReadQ length: 4 reads; width: 0 cycles Marcus On Sun, Mar 18, 2012 at 11:21 AM, Marcus Davy <mdavy86@gmail.com> wrote: > An srFilter could be useful for filtering reads based on width using the > subset operation, the advantage is that you can quickly retrieve statistics > from the filter to put in a log file. > > The output is in the style of 'occurrenceFilter' which uses > 'ShortRead:::.occurrenceName' to construct input arguments in the stats > summary, other filters don't necessarily include input arguments depending > on when they were written. > > > > > widthFilter <- > > > function (w, .name = .widthName(w)) > > > { > > > ShortRead:::.check_type_and_length(w, "numeric", 1L) > > > if(w < 1) > > > .throw(SRError("UserArgumentMismatch", "'w' must be >= 1'")) > > > > > > srFilter(function(x) { > > > width(x)>=w > > > }, name = .name) > > > } > > > > > > .widthName <- function (w) > > > { > > > sprintf("%s w=%s", "widthFilter", w) > > > } > > > > ## Example > > > sp <- SolexaPath(system.file('extdata', package='ShortRead')) > > > rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt") > > > > ## Create some variable width reads, note: escaping the backslash > > > rfq.ex <- trimEnds(rfq, "\\") > > ## e.g. Filter reads at least 20 nucleotide cycles > > > wFilter <- widthFilter(20L) > > > rfq.ex[wFilter(rfq.ex)] > > > > ## Output for a log file > > > stats(wFilter(rfq.ex)) > > > stats(!wFilter(rfq.ex)) > > > > Name Input Passing Op > 1 widthFilter w=20 256 251 <na> > 2 !(widthFilter w=20) 256 5 ! > > > Marcus > > > On Sat, Mar 10, 2012 at 11:00 AM, Marcus Davy <mdavy86@gmail.com> wrote: > >> A reply to a similar post also described removal of zero width reads; >> >> "If you want to remove zero-width records before writing, then subset as >> >> xx[width(xx) != 0]" >> >> So in your case something like; >> >> nonZero <- (width(trimmedCoords)!=0) >> narrow(reads[nonZero], start=start(trimmedCoords)[nonZero], >> end=end(trimmedCoords)[nonZero]) >> >> Marcus >> >> >> On Sat, Mar 10, 2012 at 12:10 AM, Martin Morgan <mtmorgan@fhcrc.org>wrote: >> >>> On 03/08/2012 04:40 PM, wang peter wrote: >>> >>>> reads<- readFastq(fastqfile); >>>> seqs<- sread(reads); >>>> max.mismatchs<- mismatch_rate*1:nchar(**DNAString(PCR2rc)) >>>> trimmedCoords<- trimLRPatterns(Rpattern = PCR2rc, subject = seqs, >>>> max.Rmismatch= max.mismatchs, with.Rindels=T,ranges=T) >>>> >>>> end(trimmedCoords)[1:20] >>>>> >>>> [1] 22 18 20 33 14 22 22 20 22 22 22 15 20 37 19 13 20 22 0 34 >>>> >>>>> start(trimmedCoords)[1:20] >>>>> >>>> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >>>> >>>> there is a "0" in the end of trimmedCoords so i cannot get the trimmed >>>> sequences >>>> >>> >>> The sequence has been trimmed entirely >>> >>> > as.character(Views(DNAString("**AA"), IRanges(1, 1))) >>> [1] "A" >>> > as.character(Views(DNAString("**AA"), IRanges(1, 0))) >>> [1] "" >>> >>> Martin >>> >>> >>> >>>> trimmed3End<- narrow(reads, start=end(trimmedCoords), end=width(reads)) >>>> >>>> R version 2.14.1 (2011-12-22) >>>> Platform: x86_64-redhat-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=C LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] ShortRead_1.12.4 latticeExtra_0.6-19 RColorBrewer_1.0-5 >>>> [4] Rsamtools_1.6.3 lattice_0.20-0 Biostrings_2.22.0 >>>> [7] GenomicRanges_1.6.7 IRanges_1.12.6 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] Biobase_2.14.0 bitops_1.0-4.1 BSgenome_1.22.0 >>>> grid_2.14.1 >>>> [5] hwriter_1.3 RCurl_1.91-1 rtracklayer_1.14.4 >>>> tools_2.14.1 >>>> [9] XML_3.9-4 zlibbioc_1.0.0 >>>> >>>> >>> >>> -- >>> Computational Biology >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >>> >>> Location: M1-B861 >>> Telephone: 206 667-2793 >>> >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >> >> > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 781 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6