Is matchPattern faster?
wang peter
@wang-peter-4647
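I have some reads from an Illumina sequencer, and some of them contain "N" bases. I want to find the position of the last "N" within the first 6 bp of each read. Is matchPattern the faster option, or would other low-level matching methods be better?

    library(ShortRead)              # provides readFastq() and sread()
    reads <- readFastq(fastqfile)   # fastqfile: path to the FASTQ file
    seqs <- sread(reads)            # DNAStringSet of read sequences
    matchPattern("N", seqs)         # note: matchPattern() expects a single sequence;
                                    # for a DNAStringSet the vectorized form is vmatchPattern()

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253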
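To pin down what "the last N in the first 6 bp" means, here is a minimal base R sketch on a plain character vector. It assumes you have already converted the reads with something like `as.character(seqs)`; the toy reads and the helper name `last_n_in_prefix` are hypothetical. The regular expression `N[^N]*$` can only match starting at the last N of the prefix, so `regexpr()` returns that position (or -1 when there is no N).

```r
# Hypothetical toy reads standing in for as.character(sread(reads))
reads_chr <- c("ANGTNCAGT", "ACGTACGTA", "NNACGTNNN", "ACGTAN")

# Hypothetical helper: position of the last "N" within the first k bases, NA if none
last_n_in_prefix <- function(x, k = 6L) {
  prefix <- substr(x, 1L, k)            # keep only the first k bases
  pos <- regexpr("N[^N]*$", prefix)     # a match can only start at the last N
  out <- as.integer(pos)                # drop regexpr's match.length attributes
  out[out == -1L] <- NA_integer_        # -1 means the prefix has no N
  out
}

last_n_in_prefix(reads_chr)
# returns c(5, NA, 2, 6)
```

This is not a speed recommendation; it just fixes the expected answers that the approaches below should reproduce.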
@harris-a-jaffee-3972
United States
Perhaps something along these lines:

    r = reverse(narrow(seqs, 1, 6))   # first 6 bases of each read, reversed
    # index of the first N in the reversed prefix (NA if none);
    # position i in the reversed prefix is position 7 - i in the original read
    answer = 7 - which.isMatchingAt("N", r, at=1:6)
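The reverse-then-first-match trick can be checked without Biostrings. The base R sketch below mirrors it on plain character strings: `substr()` stands in for `narrow()`, a hypothetical helper `str_rev()` stands in for `reverse()`, and `regexpr()` finds the first N. The toy reads are made up, and the `7 - i` mapping assumes every prefix is exactly 6 bases long.

```r
# Hypothetical toy reads (plain character, not a DNAStringSet)
reads_chr <- c("ANGTNCAGT", "ACGTACGTA", "NNACGTNNN", "ACGTAN")

# Hypothetical helper: base R has no vectorized string reverse,
# so split into characters, reverse, and paste back together
str_rev <- function(x) {
  vapply(strsplit(x, ""), function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

prefix <- substr(reads_chr, 1L, 6L)               # like narrow(seqs, 1, 6)
r <- str_rev(prefix)                              # like reverse(...)
i <- as.integer(regexpr("N", r, fixed = TRUE))    # first N in the reversed prefix
i[i == -1L] <- NA_integer_                        # no N in the prefix
answer <- 7L - i                                  # reversed index i -> original position 7 - i
answer
```

For the toy reads this gives 5, NA, 2 and 6, the same positions as a direct search for the last N.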
Or maybe:

    n = narrow(seqs, 1, 6)
    # ".*N" is greedy, so sub() strips everything up to and including the last N;
    # a prefix with no N is left unchanged, giving an answer of 0
    answer = 6 - nchar(sub(".*N", "", n))
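The arithmetic behind the `sub()` version is easiest to see on plain strings. In this base R sketch (toy reads are hypothetical), if the last N sits at position p, the text left after deleting `.*N` has length 6 - p, so 6 minus that length recovers p.

```r
# Hypothetical toy reads (plain character vector)
reads_chr <- c("ANGTNCAGT", "ACGTACGTA", "NNACGTNNN", "ACGTAN")

n <- substr(reads_chr, 1L, 6L)           # first 6 bases, like narrow(seqs, 1, 6)
tail_after_n <- sub(".*N", "", n)        # greedy: removes everything through the LAST N
answer <- 6L - nchar(tail_after_n)       # length of the tail is 6 - p
answer
```

Note the design difference from the `which.isMatchingAt()` versions: a read with no N in its first 6 bases gives 0 here rather than NA, which is still unambiguous because 0 is never a valid position.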
Sorry, I'll get it eventually.

    # check positions 6, 5, ..., 1 in that order; follow.index=TRUE returns the
    # matching value of `at` (the position itself) rather than its index in `at`
    answer = which.isMatchingAt("N", seqs, at=6:1, follow.index=TRUE)

Let me know if anything beats it.
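The idea in the last version is to test positions from right to left and report the first hit as a position rather than as an index into `at`. The base R sketch below imitates that behaviour with a hypothetical helper `first_n_at()`; it is only a model of the semantics, not of Biostrings' internals or its speed. To see whether anything beats `which.isMatchingAt()`, you would still need to time the real Biostrings calls (for example with `system.time()`) on your own reads.

```r
# Hypothetical toy reads (plain character vector)
reads_chr <- c("ANGTNCAGT", "ACGTACGTA", "NNACGTNNN", "ACGTAN")

# Hypothetical helper: visit positions in the order given by `at` and return,
# for each read, the first position (not its index in `at`) holding "N"; NA if none
first_n_at <- function(x, at = 6:1) {
  out <- rep(NA_integer_, length(x))
  for (p in at) {
    hit <- is.na(out) & substr(x, p, p) == "N"   # unresolved reads with N at p
    out[hit] <- p                                # record the position itself
  }
  out
}

first_n_at(reads_chr)
```

Because positions are visited as 6, 5, ..., 1, the first hit is the last N in the prefix, and reads shorter than 6 bases simply yield empty substrings at the missing positions.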