still the trimLRPatterns problem
wang peter ★ 2.0k
@wang-peter-4647
I want to remove PCR2rc from the subject, but it is not recognized when I set the mismatch to 0.2. How can I set the parameters so that trimLRPatterns works?

    GATCGGAAGAGCACACGTCTGAACTCCA
    TCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
    AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

    subject <- DNAString("GATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC")
    PCR2rc <- DNAString("AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG")
    max.mismatchs <- 0.25*1:nchar(PCR2rc)

    trimLRPatterns(Lpattern = PCR2rc, subject = subject, max.Lmismatch=0.2, with.Rindels=T)
      88-letter "DNAString" instance
    seq: ATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC

    trimLRPatterns(Lpattern = PCR2rc, subject = subject, max.Lmismatch=0.5, with.Rindels=T)
      27-letter "DNAString" instance
    seq: AAAAAAAAAAAAAAACGACACAAGCCC

    countPattern(PCR2rc, subject, max.mismatch=0.2, min.mismatch=0, with.indels=TRUE)
    [1] 0
@harris-a-jaffee-3972
United States
On Oct 11, 2011, at 10:59 AM, wang peter wrote:

> I want to remove PCR2rc from the subject, but it is not recognized
> when I set the mismatch to 0.2.
> How can I set the parameters so that trimLRPatterns works?
> GATCGGAAGAGCACACGTCTGAACTCCA
> TCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
> AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

It would really help if you could ask something specific about these 3 strings.

> subject <- DNAString("GATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC")
> PCR2rc <- DNAString("AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG")

Ok, so these are your major players. See below.

> max.mismatchs <- 0.25*1:nchar(PCR2rc)

I'm ignoring this since it's irrelevant.

Now, to get everyone on the same page, let me state some pertinent facts:

    > PCR2rc.2 <- substr(PCR2rc, 2, nchar(PCR2rc))
    > subject
      89-letter "DNAString" instance
    seq: GATCGGAAGAGCACACGTCTGAACTCCATCACATCA...TTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
    > PCR2rc.2
      63-letter "DNAString" instance
    seq: GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
    > neditAt(PCR2rc.2, subject)
    [1] 32
    > neditAt(PCR2rc.2, subject, with.indels=TRUE)
    [1] 2

So your question is basically how to get a large prefix of the subject trimmed, when it is more like my somewhat artificial 'PCR2rc.2' than your real 'PCR2rc'.

> trimLRPatterns(Lpattern = PCR2rc, subject = subject, max.Lmismatch=0.2, with.Rindels=T)
>   88-letter "DNAString" instance
> seq: ATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
>
> trimLRPatterns(Lpattern = PCR2rc, subject = subject, max.Lmismatch=0.5, with.Rindels=T)
>   27-letter "DNAString" instance
> seq: AAAAAAAAAAAAAAACGACACAAGCCC

These calls do not make complete sense. You are trimming on the left, so you want 'with.Lindels=TRUE' (L, not R). More about that later. But doing so, you don't need an absurd max.Lmismatch setting; 0.2 is quite enough:

    > trimLRPatterns(Lpattern = PCR2rc, subject = subject, max.Lmismatch=0.2, with.Lindels=TRUE)
      25-letter "DNAString" instance
    seq: AAAAAAAAAAAAACGACACAAGCCC

> countPattern(PCR2rc, subject, max.mismatch=0.2, min.mismatch=0, with.indels=TRUE)
> [1] 0

As I've said before, the matchPattern/countPattern family is insensitive to non-integral mismatch values. They are silently truncated, via as.integer(). In this case, your 0.2 becomes 0. Again, the pertinent facts are:

    > neditAt(PCR2rc, subject, with.indels=TRUE)
    [1] 3
    > countPattern(PCR2rc, subject, max.mismatch=3, with.indels=TRUE)
    [1] 1

Ok, so now we can consider these somewhat philosophical questions:

1) Should trimLRPatterns save you from setting irrelevant parameters (with.Rindels, when you're "trimming on the left")?

2) Should matchPattern/countPattern save you from a funny mismatch setting?

I don't know about 1) per se, but in view of "trimLRPatterns 2.0", the indels parameters will be on/TRUE by default (and possibly not exist at all). For 2), I think you should get a warning, at least, if not a hard error.
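To make the difference between the two mismatch settings concrete, here is a small base-R sketch (no Biostrings needed) of what a value of 0.2 turns into in each case. The as.integer() truncation for countPattern is what the answer above states. The rate expansion for trimLRPatterns is an assumption based on my reading of the Biostrings documentation, so treat it as illustrative rather than authoritative. The variable names are made up for this example.

```r
# Pattern length taken from the thread: PCR2rc has 64 letters.
pattern_length <- 64L
rate <- 0.2

# matchPattern/countPattern: per the answer above, a fractional max.mismatch
# is truncated with as.integer(), so 0.2 means "no mismatches allowed".
count_pattern_budget <- as.integer(rate)

# trimLRPatterns: ASSUMPTION (my reading of the Biostrings documentation):
# a single value in [0, 1) is treated as a rate and expanded into one
# allowance per pattern-suffix length, as.integer(1:nLp * rate).
trim_budgets <- as.integer(seq_len(pattern_length) * rate)

cat("countPattern allowance:", count_pattern_budget, "\n")
cat("trimLRPatterns allowance for the full pattern:",
    trim_budgets[pattern_length], "\n")
```

Under that assumption, the full 64-letter pattern is allowed up to 12 edits by trimLRPatterns, which comfortably covers the edit distance of 3 reported by neditAt once indels are enabled on the left side. countPattern, by contrast, is allowed 0 edits, which is why it needs an explicit max.mismatch=3.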