a question about the low level match function
wang peter ★ 2.0k
@wang-peter-4647
Last seen 9.6 years ago
Dear all, Harry and Steve,

I am sorry to disturb you again. This time I read the manual and some of the source code carefully, but I am still confused about how trimLRPatterns works.

I traced it back to these functions:

    Biostrings:::.computeTrimEnd
    showMethods(which.isMatchingEndingAt, includeDefs=TRUE)
    Biostrings:::.matchPatternAt

The last one contains:

    if (is(subject, "XString"))
        .Call2("XString_match_pattern_at", pattern, subject,
               at, at.type, max.mismatch, min.mismatch, with.indels,
               fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")
    else .Call2("XStringSet_vmatch_pattern_at", pattern, subject,
               at, at.type, max.mismatch, min.mismatch, with.indels,
               fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")

I think this calls the low-level code. For example:

    trimLRPatterns(Rpattern = Rpattern, subject = subject,
                   max.Rmismatch=0.1, with.Lindels=TRUE)

    subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
    Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"

The function then changes max.Rmismatch to

    max.Rmismatch= as.integer(max.Rmismatch*1:nchar(Rpattern))
    [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3

As far as I understand, the process then tries to get the distance between p and s:

    p = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"   allowing 3 mismatches
    s = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"

    p = "AATAGTACTGTAGGCACCATCAATAGATCGGAA"    allowing 3 mismatches
    s = "GAATAGTACTGTAGGCACCATCAATAGATCGGA"
    ...
    p = "A"   allowing 0 mismatches
    s = "G"

But what does the parameter 'at' mean?

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253
@harris-a-jaffee-3972
Last seen 9.4 years ago
United States
On Nov 6, 2012, at 3:02 PM, wang peter wrote:

> Dear all, Harry and Steve,
> I am sorry to disturb you again. This time I read the manual
> and some of the source code carefully, but I am still confused
> about how trimLRPatterns works.
> I traced it back to the function
>
> Biostrings:::.computeTrimEnd

The relevant statement is

    ii <- which.isMatchingEndingAt(pattern = Rpattern,
              subject = subject, ending.at = subject_width,
              max.mismatch = max.Rmismatch, with.indels = with.Rindels,
              fixed = Rfixed, auto.reduce.pattern = TRUE)

'subject_width' is constant at this point, because of this earlier test:

    if (!isConstant(width(subject))) {
        tmp <- .computeTrimStart(reverse(Rpattern), reverse(subject),
                   max.Rmismatch, with.Rindels, Rfixed)
        return(width(subject) - tmp + 1L)
    }

auto.reduce.pattern=TRUE tells the *EndingAt function to test a vector of patterns against each subject element, subject to the 'max.mismatch' vector of edit distance limits. These patterns are constructed behind the scenes (in C) from your single 'pattern=Rpattern'. For example, if your Rpattern were "TCGGAA", the test patterns would be, in order,

    "TCGGAA"
    "TCGGA"
    "TCGG"
    "TCG"
    "TC"
    "T"

They are tested using 'ending.at=subject_width', as I've hinted by the way I've written them.

The "which" in the function name refers to the underlying code (in this case, C code) stopping at the first hit, subject to your edit limits. For example, if a subject element happens to end with "TCGGA" within your limits, the test loop for that subject element stops there and the shorter patterns are not tried.
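The loop over reduced test patterns described above can be sketched in base R. This is only an illustration, not the Biostrings C code: the names hamming and first_matching_ending_at are hypothetical, mismatches are counted position by position (no indels, no IUPAC ambiguity codes), and the limit vector is assumed to be indexed by the length of the test pattern.

```r
# Hypothetical helper: number of mismatching positions between two
# strings of equal length.
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical sketch of the auto-reduced test loop for ONE subject.
# max_mismatch_by_length[k] is the limit for the test pattern of length k
# (an assumption made for this sketch).
# Returns the index of the first test pattern that matches, or NA.
first_matching_ending_at <- function(pattern, subject, max_mismatch_by_length) {
  n <- nchar(pattern)
  ending_at <- nchar(subject)          # plays the role of 'subject_width'
  for (i in seq_len(n)) {
    len <- n - i + 1L                  # i-th test pattern: first 'len' letters
    if (len > ending_at) next          # test pattern longer than the subject
    test_pattern <- substr(pattern, 1L, len)
    window <- substr(subject, ending_at - len + 1L, ending_at)
    if (hamming(test_pattern, window) <= max_mismatch_by_length[len]) {
      return(i)                        # stop at the first hit
    }
  }
  NA_integer_
}

# The subject ends with "TCGGA", so the second test pattern is the first hit.
hit <- first_matching_ending_at("TCGGAA", "ACGTTCGGA", rep(0L, 6))
no_hit <- first_matching_ending_at("TCGGAA", "AAAAAAAAA", rep(0L, 6))
```

With exact matching required, hit is 2 (the pattern "TCGGA") and no_hit is NA, which mirrors the "stop at the first hit" behaviour without claiming anything about how the C code is organised.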
> showMethods(which.isMatchingEndingAt, includeDefs=TRUE)
> Biostrings:::.matchPatternAt
>
> if (is(subject, "XString"))
>     .Call2("XString_match_pattern_at", pattern, subject,
>            at, at.type, max.mismatch, min.mismatch, with.indels,
>            fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")
> else .Call2("XStringSet_vmatch_pattern_at", pattern, subject,
>            at, at.type, max.mismatch, min.mismatch, with.indels,
>            fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")
>
> I think this calls the low-level code.

Yes, these are calls to C. 'at.type' is set to 1L by all the *EndingAt functions (and to 0L by all the *StartingAt functions). The statement above in .computeTrimEnd supplies 'ending.at', namely the subject width, which is passed as the 'at' argument of .matchPatternAt and forwarded to C.

> For example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
>                max.Rmismatch=0.1, with.Lindels=TRUE)
>
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> The function then changes max.Rmismatch to
> max.Rmismatch= as.integer(max.Rmismatch*1:nchar(Rpattern))
> [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
>
> As far as I understand, the process then tries to get the distance
> between p and s:
>
> p = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"   allowing 3 mismatches
> s = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> p = "AATAGTACTGTAGGCACCATCAATAGATCGGAA"    allowing 3 mismatches
> s = "GAATAGTACTGTAGGCACCATCAATAGATCGGA"
> ...
> p = "A"   allowing 0 mismatches
> s = "G"
>
> But what does the parameter 'at' mean?

See 'at' and 'ending.at' above. Does this help?
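To make the meaning of 'at' and 'at.type' concrete, here is a small base R sketch of how an anchor position could be turned into the stretch of the subject that a pattern is compared against. The function match_window is hypothetical and only follows the convention quoted above (0L for a start position, 1L for an end position); it is not the Biostrings code.

```r
# Hypothetical helper: the subject window that a pattern of width
# 'pattern_width' is compared against.
# at_type = 0L: 'at' is where the match starts (the *StartingAt functions)
# at_type = 1L: 'at' is where the match ends   (the *EndingAt functions)
match_window <- function(at, at_type, pattern_width) {
  if (at_type == 0L) {
    c(start = at, end = at + pattern_width - 1L)
  } else {
    c(start = at - pattern_width + 1L, end = at)
  }
}

subject  <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"

# .computeTrimEnd passes the subject width as 'ending.at', so the full
# Rpattern is compared against the last nchar(Rpattern) letters of the subject.
w <- match_window(at = nchar(subject), at_type = 1L,
                  pattern_width = nchar(Rpattern))
window <- substr(subject, w[["start"]], w[["end"]])
```

For the example in the question, the window for the full Rpattern runs to the last letter of the subject and is identical to Rpattern, so the first test pattern already matches with zero mismatches.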