A question about trimLRPatterns
wang peter ★ 2.0k
@wang-peter-4647
Last seen 9.6 years ago
Hello all,

I want to know how this function processes data.

For the left match, max.Lmismatch is taken as a "rate" and is converted to

  max.Lmismatch = as.integer(1:nLp * rate)

Then it tries to match the suffix substring(Lpattern, nLp - i + 1, nLp) of Lpattern against the first i letters of subject. Does i start from 1 or from nLp? And is the corresponding allowed number of mismatches max.Lmismatch[i]?

For the right match, max.Rmismatch is taken as a "rate" and is converted to

  max.Rmismatch = as.integer(1:nRp * rate)

Then it tries to match the suffix substring(Rpattern, nRp - i + 1, nRp) of subject against the first i letters of Rpattern. Does i start from 1 or from nRp? And is the corresponding allowed number of mismatches max.Rmismatch[i]?

--
shan gao
Room 231 (Dr. Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267 (day)
Official email: sg839 at cornell.edu
Facebook: http://www.facebook.com/profile.php?id=100001986532253
@harris-a-jaffee-3972
Last seen 9.5 years ago
United States
To quote from ?trimLRPatterns, for Lpattern here:

  Once the integer vector is constructed using the rules given above, when 'with.Lindels' is 'FALSE', 'max.Lmismatch[i]' is the number of acceptable mismatches (errors) between the suffix 'substring(Lpattern, nLp - i + 1, nLp)' of 'Lpattern' and the first 'i' letters of 'subject'. When 'with.Lindels' is 'TRUE', 'max.Lmismatch[i]' represents the allowed "edit distance" between that suffix of 'Lpattern' and 'subject', starting at position '1' of 'subject' (as in 'matchPattern' and 'isMatchingStartingAt').

  For a given element 's' of the 'subject', the initial segment (prefix) 'substring(s, 1, j)' of 's' is trimmed if 'j' is the largest 'i' for which there is an acceptable match, if any.

If you are asking about implementation, the sub-patterns, i.e., suffixes of Lpattern or prefixes of Rpattern, are tested "longest first" using the relevant max.mismatch vector "from the top, down". (Intuitively, you should think of your max.mismatch vectors as being monotone increasing, perhaps not strictly.) The testing process at the relevant side of the subject stops if/when an acceptable match is seen.

The See Also refers to ?`lowlevel-matching`, where you will find which.isMatchingStartingAt() and which.isMatchingEndingAt(). These functions are called with auto.reduce.pattern=TRUE, which allows a single "pattern" and a single "at" value to be passed in the context of a *vector* "max.mismatch" value. The actual pattern being tested gets iteratively shorter by 1 character as necessary, for each element of the subject, automatically.

Let me know if I didn't get at your question.

On Jan 19, 2012, at 3:15 PM, wang peter wrote:

> hello all:
>
> i want to know how this function process data?
> [...]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
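To make the "longest first" order concrete, here is a rough base-R sketch of the left-side logic for the mismatch-only case (with.Lindels = FALSE). It is only an illustration of the behaviour described above, not the Biostrings implementation; count_mismatches() and left_trim_length() are hypothetical helper names, and the rate of 0.25 and the sequences are made-up example values.

```r
# Illustrative sketch only; NOT the Biostrings code.
# count_mismatches() and left_trim_length() are hypothetical helpers.

# Number of positions at which two equal-length strings differ.
count_mismatches <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

left_trim_length <- function(Lpattern, subject, rate) {
  nLp <- nchar(Lpattern)
  # A single rate in [0, 1) becomes one allowance per suffix length i.
  max.Lmismatch <- as.integer(seq_len(nLp) * rate)
  # i runs from the longest usable suffix down to 1.
  for (i in rev(seq_len(min(nLp, nchar(subject))))) {
    suffix <- substring(Lpattern, nLp - i + 1, nLp)
    prefix <- substring(subject, 1, i)
    if (count_mismatches(suffix, prefix) <= max.Lmismatch[i]) {
      return(i)  # stop at the first (largest) acceptable i
    }
  }
  0L  # no acceptable match: nothing would be trimmed
}

# With rate 0.25 and nLp = 9 the allowances are 0 0 0 1 1 1 1 2 2.
left_trim_length("TTCTGCTTG", "GCTTGACGTACC", 0.25)  # 5
```

So, in this sketch, i starts at nLp and decreases, and the allowance used at each step is max.Lmismatch[i]. In the example, the 5-letter suffix "GCTTG" is the longest one that matches the start of the subject within its allowance, so the first 5 letters would be trimmed.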
On Jan 19, 2012, at 4:20 PM, Harris A. Jaffee wrote:

> These functions are called with auto.reduce.pattern=TRUE, which allows a
> single "pattern" and single "at" value to be passed in the context of a
> *vector* "max.mismatch" value, the actual pattern being tested getting
> iteratively shorter by 1 character as necessary, for each element of the
> subject, automatically.

To clarify, in the C code there are two loops: an outer loop over the subject and, for each subject element, an inner loop in which the specified single pattern is iteratively "auto-reduced" as necessary.
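The two-loop structure can be sketched in base R for the right side, where prefixes of Rpattern are compared with the last i letters of each subject element. Again, this is only an assumed illustration of the described behaviour, not the actual C code; right_trim_lengths() is a hypothetical name, and the mismatch vector and sequences are made-up example values.

```r
# Illustrative sketch of the two loops; NOT the actual C implementation.
# right_trim_lengths() is a hypothetical helper.

right_trim_lengths <- function(Rpattern, subjects, max.Rmismatch) {
  nRp <- nchar(Rpattern)
  stopifnot(length(max.Rmismatch) == nRp)
  pattern_chars <- strsplit(Rpattern, "")[[1]]
  trimmed <- integer(length(subjects))
  # Outer loop: one pass per subject element.
  for (k in seq_along(subjects)) {
    s_chars <- strsplit(subjects[k], "")[[1]]
    n <- length(s_chars)
    # Inner loop: the pattern is reduced by one character at a time,
    # longest prefix first, until an acceptable match is seen.
    for (i in rev(seq_len(min(nRp, n)))) {
      tail_chars <- s_chars[(n - i + 1):n]
      n_mismatch <- sum(pattern_chars[seq_len(i)] != tail_chars)
      if (n_mismatch <= max.Rmismatch[i]) {
        trimmed[k] <- i
        break
      }
    }
  }
  trimmed  # number of letters that would be trimmed from each right end
}

right_trim_lengths(
  "GATCGGAAG",
  c("ACGTTTGATCG", "ACGTACGTACC"),
  max.Rmismatch = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 2L, 2L)
)  # 5 0
```

The first element ends with "GATCG", the 5-letter prefix of the pattern, so 5 letters would be trimmed; the second has no acceptable match at any length, so nothing would be trimmed.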