@harris-a-jaffee-3972
Last seen 8.3 years ago
United States
Just getting to my mail after the power outage in New Jersey. I'm laying claim to the sapply quoted here, verbatim as far as I can tell, sent off-list (my bad) about a year ago in order to offer an exploratory approach to the setting of max.Rmismatch. The conclusion would be, for this subject sequence and for the first Rpattern here, that 0 is a good value, and in the second case, as Hervé has said, that 2 is good when 1 was not enough.

trimLRPatterns does not actually use any nedit function nor an sapply, although it does "stop" (at the C level) at the first position satisfying max.Rmismatch, if any, which of course can vary over the subject space.

On Oct 30, 2012, at 12:58 PM, wang peter wrote:

> i want to know how this function works?
>
> for example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
>                max.Rmismatch=1,with.Lindels=TRUE)
>
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> the function will try to calculate the distance by such coding:
>
> sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
>   s = substr(subject, j, nchar(subject))
>   p = substr(Rpattern, 1, nchar(subject)-j+1)
>   neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
>                 with.indels=TRUE)
> })
> [1] 0 2 4 6 8 10 12 14 15 14 13 12 11 10 9 9 8 7 8 7 6 5 6 6 5 4 4 4 3 2 1 0
> [33] 1 1
> when the function find the value which is first satisfy the
> max.Rmismatch value, it will stop
> in this case,they function will stop at the first position.
>
> IF
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
> The results
> [1] 2 3 4 6 8 10 12 14 15 14 13 12 11 10 9 9 8 7 8 7 6 5 6 6 5 4 4 4 3 2 1 0
> [33] 1 1
> it will stop
> in this case,they function will stop at
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
>
> so the shortcoming is the trimLRPatterns cannot find the shared
> sequence between subject and Rpattern
> "GAATAGTACTGTAGGCACCATCAATAGATCGG"
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253
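
For anyone who wants to try the exploratory idea without Biostrings loaded, here is a minimal base-R sketch of the same profile. It is only an approximation of the quoted sapply: it counts substitutions (Hamming distance) and ignores indels, so interior values can differ from what neditEndingAt reports with with.indels=TRUE. The helper name mismatch_profile and the variables Rpattern1 and Rpattern2 are my own labels for illustration, not anything in the package.

```r
# Hypothetical helper (not part of Biostrings): for each start position j
# in the last nchar(Rpattern) letters of subject, count mismatches between
# the suffix of subject starting at j and the equally long prefix of Rpattern.
# Assumes nchar(Rpattern) <= nchar(subject); substitutions only, no indels.
mismatch_profile <- function(subject, Rpattern) {
  n <- nchar(subject)
  starts <- (n - nchar(Rpattern) + 1):n
  sapply(starts, function(j) {
    s <- strsplit(substr(subject, j, n), "")[[1]]
    p <- strsplit(substr(Rpattern, 1, n - j + 1), "")[[1]]
    sum(s != p)
  })
}

# Sequences taken from the quoted message
subject   <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern1 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern2 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

profile1 <- mismatch_profile(subject, Rpattern1)
profile2 <- mismatch_profile(subject, Rpattern2)

# The first entry is the full-length Rpattern against the subject's suffix
profile1[1]  # 0
profile2[1]  # 2
```

The first entries agree with the values suggested above: 0 is enough for the first Rpattern, while the second needs 2. Looking at the whole profile before committing to a max.Rmismatch is the point of the exercise.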