Problem with BioStrings: patternMatch ALSO BLAST


Paul Leo ▴ 970

@paul-leo-2092

Last seen 10.2 years ago

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070607/f0f495b3/attachment.pl


updated 17.5 years ago by Hervé Pagès 16k • written 17.5 years ago by Paul Leo ▴ 970


Hervé Pagès 16k

@herve-pages-1542

Last seen 1 day ago

Seattle, WA, United States

Hi Paul,

Quoting Paul Leo <p.leo at uq.edu.au>:

> These commands are straight from the Biostrings; PatternMatch manual.
> Could not find any help online.
> Any advice on the key 85 error??

See below for the explanation.

> Otherwise has anyone done genomic BLAST from within R. Any advice on the
> best path.... Use Blastcl3 ? ,another standalone blast , ?
> Maybe a 1000 or so short sequences to blast....

There is no BLAST in Biostrings, but there is the needwunsQS() function for _global_ alignment of two sequences. I just rewrote this function in C in Biostrings devel (2.5.9) to make it faster. It is now possible to align two 20000-letter sequences (nucleotide or amino acid) in 2 or 3 seconds, provided you have enough memory. See the Alignment.pdf vignette, the man page for needwunsQS() (?needwunsQS) and ?scoring_matrices for more information.

[continued below]

> Thanks in advance
> Paul
>
> CODE
>
> > cI <- Mmusculus$chr13
> > length(cI)
> [1] 120614378
> > class(cI)
> [1] "DNAString"
> attr(,"package")
> [1] "Biostrings"
> >
> > p <- "UUACAGUUGUUCAACCAGUUACU"
>
> *********** ERROR****************
> > matchPattern(p, cI, mismatch = 0)
> Error in CharBuffer.write(data, 1, length, value = src, enc = lkup) :
> key 85 not in lookup table
>
> *************************************

You are applying matchPattern() to a DNAString subject (cI), so the pattern (p) is expected to be a DNAString object too, or a string that can be converted into one. In your example p contains the letter U (it looks like an RNA sequence), so it can't be converted to a DNAString object. Key 85 is the ASCII code of the letter U: this code does not belong to the lookup table used internally to encode the letters of a DNAString object (letters stored in DNAString or RNAString objects are encoded; this is a trick that allows some fast searching algorithms). I agree that the current error message is not really helpful, sorry! I plan to work on this and to improve the overall documentation too.

Your pattern (p) can first be converted into an RNAString object with

  rnap <- RNAString(p)

and then into a DNAString object with

  dnap <- DNAString(rnap)

This will not necessarily give you what you want, since when converting from RNAString to DNAString (or the reverse) the following translation is currently applied:

  U <-> A
  G <-> C
  C <-> G
  A <-> T

(which mimics the transcription/reverse transcription process), so you might want to apply complement() (or reverseComplement()) to the result.

Cheers,
H.

> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "methods"   "base"
>
> other attached packages:
> BSgenome.Mmusculus.UCSC.mm8                    BSgenome
>                     "1.2.0"                     "1.4.0"
>                     Biobase                  Biostrings
>                    "1.14.0"                     "2.4.5"
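
As a rough illustration of the explanation in the reply above, the following sketch uses base R only (no Biostrings calls) to show where "key 85" comes from and one way to rewrite the RNA pattern in the DNA alphabet before searching. The helper names rna_to_dna() and reverse_complement_dna() are hypothetical and not part of Biostrings, and the plain U -> T substitution is an assumption about what the poster wants; it is deliberately different from the complementing RNAString -> DNAString translation described in the reply.

```r
# Base R only; helper names below are hypothetical, not Biostrings functions.
p <- "UUACAGUUGUUCAACCAGUUACU"

# "key 85" in the error message is the ASCII code of the letter U.
u_code <- utf8ToInt("U")  # 85

# Assumption: the pattern should keep its sequence and only swap alphabets,
# so U is replaced by T (no complementing).
rna_to_dna <- function(rna) chartr("U", "T", rna)

# Reverse complement of a plain DNA string: complement each base
# (A<->T, C<->G), then reverse the order of the letters.
reverse_complement_dna <- function(dna) {
  comp <- chartr("ACGT", "TGCA", dna)
  paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
}

dna_p    <- rna_to_dna(p)                 # "TTACAGTTGTTCAACCAGTTACT"
dna_p_rc <- reverse_complement_dna(dna_p) # pattern for the opposite strand
```

Either string contains only A, C, G and T, so it should be acceptable as a pattern against a DNAString subject such as cI; whether the forward or the reverse-complemented form is the right one depends on which strand the match is expected on.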

17.5 years ago Hervé Pagès 16k