Problem with BioStrings: patternMatch ALSO BLAST
Paul Leo ▴ 970
@paul-leo-2092
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070607/f0f495b3/attachment.pl
@herve-pages-1542
Hi Paul,

Quoting Paul Leo <p.leo at uq.edu.au>:

> These commands are straight from the Biostrings; PatternMatch manual.
> Could not find any help online.
> Any advice on the key 86 error??

See below for the explanation.

> Otherwise has anyone do genomic BLAST from within R. Any advice on the
> best path.... Use Blastcl3 ? ,another standalone blast , ?
> Maybe a 1000 or do short sequences to blast....

There is no BLAST in Biostrings, but there is the needwunsQS() function for _global_ alignment of 2 sequences. I just rewrote this function in C in Biostrings devel (2.5.9) to make it faster. It is now possible to align two 20000-letter sequences (nucleotides or amino acids) in 2 or 3 seconds, provided you have enough memory. See the Alignment.pdf vignette, the man page for needwunsQS() (?needwunsQS) and ?scoring_matrices for more information.

[continued below]

> Thanks in advance
> Paul
>
> CODE
>
> > cI <- Mmusculus$chr13
> > length(cI)
> [1] 120614378
> > class(cI)
> [1] "DNAString"
> attr(,"package")
> [1] "Biostrings"
> >
> > p <- "UUACAGUUGUUCAACCAGUUACU"
>
> *********** ERROR****************
> > matchPattern(p, cI, mismatch = 0)
> Error in CharBuffer.write(data, 1, length, value = src, enc = lkup) :
>   key 85 not in lookup table
>
> *************************************

You are applying matchPattern() to a DNAString subject (cI), so the pattern (p) is expected to be a DNAString object too, or a string that can be converted into a DNAString object. In your example p contains the letter U (it looks like an RNA sequence), so it can't be converted to a DNAString object. Key 85 is the ASCII code of the letter U: this code does not belong to the lookup table used internally to encode the letters of a DNAString object (the letters stored in DNAString or RNAString objects are encoded, a trick that allows some fast searching algorithms). I agree that the current error message is not really helpful, sorry!
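The "key 85" in the error message can be checked in base R, without Biostrings. The sketch below is only an illustration of the idea: dna_lookup and missing_keys() are hypothetical names, the table holds just the four unambiguous DNA letters, and the internal codes are arbitrary rather than the values Biostrings actually uses.

```r
# Hypothetical stand-in for an internal DNA lookup table: it maps the ASCII
# code of each allowed letter to an arbitrary internal code. The real table
# in Biostrings covers more letters and uses different values.
dna_letters <- c("A", "C", "G", "T")
dna_lookup <- setNames(
  seq_along(dna_letters),
  as.character(vapply(dna_letters, utf8ToInt, integer(1)))
)

# Return the ASCII codes found in `x` that have no entry in the lookup table.
missing_keys <- function(x, lookup = dna_lookup) {
  keys <- utf8ToInt(x)
  unique(keys[!as.character(keys) %in% names(lookup)])
}

p <- "UUACAGUUGUUCAACCAGUUACU"
utf8ToInt("U")   # 85, the ASCII code of the letter U
missing_keys(p)  # 85, the only key in p that the DNA table does not know
```

Running missing_keys() on a pattern before calling matchPattern() is one way to see which letter is responsible when the error message only reports a numeric key.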
I plan to work on this and to improve the overall documentation too.

Your pattern (p) can be converted into an RNAString object first with

    rnap <- RNAString(p)

and then into a DNAString object with

    dnap <- DNAString(rnap)

This will not necessarily give you what you want, since when converting from RNAString to DNAString (or the reverse) the following translation is currently applied:

    U <-> A
    G <-> C
    C <-> G
    A <-> T

(which mimics the transcription/reverse transcription process), so you might want to apply complement() (or reverseComplement()) to the result.

Cheers,
H.

> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "methods"   "base"
>
> other attached packages:
> BSgenome.Mmusculus.UCSC.mm8                    BSgenome
>                     "1.2.0"                     "1.4.0"
>                     Biobase                  Biostrings
>                    "1.14.0"                     "2.4.5"
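The letter mappings discussed in the reply can be tried out on plain character strings in base R. This is a minimal sketch, not the Biostrings implementation: the function names are hypothetical, only the letters A, C, G, T and U are handled, and the U<->A, G<->C, C<->G, A<->T table is taken from the reply as describing the Biostrings version in use at the time.

```r
p <- "UUACAGUUGUUCAACCAGUUACU"

# Same sequence written in the DNA alphabet: replace U with T, nothing else.
rna_to_dna <- function(x) chartr("U", "T", x)

# The translation described in the reply (U<->A, G<->C, C<->G, A<->T),
# applied in the RNA -> DNA direction.
rna_to_template_dna <- function(x) chartr("UGCA", "ACGT", x)

# Complement of a DNA string, and its reverse complement.
dna_complement <- function(x) chartr("ACGT", "TGCA", x)
dna_reverse_complement <- function(x) {
  paste(rev(strsplit(dna_complement(x), "")[[1]]), collapse = "")
}

rna_to_dna(p)                           # "TTACAGTTGTTCAACCAGTTACT"
dna_complement(rna_to_template_dna(p))  # same string as rna_to_dna(p)
```

The last two lines show why complement() is suggested after the conversion: complementing the translated sequence gives back the original letters with T in place of U. A plain string like this contains only DNA letters, so it satisfies the reply's condition of being convertible into a DNAString object.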