vmatchPattern
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.2 years ago
When will vmatchPattern support indels? -- output of sessionInfo(): > reads[1:5] A DNAStringSet instance of length 5 width seq [1] 51 NTATTGGTAGCGAATTCCTGTACTGCTTCGACACACTGTACTCTTCAACTT [2] 51 NCTTGGGATGCGAATTCCTGTACTGCTTCGCGGTGTCGTACTCTTCAACTT [3] 51 NGAGGGTGTTCGAATTCCTGTACTGCTTCGAAATCTGGTACTCTTCAACTT [4] 51 NTAGCGTTGACGAATTCCTGTACTGCTTCGCTCCACCGTACTCTTCAACTT [5] 51 NCTGTTTGCCCGAATTCCTGTACTGCTTCGTACCGTTGTACTCTTCAACTT > constant <- vmatchPattern("CGAATTCCTGTACTGCTTCG", reads[1:5], max.mismatch=1, with.indels=TRUE) Error in .XStringSet.vmatchPattern(pattern, subject, max.mismatch, min.mismatch, : vmatchPattern() does not support indels yet -- Sent via the guest posting facility at bioconductor.org.
• 3.2k views
ADD COMMENT
1
Entering edit mode
@herve-pages-1542
Last seen 3 days ago
Seattle, WA, United States
Hi Richard, On 03/17/2014 12:53 PM, Richard Dannebaum [guest] wrote: > When will vmatchPattern support indels? > > > > -- output of sessionInfo(): > >> reads[1:5] > A DNAStringSet instance of length 5 > width seq > [1] 51 NTATTGGTAGCGAATTCCTGTACTGCTTCGACACACTGTACTCTTCAACTT > [2] 51 NCTTGGGATGCGAATTCCTGTACTGCTTCGCGGTGTCGTACTCTTCAACTT > [3] 51 NGAGGGTGTTCGAATTCCTGTACTGCTTCGAAATCTGGTACTCTTCAACTT > [4] 51 NTAGCGTTGACGAATTCCTGTACTGCTTCGCTCCACCGTACTCTTCAACTT > [5] 51 NCTGTTTGCCCGAATTCCTGTACTGCTTCGTACCGTTGTACTCTTCAACTT > >> constant <- vmatchPattern("CGAATTCCTGTACTGCTTCG", reads[1:5], max.mismatch=1, with.indels=TRUE) > > Error in .XStringSet.vmatchPattern(pattern, subject, max.mismatch, min.mismatch, : > vmatchPattern() does not support indels yet Yes that's something I've had on my list for a while. Recently I was also asked off-list when matchPDict() and family would support indels. But that's a much harder problem. For vmatchPattern(), the only show stopper is the unability to store matches of variable length in an MIndex object, because of the current design of these objects. But the core algo implemented at the C level already supports indels. Here is a version of vmatchPattern() that returns an IRangesList object instead of an MIndex. It supports indels: vmatchPattern2 <- function(pattern, subject, max.mismatch=0, min.mismatch=0, with.indels=FALSE, fixed=TRUE, algorithm="auto") { if (!is(subject, "XStringSet")) subject <- Biostrings:::XStringSet(NULL, subject) algo <- Biostrings:::normargAlgorithm(algorithm) if (Biostrings:::isCharacterAlgo(algo)) stop("'subject' must be a single (non-empty) string ", "for this algorithm") pattern <- Biostrings:::normargPattern(pattern, subject) max.mismatch <- Biostrings:::normargMaxMismatch(max.mismatch) min.mismatch <- Biostrings:::normargMinMismatch(min.mismatch, max.mismatch) with.indels <- Biostrings:::normargWithIndels(with.indels) fixed <- Biostrings:::normargFixed(fixed, subject) algo <- Biostrings:::selectAlgo(algo, pattern, max.mismatch, min.mismatch, with.indels, fixed) C_ans <- .Call2("XStringSet_vmatch_pattern", pattern, subject, max.mismatch, min.mismatch, with.indels, fixed, algo, "MATCHES_AS_RANGES", PACKAGE="Biostrings") unlisted_ans <- IRanges(start=unlist(C_ans[[1L]], use.names=FALSE), width=unlist(C_ans[[2L]], use.names=FALSE)) relist(unlisted_ans, C_ans[[1L]]) } You'll need Biostrings 2.31.15 for this code to work, which should become available thru biocLite() for BioC devel users in the next 24 hours or so. A clean fix in Biostrings would be to do this (i.e. change the type returned by vmatchPattern()) or to modify the MIndex internals to support matches of variable length. Probably the former. Changing the type returned by a function requires some precautions though because it tends to break existing code. Hopefully I manage to find some time to work on these things after the BioC 2.14 release. Cheers, H. > > > > -- > Sent via the guest posting facility at bioconductor.org. > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode

Have there been any updates yet? I still get "not supported yet" when I try it.

ADD REPLY
0
Entering edit mode

No update on this yet. Never found the time to work on it. Do you want to open an issue on GitHub to request this feature? That way at least it stays on the radar. (Make sure to include a link to this thread in your post on GitHub.) Also at some point someone else will have to take care of the Biostrings package so we should make sure that all requests and issues are recorded in the Biostrings repo on GitHub. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 491 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6