I am using the TFBSTools package to predict transcription factor binding sites in a set of sequences. I am trying to use the pvalues function on SiteSet objects so that I can filter out non-significant hits. This works for some SiteSet objects, but for others the function simply hangs, and I can't see any pattern in which objects cause it to hang and which don't. Here is a script that reproduces the problem (the function hangs on item 57 of the list):
library(TFBSTools)
library(JASPAR2014)
library(Biostrings)
# Get all PWMs from JASPAR database
opts <- list()
opts[["species"]] <- 9606 # human
opts[["all_versions"]] <- TRUE
opts[["matrixtype"]] <- "PWM"
pwmatrices <- getMatrixSet(JASPAR2014, opts)
seq <- "ATGGCATACAGTCGGTAAAGCGTAGTGCTTGAGAGCATGGATTATGGAGACTATTTAAATACTGGCTCCATAACTTAATAGCTTGGGACCCACGGTGTTACTTAGCTTCTATCTGCTTTAATTTACTCATCTGTAAACTTGGGATAAGATACTTCCTCATAAGGTTGGTGTGAAGACCAAGTGAATTAACTATCGTTTAAAGCACTTACAAAAGTGCCGGGCACCACCGAGATATGCATCCGTTAGCTTTTATTATTATTAGACTCAAAACACTGTAGTAGTTCTAATGAGAGGGGTAAGAATCAAAAATCCAGGCACCTGCATAGAGCCAGAGAGGCACACATAGAAGCAACGTAAGAGTGGAAGCGGAATGAAAACATGCTAAAGCCAGGTACAAGCCACAAGCGAGGGTCCACAGGAAGAAATTGTTAATTCTGAAGAGAGTGAATGCACGAAGTTACAGGAAAAATAACATCTGAACAGAGTTTAAGAATGAGCAGGACTTCAACAAGTGGCTAGTAAGACATAAGGAACCTACAAAAGATCTTAGCAAAGGCGCAAAGATTACCATCGTATTGCTCGTTTCTTCCTACTTTGCAGAAGTAACCTCTGGCGAACAGAGGTGGTTGCAGAGCATGCTTATCAAGCAAAATACCACGAAGCAGTAAGGAACGACAGAGATAACAGTAACAATAATAATTCACCCCAAGGTACTCAACTGGAAAAAGGAAATACAGAGGAGAGGTGTCGTTAAGAAAGCCAGGACGCACATCACGGCCCCGTCGCTGCACTACTCTCGTCTAGGGGTCAACAGTGGAGTCGAGACTCGAAGCTTCCACGCGGCGGAACAGCGTCCCTCTCAGGCGGCGAACGGGCTAGGGAAGCGCCCGGAGGAGACCTAGCGTGAGAACTACAACTCCCGCGGAGCCCGAGGGCGAGCTGCCTGCGTAACTTCCGCTTCCGCCACCTGCCCCTCTCACCCTCTTCACTCGAACCCTAC"
sequencename <- "M6PR"
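# Scan both strands of the sequence with every PWM, keeping hits with a relative score of at least 80%
sitesetList <- searchSeq(pwmatrices, seq, seqname=sequencename, min.score="80%", strand="*")
# pvalues() is called on individual SiteSet objects from the list
pvalues(sitesetList[[1]])
pvalues(sitesetList[[2]])
# This is the call that hangs
pvalues(sitesetList[[57]])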
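For context, the filtering step I have in mind looks roughly like the sketch below. The helper name filterByPvalue and the 0.001 cutoff are placeholders of my own, not part of TFBSTools, and I am assuming that a SiteSet can be subset with a logical vector via `[`.

```r
# Hypothetical helper (not part of TFBSTools): keep only the hits of one
# SiteSet whose p-value is below a cutoff. The cutoff value is arbitrary.
filterByPvalue <- function(siteset, cutoff = 0.001) {
  # Nothing to filter if the PWM produced no hits
  if (length(siteset) == 0L) {
    return(siteset)
  }
  # This is the call that hangs for some SiteSet objects
  pv <- TFBSTools::pvalues(siteset)
  # Assumes `[` on a SiteSet accepts a logical index
  siteset[pv < cutoff]
}

# Intended usage, with sitesetList from the script above:
# filtered <- filterByPvalue(sitesetList[[1]])
```

The function is only defined here, not called, so the hang itself still needs the script above to reproduce.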
Here is the output of sessionInfo():
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TFBSTools_1.4.0
loaded via a namespace (and not attached):
[1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.9
[4] BiocGenerics_0.12.1 BiocParallel_1.0.3 Biostrings_2.34.1
[7] bitops_1.0-6 brew_1.0-6 BSgenome_1.34.1
[10] caTools_1.17.1 checkmate_1.5.1 CNEr_1.2.0
[13] codetools_0.2-11 DBI_0.3.1 digest_0.6.8
[16] DirichletMultinomial_1.8.0 fail_1.2 foreach_1.4.2
[19] GenomeInfoDb_1.2.4 GenomicAlignments_1.2.2 GenomicRanges_1.18.4
[22] grid_3.1.2 gtools_3.4.1 IRanges_2.0.1
[25] iterators_1.0.7 parallel_3.1.2 Rcpp_0.11.5
[28] RCurl_1.95-4.5 Rsamtools_1.18.3 RSQLite_1.0.0
[31] rtracklayer_1.26.2 S4Vectors_0.4.0 sendmailR_1.2-1
[34] seqLogo_1.32.1 stats4_3.1.2 stringr_0.6.2
[37] TFMPvalue_0.0.5 tools_3.1.2 XML_3.98-1.1
[40] XVector_0.6.0 zlibbioc_1.12.0