I am trying to re-implement a script that I found on GitHub (https://github.com/liz-is/ctcf-motif-imr90/blob/master/imr90_ctcf_motif_direction.R). This script finds the directionality of all CTCF sites in IMR90 cells.
I would like to run the same analysis on mouse ESCs (mm9). However, I have run into a number of problems because the database the script queries (JASPAR) is sparse for mouse, and I am trying to find ways around this. It has no mouse PFM/PWM for CTCF, so I am looking for a way to import one manually from another source and still run the script.
library("AnnotationHub")
library("BSgenome.Mmusculus.UCSC.mm9")
library("JASPAR2016")
library("TFBSTools")
opts <- list(name="CTCF", species = "10090") #species = mouse
pfm <- getMatrixSet(JASPAR2016, opts)[[1]]
pwm <- TFBSTools::toPWM(pfm)
Unfortunately, the 'getMatrixSet' call didn't return any results.
So instead I downloaded the PWM directly from HOCOMOCO: http://hocomoco11.autosome.ru/search?arity=mono&query=CTCF&species=mouse. (This proved extremely difficult to find; I would have expected the PFM/PWM for such a well-studied TF in such a widely used species to be readily available.)
m <- read.table('CTCFL_MOUSE.H11MO.0.A.pwm', sep='\t', skip=1)  # skip the header line
colnames(m) <- c('A','G','C','T')
pwm <- t(m)  # rows = bases, columns = motif positions
> pwm
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A -0.2503432 -1.479023 -2.574059 0.3141615 -0.9277147 -1.51467606 1.273071 -3.563451 -0.3816056
G -0.2101574 1.176990 1.344214 -0.8315814 0.7048456 0.85812193 -2.153051 -3.563451 -4.3917302
C 0.6396630 -1.056784 -3.116717 0.4603267 0.4244349 -1.10379501 -1.551647 1.368878 1.1867805
T -0.6632622 -1.714934 -3.116717 -0.4927609 -2.9510810 0.08597924 -2.300909 -4.391730 -3.5634514
[,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
A -1.6715458 -3.116717 -3.894180 -1.629962 -0.6031399 -2.3009085 -0.7955746 -0.06384649 -1.05678409
G -1.5146761 -3.563451 -2.474486 1.323185 -3.8941799 1.1393962 0.4399751 0.22266277 0.37631039
C 1.1393962 1.360724 1.316789 -4.391730 1.2250123 -0.5059102 -0.9481023 0.46032674 0.04104507
T -0.7608194 -3.563451 -1.807804 -3.315342 -3.5634514 -1.7602913 0.4753232 -1.47902311 0.14291340
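The import step above can be sketched in a self-contained way. This is only a minimal sketch: the three-position file contents and the names `pwm_lines`, `pwm_file` and `pwm_mat` are made up for illustration, and the A, C, G, T column order is an assumption about the HOCOMOCO plain-text layout that is worth checking against the site's format notes, since a wrong order would silently swap bases in every downstream match.

```r
# Hypothetical three-position excerpt in a HOCOMOCO-style layout:
# one header line, then one tab-separated row per motif position.
pwm_lines <- c(
  ">EXAMPLE_MOTIF",
  "-0.25\t-0.21\t0.64\t-0.66",
  "-1.48\t1.18\t-1.06\t-1.71",
  "-2.57\t1.34\t-3.12\t-3.12"
)
pwm_file <- tempfile(fileext = ".pwm")
writeLines(pwm_lines, pwm_file)

# Read the numeric rows, skipping the header line.
m <- as.matrix(read.table(pwm_file, sep = "\t", skip = 1))

# ASSUMPTION: columns are in A, C, G, T order; verify against the source.
colnames(m) <- c("A", "C", "G", "T")

# Transpose so that rows are bases and columns are motif positions.
pwm_mat <- t(m)
stopifnot(identical(rownames(pwm_mat), c("A", "C", "G", "T")))
dim(pwm_mat)
```

A quick sanity check on the real file is to read off the highest-scoring base at each position and compare the resulting consensus with the motif logo shown on the download page.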
ctcf_imr90_seqs <- getSeq(Mmusculus, ctcf_imr90)
imr90_matches <- lapply(ctcf_imr90_seqs, function(s){
searchSeq(pwm, s, min.score = "75%")}) #### ERROR
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘searchSeq’ for signature ‘"matrix"’
So it seems that TFBSTools' searchSeq() requires a particular class of input. When I inspect the class of a normal PWM obtained from JASPAR via 'JASPAR2016':
> opts <- list(name="CTCF", species = "9606") #species = human
> pfm <- getMatrixSet(JASPAR2016, opts)[[1]]
> pwm <- TFBSTools::toPWM(pfm)
> class(pwm)
[1] "PWMatrix"
attr(,"package")
[1] "TFBSTools"
So my question is: how can I import my manually downloaded PWM into R so that 'TFBSTools::searchSeq()' will accept it and run the intended analysis?
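One possible route, sketched here under assumptions rather than offered as a definitive answer, is to wrap the plain matrix in a 'PWMatrix' object with the TFBSTools constructor `PWMatrix()`. The tiny 4 x 3 matrix, the toy sequence and the uniform background below are placeholders for illustration; the ID and name are just labels taken from the downloaded file name. The row names are assumed to need to be A, C, G, T in that order.

```r
library(TFBSTools)
library(Biostrings)

# Hypothetical 4 x 3 log-odds matrix standing in for the downloaded PWM.
# Rows are bases (A, C, G, T), columns are motif positions.
pwm_mat <- rbind(
  A = c(-0.25, -1.48, -2.57),
  C = c(-0.21,  1.18,  1.34),
  G = c( 0.64, -1.06, -3.12),
  T = c(-0.66, -1.71, -3.12)
)

# Wrap the matrix in the S4 class that searchSeq() dispatches on.
# ASSUMPTION: a uniform background; use the real one if it is known.
ctcf_pwm <- PWMatrix(
  ID = "CTCFL_MOUSE.H11MO.0.A",
  name = "CTCFL",
  bg = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
  profileMatrix = pwm_mat
)
class(ctcf_pwm)

# Scan a toy sequence on both strands with the same threshold as the script.
hits <- searchSeq(ctcf_pwm, DNAString("TTACCGTTACC"),
                  seqname = "toy", strand = "*", min.score = "75%")
```

With the real matrix, `ctcf_pwm` would take the place of `pwm` in the `lapply()` call above. As far as I understand it, `searchSeq()` also accepts a DNAStringSet directly, which may make the `lapply()` unnecessary.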