Querying/manipulating JASPAR data
1
0
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States
Howdy, I was curious if there are any packages or other means (some web api(?)) to retrieve and parse JASPAR PWM's. I have a need to get some PWMs for transcription factors and am slicing/dicing the files I've downloaded from JASPAR. Since I'm in the middle of dealing with that, I was wondering if it was worth being a bit more careful with my code and perhaps whipping up a jaspaR package of sorts that makes this data available via some bioc-friendly code. Cheers, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
Transcription Cancer Transcription Cancer • 1.0k views
ADD COMMENT
0
Entering edit mode
Thomas Girke ★ 1.7k
@thomas-girke-993
Last seen 9 months ago
United States
I am not sure if this helps: Below is a parser function that I used in the past to import their PWMs from: http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_on ly.txt It stores the PWMs in a list from where they can be passed on to the Biostrings' matchPWM function... ## Import function importJaspar <- function(file=myloc) { vec <- readLines(file) vec <- gsub("\\[|\\]", "", vec) start <- grep(">", vec); end <- grep(">", vec) - 1 pos <- data.frame(start=start, end=c(end[-1], length(vec))) pwm <- sapply(seq(along=pos[,1]), function(x) vec[pos[x,1]:pos[x,2]]) pwm <- sapply(seq(along=pwm), function(x) strsplit(pwm[[x]], " {1,}")) pwm <- sapply(seq(along=start), function(x) matrix(as.numeric(t(as.data.frame(pwm[(pos[x,1]+1):pos[x,2]]))[,-1]), nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL))) names(pwm) <- gsub(">", "", vec[start]) return(pwm) } pwm <- importJaspar(file="http://jaspar.genereg.net/html/DOWNLOAD/all_ data/matrix_only/matrix_only.txt") pwmnorm <- sapply(names(pwm), function(x) apply(pwm[[x]], 2, function(y) y/sum(y))) Best, Thomas On Fri, Jun 25, 2010 at 01:47:30PM -0400, Steve Lianoglou wrote: > Howdy, > > I was curious if there are any packages or other means (some web > api(?)) to retrieve and parse JASPAR PWM's. > > I have a need to get some PWMs for transcription factors and am > slicing/dicing the files I've downloaded from JASPAR. > > Since I'm in the middle of dealing with that, I was wondering if it > was worth being a bit more careful with my code and perhaps whipping > up a jaspaR package of sorts that makes this data available via some > bioc-friendly code. > > Cheers, > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
You should check out MotIV. It contains JASPAR 2010 and you can use it to mach your list of PWMs to the list of motifs in there. We are also open to suggestions that might improve the package. Raphael On 2010-06-25, at 3:20 PM, Thomas Girke wrote: > I am not sure if this helps: > > Below is a parser function that I used in the past to import their PWMs from: > http://jaspar.genereg.net/html/DOWNLOAD/all_data/matrix_only/matrix_ only.txt > It stores the PWMs in a list from where they can be passed on to the Biostrings' > matchPWM function... > > ## Import function > importJaspar <- function(file=myloc) { > vec <- readLines(file) > vec <- gsub("\\[|\\]", "", vec) > start <- grep(">", vec); end <- grep(">", vec) - 1 > pos <- data.frame(start=start, end=c(end[-1], length(vec))) > pwm <- sapply(seq(along=pos[,1]), function(x) vec[pos[x,1]:pos[x,2]]) > pwm <- sapply(seq(along=pwm), function(x) strsplit(pwm[[x]], " {1,}")) > pwm <- sapply(seq(along=start), function(x) matrix(as.numeric(t(as.data.frame(pwm[(pos[x,1]+1):pos[x,2]]))[,-1]), nrow=4, dimnames=list(c("A", "C", "G", "T"), NULL))) > names(pwm) <- gsub(">", "", vec[start]) > return(pwm) > } > pwm <- importJaspar(file="http://jaspar.genereg.net/html/DOWNLOAD/al l_data/matrix_only/matrix_only.txt") > pwmnorm <- sapply(names(pwm), function(x) apply(pwm[[x]], 2, function(y) y/sum(y))) > > > Best, > > Thomas > > On Fri, Jun 25, 2010 at 01:47:30PM -0400, Steve Lianoglou wrote: >> Howdy, >> >> I was curious if there are any packages or other means (some web >> api(?)) to retrieve and parse JASPAR PWM's. >> >> I have a need to get some PWMs for transcription factors and am >> slicing/dicing the files I've downloaded from JASPAR. >> >> Since I'm in the middle of dealing with that, I was wondering if it >> was worth being a bit more careful with my code and perhaps whipping >> up a jaspaR package of sorts that makes this data available via some >> bioc-friendly code. >> >> Cheers, >> -steve >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> | Memorial Sloan-Kettering Cancer Center >> | Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY

Login before adding your answer.

Traffic: 1874 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6