Question: Trans sites associated with a gene (Transfac database?)
Tim Smith wrote:
Hi all,

I wanted to get the trans-acting factor (transcription factor) binding sites that affect a set of genes. Is there any package in Bioconductor that will enable me to do this?

Thanks!
Written 10.0 years ago by Tim Smith; modified 10.0 years ago by Michael Lawrence.
Answer: Trans sites associated with a gene (Transfac database?)
Michael Lawrence wrote:
On Wed, Oct 7, 2009 at 12:08 PM, Tim Smith <tim_smith_666@yahoo.com> wrote:
> I wanted to get the trans factor sites that affect a set of genes. Is there
> any package in bioconductor that will enable me to do this?

I don't know of any package that does this directly, but here are some tips.

If you have access to the (not free) TRANSFAC database, these functions will read in the matrix library and profile (PRF) files:

```r
readTransFac <- function(con) {
  lines <- readLines(con)
  ## Text after "<name> " on every line that starts with <name>
  getField <- function(name) {
    name <- paste0("^", name)
    sub(paste(name, "(.*)$"), "\\1", lines[grep(name, lines)])
  }
  nms  <- getField("ID")           # matrix identifiers
  npos <- getField("MATR_LENGTH")  # number of positions in each matrix
  ## Matrix rows start with a position number: drop it and the "A:" ... "T:"
  ## labels so that only the four values per row remain
  mats <- sub("^[0-9]*", "", gsub("[ACGT]:", "", lines[grep("^[1-9]", lines)]))
  mattab <- as.matrix(read.table(text = mats, col.names = c("A", "C", "G", "T")))
  ## Split the stacked rows into one block per matrix, then transpose each
  ## block to 4 x width (rows A, C, G, T)
  matlist <- split.data.frame(mattab, rep(seq_along(npos), as.integer(npos)))
  matlist <- lapply(matlist, t) ## OUCH -- slow step
  names(matlist) <- nms
  attr(matlist, "labels") <- getField("NA")
  attr(matlist, "threshold") <- getField("THRESHOLD")
  matlist
}

readPRF <- function(con) {
  read.table(con, skip = 4, comment.char = "/",
             col.names = c("A", "B", "cutoff", "AC", "ID"))
}
```

You can use these like this:

```r
transfac <- readTransFac("transfac/matrixTFP92.lib")
muscle <- readPRF("transfac/muscle_specific.prf")
pwm <- transfac[as.character(muscle$ID)]
```

Then 'pwm' is a list of matrices. You can then find the hits in a genome using Biostrings:

```r
library(Biostrings)
## 'Hsapiens' comes from a BSgenome package; the hg19 build is an assumed choice
library(BSgenome.Hsapiens.UCSC.hg19)
hits <- matchPWM(pwm[[1]], Hsapiens[["chr1"]], min.score = "90%")
```

Now 'hits' represents the hits of the first PWM against human chromosome 1, at a 90% cutoff. You can convert that to an IRanges object:

```r
ir <- as(hits, "IRanges")
```

And then use that with the overlap() function in IRanges (findOverlaps() in current releases), along with some gene annotations, like those from the GenomicFeatures package (an experimental data package), to find associations with genes:

```r
library(GenomicFeatures)
data(geneHuman)
trans <- transcripts(geneHuman)
hitsInPromoters <- ir[trans[1]$promoter]
```

This finds the promoter (±500 bp from the TSS) hits on chr1.

Most of this code is not tested, but it should serve as a nice outline.

Michael

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
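To make the "90%" cutoff in the matchPWM() call concrete, here is a minimal base-R sketch of percentage-of-maximum PWM scoring. It assumes the rule described in the Biostrings matchPWM documentation (a percentage of the highest possible score); the matrix values, the sequence and the helper names score_windows() and match_pwm_sketch() are hypothetical, the sketch scans only the strand it is given, and it is not how Biostrings implements matchPWM internally.

```r
# Toy 4 x 3 weight matrix (hypothetical values). Rows are named A, C, G, T,
# the orientation readTransFac() returns after lapply(matlist, t).
pwm <- matrix(c(2, 0, 0, 1,   # position 1
                0, 3, 0, 0,   # position 2
                1, 0, 2, 0),  # position 3
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Score every window of width ncol(pwm) along the sequence as given (one strand):
# a window's score is the sum of the matrix entries picked out by its bases.
# Letters other than A/C/G/T give an NA score.
score_windows <- function(pwm, dna) {
  bases <- strsplit(dna, "")[[1]]
  width <- ncol(pwm)
  starts <- seq_len(max(0L, length(bases) - width + 1L))
  vapply(starts, function(s) {
    rows <- match(bases[s:(s + width - 1L)], rownames(pwm))
    sum(pwm[cbind(rows, seq_len(width))])
  }, numeric(1))
}

# Keep windows scoring at least pct% of the highest possible score
# (the sum of the column maxima); NA scores are dropped by which().
match_pwm_sketch <- function(pwm, dna, min.score = "90%") {
  pct <- as.numeric(sub("%$", "", min.score)) / 100
  cutoff <- pct * sum(apply(pwm, 2, max))
  scores <- score_windows(pwm, dna)
  keep <- which(scores >= cutoff)
  data.frame(start = keep, end = keep + ncol(pwm) - 1L, score = scores[keep])
}

hits <- match_pwm_sketch(pwm, "TTACGAAACG", "90%")
print(hits)
```

With the toy matrix the highest possible score is 2 + 3 + 2 = 7, so the 90% cutoff is 6.3 and only the two ACG windows (starting at positions 3 and 8) pass.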
Written 10.0 years ago by Michael Lawrence.
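The promoter step (hits within ±500 bp of a TSS) can also be illustrated without GenomicFeatures. This base-R sketch uses hypothetical TSS positions and motif-hit coordinates on a single chromosome, builds a window of ±500 bp around each TSS, and reports the hits that overlap each window. The data and the helper name promoter_hits() are made up for illustration; the all-pairs comparison is quadratic, whereas IRanges (for example findOverlaps()) does this efficiently on real data.

```r
# Hypothetical TSS positions for genes on one chromosome
genes <- data.frame(gene = c("geneA", "geneB", "geneC"),
                    tss  = c(1000L, 5000L, 12000L),
                    stringsAsFactors = FALSE)
# Hypothetical motif hits, e.g. start/end coordinates from a PWM scan
hits <- data.frame(start = c(700L, 1490L, 4495L, 9000L),
                   end   = c(711L, 1501L, 4506L, 9011L))

# Report every (gene, hit) pair where the hit overlaps [TSS - flank, TSS + flank].
# Two closed intervals overlap when each one starts before the other ends.
promoter_hits <- function(hits, genes, flank = 500L) {
  prom_start <- genes$tss - flank
  prom_end   <- genes$tss + flank
  pairs <- expand.grid(hit = seq_len(nrow(hits)), gene = seq_len(nrow(genes)))
  keep <- hits$start[pairs$hit] <= prom_end[pairs$gene] &
          hits$end[pairs$hit]   >= prom_start[pairs$gene]
  pairs <- pairs[keep, , drop = FALSE]
  data.frame(gene      = genes$gene[pairs$gene],
             hit_start = hits$start[pairs$hit],
             hit_end   = hits$end[pairs$hit],
             stringsAsFactors = FALSE)
}

res <- promoter_hits(hits, genes)
print(res)
```

Here geneA's window runs from 500 to 1500, so the hits starting at 700 and 1490 fall in it; geneB's window (4500 to 5500) catches the hit ending at 4506; the hit at 9000 is not near any TSS.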
Thanks Michael. I hadn't realized that Transfac is not free. Do you know of any free databases that might work?

Thanks!
Written 10.0 years ago by Tim Smith.
On Thu, Oct 8, 2009 at 9:00 AM, Tim Smith <tim_smith_666@yahoo.com> wrote:
> Thanks Michael. I hadn't realized that Transfac is not free. Do you know of
> any free databases that might work?

You might try: http://jaspar.genereg.net/

Sean
Written 10.0 years ago by Sean Davis.
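Following the JASPAR suggestion, a matrix downloaded as text can be turned into the same 4 × width orientation (rows A, C, G, T) that readTransFac() produces above. This is a minimal base-R sketch: the bracketed layout it parses, the helper name read_jaspar_sketch() and the toy counts are assumptions for illustration, so check the format of the file you actually download. The result holds raw counts, not log-odds weights.

```r
# Parse one matrix in a bracketed JASPAR-style text layout (assumed), e.g.
#   >ID name
#   A  [ 8 0 1 ]
# into a 4 x width count matrix with rows A, C, G, T.
read_jaspar_sketch <- function(lines) {
  header <- sub("^>", "", lines[1])
  rows <- trimws(lines[-1])
  rows <- rows[nzchar(rows)]                           # drop blank lines
  base <- substr(rows, 1, 1)                           # leading A/C/G/T label
  counts <- gsub("\\[|\\]", " ", substring(rows, 2))   # remove label and brackets
  mat <- do.call(rbind, lapply(strsplit(trimws(counts), "\\s+"), as.numeric))
  rownames(mat) <- base
  mat <- mat[c("A", "C", "G", "T"), , drop = FALSE]    # fix the row order
  attr(mat, "id") <- header
  mat
}

# Toy input (made-up counts, not a real JASPAR entry)
jaspar_txt <- c(">MOTIF1 toy",
                "A  [ 8 0 1 ]",
                "C  [ 1 9 0 ]",
                "G  [ 0 1 9 ]",
                "T  [ 1 0 0 ]")
m <- read_jaspar_sketch(jaspar_txt)
print(m)
```

Reordering the rows by name means the parser does not depend on the order in which the file lists the four bases.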
The free Transfac is very old; BioBase distributes current versions via subscription. Have you looked at JASPAR? http://jaspar.cgb.ki.se/ I use that with EMBOSS.

David
Written 10.0 years ago by David Lapointe.
Hi Tim,

You could try the tools at http://www.dcode.org. I have used the DiRE tool previously to find TF sites associated with a gene list (more precisely, TF sites enriched in a gene list). As far as I can remember, they use TRANSFAC Pro (i.e. non-free) as their TF database.

Cheers,
Iain
Written 10.0 years ago by Iain Gallagher.
Thanks all! I'll try the suggested sites. Thank you all very much!
Written 10.0 years ago by Tim Smith.
Powered by Biostar version 16.09