Trans sites associated with a gene (Transfac database?)
Tim Smith
@tim-smith-1532
Hi all, I wanted to get the transcription factor binding sites that affect a set of genes. Is there any package in Bioconductor that will let me do this? Thanks!
@michael-lawrence-2759
On Wed, Oct 7, 2009 at 12:08 PM, Tim Smith <tim_smith_666@yahoo.com> wrote:

> Hi all,
>
> I wanted to get the trans factor sites that affect a set of genes. Is there
> any package in bioconductor that will enable me to do this?

I don't know of any package that does this directly, but here are some tips.

If you have access to the (not free) TRANSFAC database, these functions will read in the database and profile (PRF) files:

readTransFac <- function(con) {
  getField <- function(name) {
    name <- paste("^", name, sep = "")
    sub(paste(name, "(.*)$"), "\\1", lines[grep(name, lines)])
  }
  lines <- readLines(con)
  nms <- getField("ID")
  npos <- getField("MATR_LENGTH")
  mats <- sub("^[0-9]*", "", gsub("[ACGT]:", "", lines[grep("^[1-9]", lines)]))
  f <- file()
  writeLines(mats, f)
  mattab <- as.matrix(read.table(f, col.names = c("A", "C", "G", "T")))
  close(f)
  matlist <- split.data.frame(mattab, rep(seq_along(npos), as.integer(npos)))
  matlist <- lapply(matlist, t) ## OUCH -- slow step
  names(matlist) <- nms
  attr(matlist, "labels") <- getField("NA")
  attr(matlist, "threshold") <- getField("THRESHOLD")
  matlist
}

readPRF <- function(con) {
  read.table(con, skip = 4, comment.char = "/",
             col.names = c("A", "B", "cutoff", "AC", "ID"))
}

You can use these like this:

> transfac <- readTransFac("transfac/matrixTFP92.lib")
> muscle <- readPRF("transfac/muscle_specific.prf")
> pwm <- transfac[as.character(muscle$ID)]

Then 'pwm' is a list of matrices. You can then find the hits to a genome using Biostrings:

> library(Biostrings)
> library(BSgenome.Hsapiens.UCSC.hg19)  # assumed genome build; any BSgenome.Hsapiens.* package provides 'Hsapiens'
> hits <- matchPWM(pwm[[1]], Hsapiens[[1]], "90%")

Now 'hits' represents the hits of the first PWM against human chromosome 1, at a 90% cutoff.

You can convert that to an IRanges object:

> ir <- as(hits, "IRanges")

And then use that with the overlap() function in IRanges, along with some gene annotations, like those from the GenomicFeatures package (an experimental data package), to find associations with genes:

> library(GenomicFeatures)
> data(geneHuman)
> trans <- transcripts(geneHuman)
> hitsInPromoters <- ir[trans[1]$promoter]

This finds the promoter (+/- 500 bp from the TSS) hits on chr1.

Most of this code is not tested, but it should serve as a nice outline.

Michael

> thanks!
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
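For readers without Biostrings or a BSgenome package at hand, the following base-R sketch illustrates what the matchPWM() and promoter-overlap steps above do conceptually. It is not the Biostrings implementation: the helper names 'countsToLogOdds', 'scanPWM' and 'hitsNearTSS', the 0.5 pseudocount, the uniform 0.25 background, the toy motif, the toy sequence and the TSS position are all hypothetical. The cutoff is taken as a percentage of the highest possible PWM score, which, as far as I know, is how Biostrings interprets a percentage min.score such as "90%". Only the forward strand is scanned.

```r
# A base-R sketch (no Bioconductor packages needed) of PWM scanning and
# promoter filtering. All function names here are hypothetical.

# Convert a 4 x L count matrix (rows A, C, G, T) into a log2-odds PWM
# against a uniform 0.25 background, adding a pseudocount to avoid log(0).
countsToLogOdds <- function(counts, pseudo = 0.5) {
  probs <- sweep(counts + pseudo, 2, colSums(counts + pseudo), "/")
  log2(probs / 0.25)
}

# Score every window of a DNA string (forward strand only) and keep windows
# whose score is >= pct * (highest possible score of the PWM).
scanPWM <- function(pwm, dna, pct = 0.9) {
  bases <- strsplit(toupper(dna), "")[[1]]
  w <- ncol(pwm)
  n <- length(bases) - w + 1
  if (n < 1) {
    return(data.frame(start = integer(0), end = integer(0), score = numeric(0)))
  }
  scores <- vapply(seq_len(n), function(i) {
    idx <- match(bases[i:(i + w - 1)], rownames(pwm))
    if (anyNA(idx)) return(-Inf)  # window contains N or another non-ACGT letter
    sum(pwm[cbind(idx, seq_len(w))])
  }, numeric(1))
  cutoff <- pct * sum(apply(pwm, 2, max))
  keep <- which(scores >= cutoff)
  data.frame(start = keep, end = keep + w - 1L, score = scores[keep])
}

# Keep hits that overlap [TSS - flank, TSS + flank] for at least one TSS
# (all coordinates assumed to be on the same chromosome).
hitsNearTSS <- function(hits, tss, flank = 500) {
  near <- vapply(seq_len(nrow(hits)), function(i) {
    any(hits$start[i] <= tss + flank & hits$end[i] >= tss - flank)
  }, logical(1))
  hits[near, , drop = FALSE]
}

# Toy example: counts for a hypothetical 4-bp motif "TGAC".
counts <- matrix(c( 0,  0,  0, 10,   # position 1: T
                    0,  0, 10,  0,   # position 2: G
                   10,  0,  0,  0,   # position 3: A
                    0, 10,  0,  0),  # position 4: C
                 nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
pwm <- countsToLogOdds(counts)
hits <- scanPWM(pwm, "CCTGACAAAATGACGG", pct = 0.9)
hits                                   # the two exact TGAC matches
near <- hitsNearTSS(hits, tss = 510)   # hypothetical TSS at position 510
near
```

The vapply() loop keeps the sketch readable but is slow on a whole chromosome; that is where matchPWM() on a BSgenome sequence is the better tool.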
Thanks Michael. I hadn't realized that Transfac is not free. Do you know of any free databases that might work? -- thanks!

________________________________
From: Michael Lawrence <mflawren@fhcrc.org>
Cc: bioc <bioconductor@stat.math.ethz.ch>
Sent: Wed, October 7, 2009 6:33:57 PM
Subject: Re: [BioC] Trans sites associated with a gene (Transfac database?)
On Thu, Oct 8, 2009 at 9:00 AM, Tim Smith <tim_smith_666@yahoo.com> wrote:

> Thanks Michael. I hadn't realized that Transfac is not free. Do you know of
> any free databases that might work? -- thanks!

You might try: http://jaspar.genereg.net/

Sean
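JASPAR distributes its matrices as plain-text count matrices. Assuming the bracketed layout of JASPAR's "jaspar"-format files (a '>' header with matrix ID and TF name, followed by one row per base such as 'A [ 4 19 0 ]'), a minimal base-R reader might look like the sketch below. The function name 'readJasparCounts' and the toy record are hypothetical, and the layout may differ between JASPAR releases, so check the file you actually download. The result is a list of 4 x L count matrices with rows A, C, G, T, the same shape readTransFac() returns after its transpose.

```r
# Hypothetical reader for JASPAR-style bracketed count matrices:
#   >ID name
#   A [ n1 n2 ... ]
#   C [ ... ]
#   G [ ... ]
#   T [ ... ]
readJasparCounts <- function(lines) {
  lines <- trimws(lines[nzchar(trimws(lines))])   # drop blank lines
  hdr <- grep("^>", lines)
  mats <- lapply(hdr, function(h) {
    rows <- lines[(h + 1):(h + 4)]                 # the four base rows
    base <- substr(rows, 1, 1)
    nums <- lapply(rows, function(r) {
      body <- sub("^[ACGT]\\s*\\[(.*)\\]\\s*$", "\\1", r)  # text inside [ ]
      as.numeric(strsplit(trimws(body), "\\s+")[[1]])
    })
    m <- do.call(rbind, nums)
    rownames(m) <- base
    m[c("A", "C", "G", "T"), , drop = FALSE]       # enforce A, C, G, T order
  })
  names(mats) <- sub("^>(\\S+).*$", "\\1", lines[hdr])           # matrix IDs
  attr(mats, "labels") <- trimws(sub("^>\\S+", "", lines[hdr]))  # TF names
  mats
}

# Toy record for a hypothetical motif "TGAC"; with a real download you would
# pass readLines() of the file instead.
toy <- c(">MA_TOY.1 ToyTF",
         "A [  0  0 10  0 ]",
         "C [  0  0  0 10 ]",
         "G [  0 10  0  0 ]",
         "T [ 10  0  0  0 ]")
pfm <- readJasparCounts(toy)
pfm[["MA_TOY.1"]]
```

Because the rows are named A, C, G, T, each matrix could in principle be handed to the same downstream steps as the TRANSFAC matrices in Michael's outline.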
The free Transfac is very old. BioBase distributes current versions via subscription. Have you looked at Jaspar? http://jaspar.cgb.ki.se/. I use that with Emboss.

David
Hi Tim

You could try the tools at http://www.dcode.org. I have used the DiRE tool previously to find TF sites associated with a gene list (or, more precisely, TF sites enriched in a gene list). As far as I can remember, they use TRANSFAC Pro (i.e. non-free) as their TF database.

Cheers

Iain
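Iain's distinction between sites merely present near genes and sites enriched in a gene list is a statistical one. As an illustration only (not a description of what DiRE itself computes), one common way to pose it is a one-sided hypergeometric test: of N background genes, K carry a predicted site for a given factor; if n genes are in your list and k of them carry the site, how surprising is k? The helper name 'siteEnrichmentP' and all the counts below are hypothetical.

```r
# Hypothetical helper: one-sided hypergeometric test for over-representation.
#   N = background genes, K = background genes carrying a predicted site,
#   n = genes in the list,  k = list genes carrying a predicted site.
# Returns P(X >= k), where X counts site-carrying genes among n genes drawn
# at random (without replacement) from the background.
siteEnrichmentP <- function(k, n, K, N) {
  stopifnot(k <= n, k <= K, n <= N, K <= N)
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Hypothetical numbers: 20,000 background genes, 1,000 of which carry a site;
# 15 of the 100 genes in the list carry it (5 would be expected by chance).
p <- siteEnrichmentP(k = 15, n = 100, K = 1000, N = 20000)

# Cross-check: the same one-sided test via Fisher's exact test on a 2 x 2 table.
tab <- matrix(c(15, 100 - 15, 1000 - 15, (20000 - 1000) - (100 - 15)),
              nrow = 2,
              dimnames = list(site = c("yes", "no"), inList = c("yes", "no")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
c(hypergeometric = p, fisher = p_fisher)
```

With many factors tested at once, the resulting p-values would also need a multiple-testing correction such as p.adjust().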
Thanks all! I'll try the suggested sites. Thank you all very much!

________________________________
From: Iain Gallagher <iaingallagher@btopenworld.com>
Cc: bioconductor@stat.math.ethz.ch
Sent: Thu, October 8, 2009 11:19:07 AM
Subject: Re: [BioC] Trans sites associated with a gene (Transfac database?)