On Wed, Oct 7, 2009 at 12:08 PM, Tim Smith <tim_smith_666@yahoo.com>
wrote:
> Hi all,
>
> I wanted to get the trans factor sites that affect a set of genes.
Is there
> any package in bioconductor that will enable me to do this?
>
I don't know of any package that does this directly, but here are some
tips.
If you have access to the (not free) transfac database, these
functions will
read in the database and profile (PRF) files:
readTransFac <- function(con) {
getField <- function(name) {
name <- paste("^", name, sep = "")
sub(paste(name, "(.*)$"), "
\1", lines[grep(name, lines)])
}
lines <- readLines(con)
nms <- getField("ID")
npos <- getField("MATR_LENGTH")
mats <- sub("^[0-9]*", "", gsub("[ACGT]:", "", lines[grep("^[1-9]",
lines)]))
f <- file()
writeLines(mats, f)
mattab <- as.matrix(read.table(f, col.names = c("A", "C", "G",
"T")))
close(f)
matlist <- split.data.frame(mattab, rep(seq_along(npos),
as.integer(npos)))
matlist <- lapply(matlist, t) ## OUCH -- slow step
names(matlist) <- nms
attr(matlist, "labels") <- getField("NA")
attr(matlist, "threshold") <- getField("THRESHOLD")
matlist
}
readPRF <- function(con) {
read.table(con, skip = 4, comment.char = "/",
col.names = c("A", "B", "cutoff", "AC", "ID"))
}
You can use these like this:
> transfac <- readTransFac("transfac/matrixTFP92.lib")
> muscle <- readPRF("transfac/muscle_specific.prf")
> pwm <- transfac[as.character(muscle$ID)]
Then 'pwm' is a list of matrices. You can then find the hits to a
genome
using Biostrings:
> hits <- matchPWM(pwm[[1]], Hsapiens[[1]], "90%")
Now 'hits' represents the hits of the first PWM against Human
chromosome 1,
at 90% cutoff.
You can convert that to an IRanges object:
> ir <- as(hits, "IRanges")
And then use that with the overlap() function in IRanges, along with
some
gene annotations, like those from the GenomicFeatures package (an
experimental data package) to find associations with genes.
> library(GenomicFeatures)
> data(geneHuman)
> trans <- transcripts(geneHuman)
> hitsInPromoters <- ir[trans[1]$promoter]
To find the promoter (+/- 500bp from TSS) hits on chr1.
Most of this code is not tested, but it should serve as a nice
outline.
Michael
thanks!
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]