RMAPPER and whole genome TFBS information
Ravi Karra
Hello,

I am trying to identify all putative GATA binding sites in the mouse genome. Ideally, I want genomic coordinates for each binding site so that I can load them into a GenomicRanges object (I know there will be a lot of hits) and overlay them with the results of a ChIP-Seq experiment. There seem to be multiple packages that can do this, but only RMAPPER provides an interface to the TRANSFAC and JASPAR TF binding site models.

I have been getting several errors that I am not sure how to resolve. Is this package the best way to get the information I want? Is there a better alternative? Is there an upper limit on the size of a MAPPER query?

Thanks for your help,
Ravi

```r
# Load the necessary libraries
library(RMAPPER)
library(biomaRt)

# Query the whole mouse genome:
# get identifiers to be passed to MAPPER
mm <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
mmGenes <- getBM(attributes = c("ensembl_gene_id", "external_gene_id",
                                "entrezgene", "external_transcript_id"),
                 mart = mm)

# Get the unique Entrez Gene IDs, dropping missing values
# (instead of assuming that only the first ID is NA)
egids <- unique(mmGenes$entrezgene[!is.na(mmGenes$entrezgene)])

# Comma-separated list of 51 gene IDs (positions 500 to 550)
eglist <- paste(egids[500:550], collapse = ",")

# GATA factor models (duplicate IDs from the original list removed)
gata <- "M00789, T02689, T00311, T00306, T00305, T00267, M00632, M00462, MA0037"

# Run MAPPER with 51 genes
gatah <- readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)
```

```
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : cannot open: HTTP status was '0 (null)'
```

```r
# Run MAPPER with 11 genes (positions 500 to 510)
eglist <- paste(egids[500:510], collapse = ",")
gatah <- readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)
```

```
Error in seq.default(1, nh * 4, 4) : wrong sign in 'by' argument

> traceback()
10: stop("wrong sign in 'by' argument")
9: seq.default(1, nh * 4, 4)
8: seq(1, nh * 4, 4)
7: `[.data.frame`(df, seq(1, nh * 4, 4), )
6: df[seq(1, nh * 4, 4), ]
5: reshapeMapper(tmp)
4: initialize(value, ...)
3: initialize(value, ...)
2: new("mapperHits", query = sett, hits = reshapeMapper(tmp))
1: readMAPPER(gene = eglist, models = gata, org = "Mm", pbases = 5000)

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.8.0 RMAPPER_1.2.0

loaded via a namespace (and not attached):
[1] RCurl_1.5-0  tools_2.13.0 XML_3.2-0
```
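The 51-gene query failed at the HTTP level, while the 11-gene query at least reached the server. That suggests, but does not establish, that very long query strings may be rejected. Below is a minimal base-R sketch of splitting the ID vector into small batches and issuing one query per batch. The helper name `batchIds` and the batch size of 10 are hypothetical choices, not a documented MAPPER limit.

```r
# Hypothetical helper: split a vector of IDs into comma-separated
# strings of at most `size` IDs each. The default batch size is an
# assumption, not a documented MAPPER limit.
batchIds <- function(ids, size = 10) {
  groups <- split(ids, ceiling(seq_along(ids) / size))
  vapply(groups, paste, character(1), collapse = ",", USE.NAMES = FALSE)
}

# Example with made-up IDs: 25 IDs give batches of 10, 10 and 5
batches <- batchIds(1:25, size = 10)
print(batches)

# Usage with the objects from the post (needs RMAPPER and network
# access, so it is left commented out); try() keeps one failing
# batch from stopping the loop:
# results <- lapply(batchIds(egids, 10), function(g)
#   try(readMAPPER(gene = g, models = gata, org = "Mm", pbases = 5000)))
```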
@vincent-j-carey-jr-4
I am listed as the author of this package, and indeed some years ago I wrote the R code that interfaces to the XML-RPC service of the MAPPER database. I don't know exactly why you are seeing this error; as far as I can tell, your inputs meet the requirements of the server-generated documentation returned by rmapperHelp(). I registered to use the database and manually created a query, which it processed as:

Gene: Trp53rk (transformation related protein 53 regulating kinase)
Gene ID: 76367
mRNA accession: NM_023815
Organism: Mus musculus
Scanned region: chr2:166617267-166626993
Models: JASPAR matrices, TRANSFAC matrices, M00789

This yielded over 2400 hits, for example:

| Gene | GeneID | Transcript | Factor | Name(s) | Strand | Chrom | Start | End | Region | Score | E-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Trp53rk | 76367 | NM_023815 | M00791 | HNF3 | + | chr2 | 166,617,268 | 166,617,279 | Promoter | 4.6 | 14 |
| Trp53rk | 76367 | NM_023815 | MA0041 | Foxd3 | - | chr2 | 166,617,269 | 166,617,279 | Promoter | 2.9 | 11 |
| Trp53rk | 76367 | NM_023815 | MA0047 | Foxa2 | - | chr2 | 166,617,269 | 166,617,280 | Promoter | 3.9 | 4.3 |

with further details on the first hit:

```
Gene: Trp53rk                 Factor: HNF3
Gene ID: 76367                Model: M00791
mRNA: NM_023815               ENSEMBL: ENSMUSG00000042
Position (abs): chr2:166,617,268-166,617,279
Position (tx): -1999 to -1988
Position (cds): -2045 to -2034
Alignment:
*->taaacaaAca.a<-*
   t+ acaaA+a +
   TGTACAAATAtT
Score: 4.6                    E-value: 14
Gene region: Promoter         Strand: +
Conserved: -
```

In principle, RMAPPER will return all such information. However, when I pass the equivalent query to the readMAPPER function, I get a success code but only a header back; no hit data is returned.
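Since the end goal is a GenomicRanges object, the comma-formatted Start/End values above have to become integers first. This is a minimal base-R sketch, assuming hits are available as a table with the Chrom, Start, End and Strand columns shown. The data frame simply re-types the three example hits by hand, and its column names are assumptions, not RMAPPER's actual output format. The coordinates appear to be 1-based and inclusive, which is consistent with the 12-base alignment shown for the first hit.

```r
# The three example hits from the MAPPER web interface, typed in by hand
# (column names are illustrative assumptions)
hits <- data.frame(
  factor = c("HNF3", "Foxd3", "Foxa2"),
  model  = c("M00791", "MA0041", "MA0047"),
  chrom  = "chr2",
  start  = c("166,617,268", "166,617,269", "166,617,269"),
  end    = c("166,617,279", "166,617,279", "166,617,280"),
  strand = c("+", "-", "-"),
  score  = c(4.6, 2.9, 3.9),
  stringsAsFactors = FALSE
)

# Strip thousands separators and convert to integer coordinates
toCoord <- function(x) as.integer(gsub(",", "", x, fixed = TRUE))
hits$start <- toCoord(hits$start)
hits$end   <- toCoord(hits$end)

# Width in bases, treating start and end as inclusive
hits$width <- hits$end - hits$start + 1L
print(hits)

# With GenomicRanges installed (not run here), this could become:
# library(GenomicRanges)
# gr <- GRanges(seqnames = hits$chrom,
#               ranges = IRanges(hits$start, hits$end),
#               strand = hits$strand)
```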
Specifically:

```
> readMAPPER(gene = "Trp53rk", models = "M00789", org = "Mm", pbases = 2000)
Error in seq.default(1, nh * 4, 4) : wrong sign in 'by' argument

Enter a frame number, or 0 to exit

1: readMAPPER(gene = "Trp53rk", models = "M00789", org = "Mm", pbases = 2000)
2: new("mapperHits", query = sett, hits = reshapeMapper(tmp))
3: initialize(value, ...)
4: initialize(value, ...)
5: reshapeMapper(tmp)
6: df[seq(1, nh * 4, 4), ]
7: `[.data.frame`(df, seq(1, nh * 4, 4), )
8: seq(1, nh * 4, 4)
9: seq.default(1, nh * 4, 4)
```

So I suggest you contact the maintainers; I will cc them on this note.

```
R version 2.13.0 Patched (2011-04-14 r55443)
Platform: x86_64-apple-darwin10.6.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
 [1] org.Mm.eg.db_2.5.0    RSQLite_0.9-4         DBI_0.2-5
 [4] AnnotationDbi_1.13.21 Biobase_2.11.10       biomaRt_2.8.0
 [7] RMAPPER_1.3.0         weaver_1.17.0         codetools_0.2-8
[10] digest_0.4.2

loaded via a namespace (and not attached):
[1] RCurl_1.5-0 XML_3.2-0
```

(In reply to Ravi Karra's message of Sat, Apr 16, 2011, 10:41 PM.)
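Both tracebacks fail at `seq(1, nh * 4, 4)` inside `reshapeMapper()`. In base R, `seq()` raises "wrong sign in 'by' argument" whenever `to` is smaller than `from` while `by` is positive. The error is therefore consistent with `nh` being 0, meaning the server returned a header with no hit records, as described above. The sketch below reproduces the failure and shows a guarded alternative. Two points are assumptions inferred only from the traceback: that `nh` is a hit count and that each hit spans four rows. `hitRows` is a hypothetical helper, not part of RMAPPER.

```r
# Assumption: nh is the number of hits parsed from the server reply,
# with four rows of output per hit (inferred from seq(1, nh * 4, 4)).
nh <- 0

# Reproduces the error from the tracebacks: from = 1 > to = 0 with by = 4 > 0
res <- tryCatch(seq(1, nh * 4, 4), error = function(e) conditionMessage(e))
print(res)

# Hypothetical guarded version: an empty index when there are no hits,
# otherwise the first row of each four-row block
hitRows <- function(nh) {
  if (nh < 1) integer(0) else seq(1L, nh * 4L, by = 4L)
}
hitRows(0)  # integer(0)
hitRows(3)  # 1 5 9
```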