Question

Problem with bnlearn and Rgraphviz

0

Entering edit mode

hcong@cs.usm.my ▴ 10

@hcongcsusmmy-4207

Last seen 11.3 years ago

Dear Bioconductors, Very sorry for posting this problem. I am a new user to R and Bioconductor but I have problem loading the Rgraphviz package while exploring the bnlearn package in R. I have downloaded and installed graphviz-2.20.3.1 and also graphviz-2.20.3 but still have the error message and problem below. Can anyone help me with this problem? Thank you. Ong Penang, Malaysia. This application has failed to start because libcdt-4.dll was not found. Re-installing the application may fix this problem R version 2.11.1 (2010-05-31) Copyright (C) 2010 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [Previously saved workspace restored] > library(bnlearn) Loading required package: Rgraphviz Loading required package: graph Loading required package: grid Error : .onLoad failed in loadNamespace() for 'Rgraphviz', details: call: value[[3L]](cond) error: unable to load shared library 'C:/PROGRA~1/R/R-211~1.1/library/Rgraphviz/libs/Rgraphviz.dll': LoadLibrary failure: The specified module could not be found. Check that (1) graphviz is installed on your system; (2) the installed version of graphviz matches '2.20.3'; this is the version used to build this Rgraphviz package; (3) graphviz is accessible to R, e.g., the path to the graphviz 'bin' directory is in the system 'PATH' variable. See additional instructions in the 'README' file of the Rgraphviz 'source' distribution, available at http://bioconductor.org/packages/release/bioc/html/Rgraphviz.html Ask further questions on the Bioconductor mailing list http://bioconductor.org/docs/mailList.html Package Rgraphviz not loaded. Loading required package: lattice Package lattice loaded successfully. Attaching package: 'bnlearn' The following object(s) are masked from 'package:graph': nodes -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

Rgraphviz Rgraphviz • 4.1k views

ADD COMMENT • link updated 15.4 years ago by Martin Morgan 25k • written 15.4 years ago by hcong@cs.usm.my ▴ 10

score 0 · Answer 1 · 2010-08-10

On 08/10/2010 01:12 AM, hcong at cs.usm.my wrote: > Dear Bioconductors, > > Very sorry for posting this problem. I am a new user to R and Bioconductor > but I have problem loading the Rgraphviz package while exploring the > bnlearn package in R. I have downloaded and installed graphviz-2.20.3.1 > and also graphviz-2.20.3 but still have the error message and problem > below. Can anyone help me with this problem? Thank you. >From the email below, you'll need to > (3) graphviz is accessible to R, e.g., > the path to the graphviz 'bin' directory is in the system 'PATH' > variable. This means setting a Windows 'PATH' environment variable (google for 'set windows path varialbe) to include the directory where graphviz' libcdt-4.dll is installed, and verifying that this PATH shows up in an R session with Sys.getenv("PATH") Martin > > Ong > Penang, Malaysia. > > > > This application has failed to start because libcdt-4.dll was not found. > Re-installing the application may fix this problem > > R version 2.11.1 (2010-05-31) > Copyright (C) 2010 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > > Natural language support but running in an English locale > > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. > > [Previously saved workspace restored] > >> library(bnlearn) > Loading required package: Rgraphviz > Loading required package: graph > Loading required package: grid > Error : .onLoad failed in loadNamespace() for 'Rgraphviz', details: > call: value[[3L]](cond) > error: unable to load shared library > 'C:/PROGRA~1/R/R-211~1.1/library/Rgraphviz/libs/Rgraphviz.dll': > LoadLibrary failure: The specified module could not be found. > > > > Check that (1) graphviz is installed on your system; (2) the > installed version of graphviz matches '2.20.3'; this is the version used > to build this Rgraphviz package; (3) graphviz is accessible to R, e.g., > the path to the graphviz 'bin' directory is in the system 'PATH' > variable. See additional instructions in the 'README' file of the > Rgraphviz 'source' distribution, available at > > http://bioconductor.org/packages/release/bioc/html/Rgraphviz.html > > Ask further questions on the Bioconductor mailing list > > http://bioconductor.org/docs/mailList.html > > > Package Rgraphviz not loaded. > Loading required package: lattice > Package lattice loaded successfully. > > Attaching package: 'bnlearn' > > The following object(s) are masked from 'package:graph': > > nodes > > > > > > > -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

score 0 · Answer 2 · 2010-08-10

0

Entering edit mode

Marc Noguera ▴ 100

@marc-noguera-3883

Last seen 11.3 years ago

Hi all, this may be a very naive question by I have been trying to solve it myself and i can't get through it. I have this RangedData object obtained from ChIPpeakAnno Package, which correspond toa Chipseq experiment with annotated peaks, with ENSEMBL identificators. I can use this output already but like to transform the ENSEMBLID to a gene symbol id for instance: ENSMUSG00000025907 to "Rb1cc1" symbol. It also would be useful to add a field linking to a entrez gene web url. I have been looking at the org.Mm.eg.db package and although I can retrieve the symbol for a particular ENSEMBLID can't get it for all the elements in the object. Many thanks Marc -- ----------------------------------------------------- Marc Noguera i Julian, PhD Genomics unit / Bioinformatics Institut de Medicina Predictiva i Personalitzada del C?ncer (IMPPC) B-10 Office Carretera de Can Ruti Cam? de les Escoles s/n 08916 Badalona, Barcelona

ADD COMMENT • link 15.4 years ago Marc Noguera ▴ 100

0

Entering edit mode

Hi Marc, On 8/10/2010 6:26 AM, Marc Noguera wrote: > Hi all, > this may be a very naive question by I have been trying to solve it > myself and i can't get through it. > > I have this RangedData object obtained from ChIPpeakAnno Package, which > correspond toa Chipseq experiment with annotated peaks, with ENSEMBL > identificators. > I can use this output already but like to transform the ENSEMBLID to a > gene symbol id > > for instance: ENSMUSG00000025907 to "Rb1cc1" symbol. It also would be > useful to add a field linking to a entrez gene web url. > > I have been looking at the org.Mm.eg.db package and although I can > retrieve the symbol for a particular ENSEMBLID can't get it for all the > elements in the object. What have you tried so far? Unless you give an example of what you have done and how it didn't perform as you expect, it is very difficult for anybody to help. As a shot in the dark, have you looked at the help page for mget()? I don't really understand how the field linking to Entrez Gene would work, considering a RangedData object isn't an HTML page. However, building a URL to Entrez Gene isn't that difficult. You can hijack some internal code from the annotate package: > suppressMessages(library(annotate)) > egids <- 1:5 > annotate:::.repositories[["en"]](egids) [1] "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&do pt=Graphics&list_uids=1" [2] "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&do pt=Graphics&list_uids=2" [3] "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&do pt=Graphics&list_uids=3" [4] "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&do pt=Graphics&list_uids=4" [5] "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&do pt=Graphics&list_uids=5" But doing it by hand wouldn't be that much more difficult. If you strip out the error checking from the above function, all it really consists of is thefunction <- function(ids){ paste("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retri eve&dopt=Graphics&list_uids=", ids, sep = "") } Best, Jim > > Many thanks > Marc > -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

ADD REPLY • link 15.4 years ago James W. MacDonald 68k

0

Entering edit mode

My explanation wasn't quite clear, excuse me. I have a RangedData object, as obtained from ChipPeakAnno, with lots of peaks which are annotated like this: >table <- read.table("H3K4me3vsIgG_peaks.bed") >bed <- BED2RangedData(bed,header=T) >annotatedPeaks<-annotatePeakInBatch(bed,AnnotationData=TSS.mouse.NCBI M37) > annotatedPeaks[1:2,] RangedData with 2 rows and 9 value columns across 21 spaces space ranges | <character> <iranges> | MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] | MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] | peak strand feature <character> <character> <character> MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1 ENSMUSG00000025907 MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1 ENSMUSG00000061518 start_position end_position insideFeature <numeric> <numeric> <character> MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart distancetoFeature shortestDistance <numeric> <numeric> MACS_peak_10 ENSMUSG00000025907 -604 2 MACS_peak_102 ENSMUSG00000061518 -73 73 fromOverlappingOrNearest <character> MACS_peak_10 ENSMUSG00000025907 NearestStart MACS_peak_102 ENSMUSG00000061518 NearestStart If I do like this: > org.Mm.egENSEMBL2EG$`ENSMUSG00000025907` [1] "12421" > org.Mm.egSYMBOL$`12421` [1] "Rb1cc1" Which is the symbol I am itnerested in. I would like to add the symbol corresponding to each row in the RangedData Object as a new column. I have tried this, with mget: > annotatedPeaks$entrez<-mget(annotatedPeaks$feature,org.Mm.egENSEMBL2EG ,ifnotfound=NA) but: > annotatedPeaks[1:2,] RangedData with 2 rows and 10 value columns across 21 spaces space ranges | <character> <iranges> | MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] | MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] | peak strand feature <character> <character> <character> MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1 ENSMUSG00000025907 MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1 ENSMUSG00000061518 start_position end_position insideFeature <numeric> <numeric> <character> MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart distancetoFeature shortestDistance <numeric> <numeric> MACS_peak_10 ENSMUSG00000025907 -604 2 MACS_peak_102 ENSMUSG00000061518 -73 73 fromOverlappingOrNearest entrez <character> <list> MACS_peak_10 ENSMUSG00000025907 NearestStart ######## MACS_peak_102 ENSMUSG00000061518 NearestStart ######## as mget returns a list. Note that some ensemble IDs map to more than one gene ID. Also, using the convert2EntrezID from the same ChIPpeakAnno package: > annotatedPeaks$EntrezID<-convert2EntrezID(IDs=annotatedPeaks$feature,o rgAnn="org.Mm.eg.db",ID_type="ensembl_gene_id") Error in `[[<-`(`*tmp*`, name, value = c("12421", "12859", "67387", "623661", : 9633 elements in value to replace 13721 elements > Which returns a matrix (dim 9633,1) as some ensemblID map to same gene ID. As far as I understand I need to get use the geneID to map ensemblID to SYMBOL. So i cannot get the Symbols. So, I am stuck here. Thanks again, Marc > sessionInfo() R version 2.11.0 (2010-04-22) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] org.Mm.eg.db_2.4.1 ChIPpeakAnno_1.4.1 [3] limma_3.4.0 org.Hs.eg.db_2.4.1 [5] GO.db_2.4.1 RSQLite_0.9-0 [7] DBI_0.2-5 AnnotationDbi_1.10.1 [9] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.1 [11] GenomicRanges_1.0.1 Biostrings_2.16.0 [13] IRanges_1.6.1 multtest_2.4.0 [15] Biobase_2.8.0 biomaRt_2.4.0 loaded via a namespace (and not attached): [1] MASS_7.3-5 RCurl_1.4-2 splines_2.11.0 survival_2.35-8 [5] tools_2.11.0 XML_3.1-0 > James W. MacDonald wrote: > Hi Marc, > > On 8/10/2010 6:26 AM, Marc Noguera wrote: > >> Hi all, >> this may be a very naive question by I have been trying to solve it >> myself and i can't get through it. >> >> I have this RangedData object obtained from ChIPpeakAnno Package, which >> correspond toa Chipseq experiment with annotated peaks, with ENSEMBL >> identificators. >> I can use this output already but like to transform the ENSEMBLID to a >> gene symbol id >> >> for instance: ENSMUSG00000025907 to "Rb1cc1" symbol. It also would be >> useful to add a field linking to a entrez gene web url. >> >> I have been looking at the org.Mm.eg.db package and although I can >> retrieve the symbol for a particular ENSEMBLID can't get it for all the >> elements in the object. >> > > What have you tried so far? Unless you give an example of what you have > done and how it didn't perform as you expect, it is very difficult for > anybody to help. > > As a shot in the dark, have you looked at the help page for mget()? > > I don't really understand how the field linking to Entrez Gene would > work, considering a RangedData object isn't an HTML page. However, > building a URL to Entrez Gene isn't that difficult. You can hijack some > internal code from the annotate package: > > > suppressMessages(library(annotate)) > > egids <- 1:5 > > annotate:::.repositories[["en"]](egids) > [1] > "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve& dopt=Graphics&list_uids=1" > [2] > "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve& dopt=Graphics&list_uids=2" > [3] > "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve& dopt=Graphics&list_uids=3" > [4] > "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve& dopt=Graphics&list_uids=4" > [5] > "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve& dopt=Graphics&list_uids=5" > > But doing it by hand wouldn't be that much more difficult. If you strip > out the error checking from the above function, all it really consists of is > > thefunction <- function(ids){ > paste("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Ret rieve&dopt=Graphics&list_uids=", > > ids, sep = "") > } > > Best, > > Jim > > > >> Many thanks >> Marc >> >> > > -- ----------------------------------------------------- Marc Noguera i Julian, PhD Genomics unit / Bioinformatics Institut de Medicina Predictiva i Personalitzada del C?ncer (IMPPC) B-10 Office Carretera de Can Ruti Cam? de les Escoles s/n 08916 Badalona, Barcelona

ADD REPLY • link 15.4 years ago Marc Noguera ▴ 100

0

Entering edit mode

Hi Marc, On 8/10/2010 10:26 AM, Marc Noguera wrote: > My explanation wasn't quite clear, excuse me. > > I have a RangedData object, as obtained from ChipPeakAnno, with lots of > peaks which are annotated like this: > >> table<- read.table("H3K4me3vsIgG_peaks.bed") >> bed<- BED2RangedData(bed,header=T) >> annotatedPeaks<-annotatePeakInBatch(bed,AnnotationData=TSS.mouse.NC BIM37) >> annotatedPeaks[1:2,] > RangedData with 2 rows and 9 value columns across 21 spaces > space ranges | > <character> <iranges> | > MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] | > MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] | > > peak strand feature > > <character> <character> <character> > MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1 > ENSMUSG00000025907 > MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1 > ENSMUSG00000061518 > > start_position end_position insideFeature > > <numeric> <numeric> <character> > MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream > MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart > > distancetoFeature shortestDistance > > <numeric> <numeric> > MACS_peak_10 ENSMUSG00000025907 -604 2 > MACS_peak_102 ENSMUSG00000061518 -73 73 > > fromOverlappingOrNearest > > <character> > MACS_peak_10 ENSMUSG00000025907 NearestStart > MACS_peak_102 ENSMUSG00000061518 NearestStart > > If I do like this: >> org.Mm.egENSEMBL2EG$`ENSMUSG00000025907` > [1] "12421" >> org.Mm.egSYMBOL$`12421` > [1] "Rb1cc1" > > Which is the symbol I am itnerested in. I would like to add the symbol > corresponding to each row in the RangedData Object as a new column. > > I have tried this, with mget: > >> > annotatedPeaks$entrez<-mget(annotatedPeaks$feature,org.Mm.egENSEMBL2 EG,ifnotfound=NA) > > but: > >> annotatedPeaks[1:2,] > RangedData with 2 rows and 10 value columns across 21 spaces > space ranges | > <character> <iranges> | > MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] | > MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] | > peak strand > feature > <character> <character> > <character> > MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1 > ENSMUSG00000025907 > MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1 > ENSMUSG00000061518 > start_position end_position insideFeature > <numeric> <numeric> <character> > MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream > MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart > distancetoFeature shortestDistance > <numeric> <numeric> > MACS_peak_10 ENSMUSG00000025907 -604 2 > MACS_peak_102 ENSMUSG00000061518 -73 73 > fromOverlappingOrNearest entrez > <character> <list> > MACS_peak_10 ENSMUSG00000025907 NearestStart ######## > MACS_peak_102 ENSMUSG00000061518 NearestStart ######## > > as mget returns a list. Note that some ensemble IDs map to more than one > gene ID. Well, that's true and unavoidable. You will simply have to choose which symbol you want to use. Or else, you can concatenate with commas if you want them all. symbs <-mget(annotatedPeaks$feature,org.Mm.egENSEMBL2EG,ifnotfound=NA) symbs <- sapply(symbs, function(x) paste(x, collapse = ", ")) annotatedPeaks$entrez <- symbs And as you already noted, it is a two step process to go from Ensembl gene IDs to gene symbols, so what has been done above isn't quite sufficient. But I assume you get the idea. You can always do things in one shot using SQL queries. An example you can work from can be found in in section 2.0.9 of the AnnotationDbi vignette. That can be fun if you like SQL stuff. Alternatively, you can use biomaRt. Note that you will always want to return the Ensembl gene ID when you use biomaRt, so you can line things up after the fact. As an example: > suppressMessages(library(biomaRt)) > mart <- useMart("ensembl","mmusculus_gene_ensembl") ## fake up some IDs > ids <- paste("ENSMUSG", sprintf("%011.0f", c(1,3,28,37,49,56,58,78,85,88,93,94,103)), sep="") > ids [1] "ENSMUSG00000000001" "ENSMUSG00000000003" "ENSMUSG00000000028" [4] "ENSMUSG00000000037" "ENSMUSG00000000049" "ENSMUSG00000000056" [7] "ENSMUSG00000000058" "ENSMUSG00000000078" "ENSMUSG00000000085" [10] "ENSMUSG00000000088" "ENSMUSG00000000093" "ENSMUSG00000000094" [13] "ENSMUSG00000000103" > out <- getBM(c("ensembl_gene_id", "mgi_symbol"), "ensembl_gene_id", ids, mart) > out ensembl_gene_id mgi_symbol 1 ENSMUSG00000000001 Gnai3 2 ENSMUSG00000000003 Pbsn 3 ENSMUSG00000000028 Cdc45 4 ENSMUSG00000000037 Scml2 5 ENSMUSG00000000049 Apoh 6 ENSMUSG00000000056 Narf 7 ENSMUSG00000000058 Cav2 8 ENSMUSG00000000078 Klf6 9 ENSMUSG00000000085 Scmh1 10 ENSMUSG00000000088 Cox5a 11 ENSMUSG00000000093 Tbx2 12 ENSMUSG00000000094 Tbx4 13 ENSMUSG00000000103 Zfy2 Here if you have any duplicated symbols, you can use the trick from above. First convert to a list: outlist <- tapply(1:nrow(out), out[,1], function(x) out[x,2]) then outlist <- sapply(outlist, function(x) paste(x, collapse = ", ")) and append to your RangedData object. Also note that Ensembl gene IDs that don't map to a symbol will be dropped silently, so you might need to do some manipulations of the returned data.frame to insert the gene IDs (and a '' for the symbol) in order to have things line up correctly when you put the symbols into your RangedData object. Best, Jim > Also, using the convert2EntrezID from the same ChIPpeakAnno package: > >> > annotatedPeaks$EntrezID<-convert2EntrezID(IDs=annotatedPeaks$feature ,orgAnn="org.Mm.eg.db",ID_type="ensembl_gene_id") > > Error in `[[<-`(`*tmp*`, name, value = c("12421", "12859", "67387", > "623661", : > 9633 elements in value to replace 13721 elements >> > > > Which returns a matrix (dim 9633,1) as some ensemblID map to same gene ID. > > As far as I understand I need to get use the geneID to map ensemblID to > SYMBOL. So i cannot get the Symbols. > > So, I am stuck here. > > > Thanks again, > Marc > >> sessionInfo() > R version 2.11.0 (2010-04-22) > x86_64-unknown-linux-gnu > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] org.Mm.eg.db_2.4.1 ChIPpeakAnno_1.4.1 > [3] limma_3.4.0 org.Hs.eg.db_2.4.1 > [5] GO.db_2.4.1 RSQLite_0.9-0 > [7] DBI_0.2-5 AnnotationDbi_1.10.1 > [9] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.1 > [11] GenomicRanges_1.0.1 Biostrings_2.16.0 > [13] IRanges_1.6.1 multtest_2.4.0 > [15] Biobase_2.8.0 biomaRt_2.4.0 > > loaded via a namespace (and not attached): > [1] MASS_7.3-5 RCurl_1.4-2 splines_2.11.0 survival_2.35-8 > [5] tools_2.11.0 XML_3.1-0 >> > > > > James W. MacDonald wrote: >> Hi Marc, >> >> On 8/10/2010 6:26 AM, Marc Noguera wrote: >> >>> Hi all, >>> this may be a very naive question by I have been trying to solve it >>> myself and i can't get through it. >>> >>> I have this RangedData object obtained from ChIPpeakAnno Package, which >>> correspond toa Chipseq experiment with annotated peaks, with ENSEMBL >>> identificators. >>> I can use this output already but like to transform the ENSEMBLID to a >>> gene symbol id >>> >>> for instance: ENSMUSG00000025907 to "Rb1cc1" symbol. It also would be >>> useful to add a field linking to a entrez gene web url. >>> >>> I have been looking at the org.Mm.eg.db package and although I can >>> retrieve the symbol for a particular ENSEMBLID can't get it for all the >>> elements in the object. >>> >> >> What have you tried so far? Unless you give an example of what you have >> done and how it didn't perform as you expect, it is very difficult for >> anybody to help. >> >> As a shot in the dark, have you looked at the help page for mget()? >> >> I don't really understand how the field linking to Entrez Gene would >> work, considering a RangedData object isn't an HTML page. However, >> building a URL to Entrez Gene isn't that difficult. You can hijack some >> internal code from the annotate package: >> >> > suppressMessages(library(annotate)) >> > egids<- 1:5 >> > annotate:::.repositories[["en"]](egids) >> [1] >> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve &dopt=Graphics&list_uids=1" >> [2] >> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve &dopt=Graphics&list_uids=2" >> [3] >> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve &dopt=Graphics&list_uids=3" >> [4] >> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve &dopt=Graphics&list_uids=4" >> [5] >> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve &dopt=Graphics&list_uids=5" >> >> But doing it by hand wouldn't be that much more difficult. If you strip >> out the error checking from the above function, all it really consists of is >> >> thefunction<- function(ids){ >> paste("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Re trieve&dopt=Graphics&list_uids=", >> >> ids, sep = "") >> } >> >> Best, >> >> Jim >> >> >> >>> Many thanks >>> Marc >>> >>> >> >> > > -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

ADD REPLY • link 15.4 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks for the detailed explanation. I have tried as indicated, as some other things. Just for the record: > entrez<-mget(annotatedPeaks$feature,org.Mm.egENSEMBL2EG,ifnotfound=NA) > entrez[1:3] $ENSMUSG00000025907 [1] "12421" $ENSMUSG00000061518 [1] "12859" "100046079" $ENSMUSG00000026111 [1] "67387" Note that there are some entries in the list with NA, which are not shown. I still don't want to collapse the inner lists as I need to transform to symbol. Like this: >symbols <- lapply(entrez[!is.na(entrez)],mget,org.Mm.egSYMBOL,ifnotfound=NA) Which gives me a shorter list, with as many entries for each inner list as Entrez id existed. For the same entries: > symbols[1:3] $ENSMUSG00000025907 $ENSMUSG00000025907$`12421` [1] "Rb1cc1" $ENSMUSG00000061518 $ENSMUSG00000061518$`12859` [1] "Cox5b" $ENSMUSG00000061518$`100046079` [1] "LOC100046079" $ENSMUSG00000026111 $ENSMUSG00000026111$`67387` [1] "Unc50" Which I can now collapse: >symbols <- sapply(symbols, function(x) paste(x,collapse=",")) >symbols[1:3] ENSMUSG00000025907 ENSMUSG00000061518 ENSMUSG00000026111 "Rb1cc1" "Cox5b,LOC100046079" "Unc50" Now I want to add symbols to AnnotatedPeaks joined by Ensembl Id (annotatedPeaks$feature), which I do with: >df<-as.data.frame(annotatedPeaks) >df$symbols<-symbols[as.character(df$feature)] > subset(df,T,c(feature,symbols))[1:3,] feature symbols 1 ENSMUSG00000025907 Rb1cc1 2 ENSMUSG00000061518 Cox5b,LOC100046079 3 ENSMUSG00000026111 Unc50 Then I write it to a table. I know that this is probably a very non-optimal way to do it, so any advice about improving it will be appreciated. Thanks for all Marc James W. MacDonald wrote: > Hi Marc, > > On 8/10/2010 10:26 AM, Marc Noguera wrote: > >> My explanation wasn't quite clear, excuse me. >> >> I have a RangedData object, as obtained from ChipPeakAnno, with lots of >> peaks which are annotated like this: >> >> >>> table<- read.table("H3K4me3vsIgG_peaks.bed") >>> bed<- BED2RangedData(bed,header=T) >>> annotatedPeaks<-annotatePeakInBatch(bed,AnnotationData=TSS.mouse.N CBIM37) >>> annotatedPeaks[1:2,] >>> >> RangedData with 2 rows and 9 value columns across 21 spaces >> space ranges | >> <character> <iranges> | >> MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] | >> MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] | >> >> peak strand feature >> >> <character> <character> <character> >> MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1 >> ENSMUSG00000025907 >> MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1 >> ENSMUSG00000061518 >> >> start_position end_position insideFeature >> >> <numeric> <numeric> <character> >> MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream >> MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart >> >> distancetoFeature shortestDistance >> >> <numeric> <numeric> >> MACS_peak_10 ENSMUSG00000025907 -604 2 >> MACS_peak_102 ENSMUSG00000061518 -73 73 >> >> fromOverlappingOrNearest >> >> <character> >> MACS_peak_10 ENSMUSG00000025907 NearestStart >> MACS_peak_102 ENSMUSG00000061518 NearestStart >> >> If I do like this: >> >>> org.Mm.egENSEMBL2EG$`ENSMUSG00000025907` >>> >> [1] "12421" >> >>> org.Mm.egSYMBOL$`12421` >>> >> [1] "Rb1cc1" >> >> Which is the symbol I am itnerested in. I would like to add the symbol >> corresponding to each row in the RangedData Object as a new column. >> >> I have tried this, with mget: >> >> >> annotatedPeaks$entrez<-mget(annotatedPeaks$feature,org.Mm.egENSEMBL 2EG,ifnotfound=NA) >> >> but: >> >> >>> annotatedPeaks[1:2,] >>> >> RangedData with 2 rows and 10 value columns across 21 spaces >> space ranges | >> <character> <iranges> | >> MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] | >> MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] | >> peak strand >> feature >> <character> <character> >> <character> >> MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1 >> ENSMUSG00000025907 >> MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1 >> ENSMUSG00000061518 >> start_position end_position insideFeature >> <numeric> <numeric> <character> >> MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream >> MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart >> distancetoFeature shortestDistance >> <numeric> <numeric> >> MACS_peak_10 ENSMUSG00000025907 -604 2 >> MACS_peak_102 ENSMUSG00000061518 -73 73 >> fromOverlappingOrNearest entrez >> <character> <list> >> MACS_peak_10 ENSMUSG00000025907 NearestStart ######## >> MACS_peak_102 ENSMUSG00000061518 NearestStart ######## >> >> as mget returns a list. Note that some ensemble IDs map to more than one >> gene ID. >> > > Well, that's true and unavoidable. You will simply have to choose which > symbol you want to use. Or else, you can concatenate with commas if you > want them all. > > symbs <-mget(annotatedPeaks$feature,org.Mm.egENSEMBL2EG,ifnotfound=NA) > symbs <- sapply(symbs, function(x) paste(x, collapse = ", ")) > annotatedPeaks$entrez <- symbs > > And as you already noted, it is a two step process to go from Ensembl > gene IDs to gene symbols, so what has been done above isn't quite > sufficient. But I assume you get the idea. > > You can always do things in one shot using SQL queries. An example you > can work from can be found in in section 2.0.9 of the AnnotationDbi > vignette. That can be fun if you like SQL stuff. > > Alternatively, you can use biomaRt. Note that you will always want to > return the Ensembl gene ID when you use biomaRt, so you can line things > up after the fact. As an example: > > > suppressMessages(library(biomaRt)) > > mart <- useMart("ensembl","mmusculus_gene_ensembl") > ## fake up some IDs > > ids <- paste("ENSMUSG", sprintf("%011.0f", > c(1,3,28,37,49,56,58,78,85,88,93,94,103)), sep="") > > ids > [1] "ENSMUSG00000000001" "ENSMUSG00000000003" "ENSMUSG00000000028" > [4] "ENSMUSG00000000037" "ENSMUSG00000000049" "ENSMUSG00000000056" > [7] "ENSMUSG00000000058" "ENSMUSG00000000078" "ENSMUSG00000000085" > [10] "ENSMUSG00000000088" "ENSMUSG00000000093" "ENSMUSG00000000094" > [13] "ENSMUSG00000000103" > > out <- getBM(c("ensembl_gene_id", "mgi_symbol"), "ensembl_gene_id", > ids, mart) > > out > ensembl_gene_id mgi_symbol > 1 ENSMUSG00000000001 Gnai3 > 2 ENSMUSG00000000003 Pbsn > 3 ENSMUSG00000000028 Cdc45 > 4 ENSMUSG00000000037 Scml2 > 5 ENSMUSG00000000049 Apoh > 6 ENSMUSG00000000056 Narf > 7 ENSMUSG00000000058 Cav2 > 8 ENSMUSG00000000078 Klf6 > 9 ENSMUSG00000000085 Scmh1 > 10 ENSMUSG00000000088 Cox5a > 11 ENSMUSG00000000093 Tbx2 > 12 ENSMUSG00000000094 Tbx4 > 13 ENSMUSG00000000103 Zfy2 > > Here if you have any duplicated symbols, you can use the trick from > above. First convert to a list: > > outlist <- tapply(1:nrow(out), out[,1], function(x) out[x,2]) > > then > > outlist <- sapply(outlist, function(x) paste(x, collapse = ", ")) > > and append to your RangedData object. > > Also note that Ensembl gene IDs that don't map to a symbol will be > dropped silently, so you might need to do some manipulations of the > returned data.frame to insert the gene IDs (and a '' for the symbol) in > order to have things line up correctly when you put the symbols into > your RangedData object. > > Best, > > Jim > > > > > >> Also, using the convert2EntrezID from the same ChIPpeakAnno package: >> >> >> annotatedPeaks$EntrezID<-convert2EntrezID(IDs=annotatedPeaks$featur e,orgAnn="org.Mm.eg.db",ID_type="ensembl_gene_id") >> >> Error in `[[<-`(`*tmp*`, name, value = c("12421", "12859", "67387", >> "623661", : >> 9633 elements in value to replace 13721 elements >> >> Which returns a matrix (dim 9633,1) as some ensemblID map to same gene ID. >> >> As far as I understand I need to get use the geneID to map ensemblID to >> SYMBOL. So i cannot get the Symbols. >> >> So, I am stuck here. >> >> >> Thanks again, >> Marc >> >> >>> sessionInfo() >>> >> R version 2.11.0 (2010-04-22) >> x86_64-unknown-linux-gnu >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] org.Mm.eg.db_2.4.1 ChIPpeakAnno_1.4.1 >> [3] limma_3.4.0 org.Hs.eg.db_2.4.1 >> [5] GO.db_2.4.1 RSQLite_0.9-0 >> [7] DBI_0.2-5 AnnotationDbi_1.10.1 >> [9] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.1 >> [11] GenomicRanges_1.0.1 Biostrings_2.16.0 >> [13] IRanges_1.6.1 multtest_2.4.0 >> [15] Biobase_2.8.0 biomaRt_2.4.0 >> >> loaded via a namespace (and not attached): >> [1] MASS_7.3-5 RCurl_1.4-2 splines_2.11.0 survival_2.35-8 >> [5] tools_2.11.0 XML_3.1-0 >> >> >> James W. MacDonald wrote: >> >>> Hi Marc, >>> >>> On 8/10/2010 6:26 AM, Marc Noguera wrote: >>> >>> >>>> Hi all, >>>> this may be a very naive question by I have been trying to solve it >>>> myself and i can't get through it. >>>> >>>> I have this RangedData object obtained from ChIPpeakAnno Package, which >>>> correspond toa Chipseq experiment with annotated peaks, with ENSEMBL >>>> identificators. >>>> I can use this output already but like to transform the ENSEMBLID to a >>>> gene symbol id >>>> >>>> for instance: ENSMUSG00000025907 to "Rb1cc1" symbol. It also would be >>>> useful to add a field linking to a entrez gene web url. >>>> >>>> I have been looking at the org.Mm.eg.db package and although I can >>>> retrieve the symbol for a particular ENSEMBLID can't get it for all the >>>> elements in the object. >>>> >>>> >>> What have you tried so far? Unless you give an example of what you have >>> done and how it didn't perform as you expect, it is very difficult for >>> anybody to help. >>> >>> As a shot in the dark, have you looked at the help page for mget()? >>> >>> I don't really understand how the field linking to Entrez Gene would >>> work, considering a RangedData object isn't an HTML page. However, >>> building a URL to Entrez Gene isn't that difficult. You can hijack some >>> internal code from the annotate package: >>> >>> > suppressMessages(library(annotate)) >>> > egids<- 1:5 >>> > annotate:::.repositories[["en"]](egids) >>> [1] >>> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retriev e&dopt=Graphics&list_uids=1" >>> [2] >>> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retriev e&dopt=Graphics&list_uids=2" >>> [3] >>> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retriev e&dopt=Graphics&list_uids=3" >>> [4] >>> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retriev e&dopt=Graphics&list_uids=4" >>> [5] >>> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retriev e&dopt=Graphics&list_uids=5" >>> >>> But doing it by hand wouldn't be that much more difficult. If you strip >>> out the error checking from the above function, all it really consists of is >>> >>> thefunction<- function(ids){ >>> paste("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=R etrieve&dopt=Graphics&list_uids=", >>> >>> ids, sep = "") >>> } >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> >>>> Many thanks >>>> Marc >>>> >>>> >>>> >>> >> > > -- ----------------------------------------------------- Marc Noguera i Julian, PhD Genomics unit / Bioinformatics Institut de Medicina Predictiva i Personalitzada del C?ncer (IMPPC) B-10 Office Carretera de Can Ruti Cam? de les Escoles s/n 08916 Badalona, Barcelona

ADD REPLY • link 15.4 years ago Marc Noguera ▴ 100