GOexpress: GO_analyse function throws Error: Space required after the Public Identifier
@martinhoelzer-8847
Germany

Hello, 

Some time ago I used the GO_analyse function of the GOexpress package with Ensembl IDs as input, so the automatic mapping should not be a problem. Now I get the following error:

> GO_results <- GO_analyse(eSet = minimalSet, f = "Treatment")
First feature identifier in dataset: ENSG00000000003
Looks like Ensembl gene identifier.
Loading detected dataset hsapiens_gene_ensembl ...
Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Opening and ending tag mismatch: hr line 7 and body
Opening and ending tag mismatch: body line 4 and html
Premature end of data in tag html line 2
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing
4: Opening and ending tag mismatch: hr line 7 and body
5: Opening and ending tag mismatch: body line 4 and html
6: Premature end of data in tag html line 2

I think the problem is that I have to define a specific Ensembl URL, as one does for a biomaRt Mart object:

ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl", host = "jul2015.archive.ensembl.org")

but I do not know how to do this for the GO_analyse function?

Thanks!

Martin

kevin.rue ▴ 350
@kevinrue-6757
University of Oxford

Hi Martin,

The current release versions of biomaRt and GOexpress are 2.26.1 and 1.4.1, respectively.

While I test whether downgrading my biomaRt package replicates your issue, I would recommend you upgrade your own packages to the latest version, and post whether this solves the problem.

Kind regards,

Kevin

 

Edit: I have just tested version 2.24.1 of the biomaRt package, and with it I can replicate your error (see below). Please update your biomaRt package to the latest version; this should solve your problem. Please try to keep your packages up to date where possible, bearing in mind that package updates might occasionally disrupt your pipeline.

> library("biomaRt", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.2")
> ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl")
Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Opening and ending tag mismatch: hr line 7 and body
Opening and ending tag mismatch: body line 4 and html
Premature end of data in tag html line 2
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing
4: Opening and ending tag mismatch: hr line 7 and body
5: Opening and ending tag mismatch: body line 4 and html
6: Premature end of data in tag html line 2
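
As a quick way to check whether the installed packages are older than the release versions quoted above, a minimal sketch follows. The helper name `needs_update` is hypothetical and not part of biomaRt or GOexpress; the minimum versions are simply the ones mentioned in this reply.

```r
# Minimum versions taken from the reply above (biomaRt 2.26.1, GOexpress 1.4.1).
minimum_versions <- c(biomaRt = "2.26.1", GOexpress = "1.4.1")

# Hypothetical helper: TRUE if `pkg` is not installed or is older than `minimum`.
needs_update <- function(pkg, minimum) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  is.null(installed) || installed < package_version(minimum)
}

# Collect the names of the packages that are missing or outdated.
outdated <- names(minimum_versions)[
  mapply(needs_update, names(minimum_versions), minimum_versions)
]

if (length(outdated) > 0) {
  message("Consider updating: ", paste(outdated, collapse = ", "))
}
```

Any package listed by the message can then be updated through the usual Bioconductor installer for your R version (for example `BiocManager::install()` on current releases).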

Hi Kevin, 

this solves my issue, thank you very much. 

best,

Martin

kevin.rue ▴ 350
@kevinrue-6757
University of Oxford

Hi Martin,

I am afraid I do not have enough information to reproduce or understand the error that you point out here, as I have never encountered it before.

Given your second snippet of code, I suspect that you found the answer on the thread: makeTxDbFromBiomart Error: 1: Space required after the Public Identifier

Considering that post, I am not sure what is going wrong here. I have just successfully queried the main portal without needing the host argument (they suggested "www.ensembl.org" and "jul2015.archive.ensembl.org").

Would you mind trying the GO_analyse function again a couple of times? Or trying a dummy query such as:

ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl")
getBM(attributes = 'ensembl_gene_id', mart = ensembl)

If that does not work, I will be happy to investigate the problem further, to figure out whether it is a GOexpress or a biomaRt issue.

 

For the record, the GO_analyse function does not support specific Ensembl URLs for automated annotations. Automated annotations are fetched directly from the latest (i.e., current) BioMart release. This decision was made around Ensembl release 75, when some column identifiers in the BioMart database were changed and broke the automated annotation feature of GO_analyse (external_gene_id became external_gene_name). GO_analyse will always follow the naming convention of the current Ensembl BioMart release. Otherwise, multiple failed calls would have to be made using different combinations of column names, each valid for particular Ensembl releases, until the right combination returned the appropriate columns.

Backwards compatibility with previous releases is recommended/supported through "custom annotations" which should be downloaded separately from Biomart, and provided to the GO_analyse function through the arguments: GO_genes, all_GO, and all_genes.
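
The custom-annotation route described above could be sketched as follows. This is only an illustration under stated assumptions: the helper names `fetch_custom_annotations` and `rename_gene_column` are hypothetical, the BioMart attribute names are those of recent Ensembl releases (older archives may use different ones, such as external_gene_id), and the assumption that GO_analyse expects a column called gene_id should be checked against ?GO_analyse before use.

```r
# Hypothetical helper: rename the Ensembl identifier column to "gene_id".
# ASSUMPTION: GO_analyse expects this column name; verify in ?GO_analyse.
rename_gene_column <- function(df) {
  names(df)[names(df) == "ensembl_gene_id"] <- "gene_id"
  df
}

# Hypothetical helper: download the three annotation tables from one
# Ensembl release (requires the biomaRt package and network access).
fetch_custom_annotations <- function(host, dataset) {
  mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                           dataset = dataset, host = host)
  # Gene-to-GO mapping; genes without GO terms come back with an empty go_id.
  GO_genes <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "go_id"), mart = mart)
  GO_genes <- GO_genes[GO_genes$go_id != "", ]
  # Description of every GO term.
  all_GO <- biomaRt::getBM(
    attributes = c("go_id", "name_1006", "namespace_1003"), mart = mart)
  # Description of every gene.
  all_genes <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name", "description"),
    mart = mart)
  list(GO_genes = rename_gene_column(GO_genes),
       all_GO = all_GO,
       all_genes = rename_gene_column(all_genes))
}

# Usage, reusing the objects from the original question:
# annot <- fetch_custom_annotations("jul2015.archive.ensembl.org",
#                                   "hsapiens_gene_ensembl")
# GO_results <- GO_analyse(eSet = minimalSet, f = "Treatment",
#                          GO_genes = annot$GO_genes,
#                          all_GO = annot$all_GO,
#                          all_genes = annot$all_genes)
```

Downloading the tables once and saving them (for example with saveRDS) also keeps the analysis reproducible when the current BioMart release changes.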

Best regards

Kevin

@martinhoelzer-8847
Germany

Hi Kevin, 

thanks for your reply. I just tried the following several times:

ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl")

but here, too, I just get:

Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Opening and ending tag mismatch: hr line 7 and body
Opening and ending tag mismatch: body line 4 and html
Premature end of data in tag html line 2
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing
4: Opening and ending tag mismatch: hr line 7 and body
5: Opening and ending tag mismatch: body line 4 and html
6: Premature end of data in tag html line 2

I also just noticed that the Ensembl web page (www.ensembl.org) is down, so maybe this is the issue... If I specify a specific host, it works.
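
Since specifying a host made the connection work here, one possible way to fall back from the main portal to an archive is sketched below. The helper names `first_working_host` and `connect_ensembl` are hypothetical, and the two hosts are simply the ones mentioned in this thread. Note that this only covers useMart() calls made directly; as stated elsewhere in the thread, GO_analyse itself does not accept a host.

```r
# Hypothetical helper: try each host in turn and return the first result
# that does not raise an error. `connect` is any function taking one host.
first_working_host <- function(hosts, connect) {
  for (host in hosts) {
    result <- tryCatch(connect(host), error = function(e) {
      message("Host ", host, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
  }
  stop("None of the hosts could be reached: ", paste(hosts, collapse = ", "))
}

# Connection function built on biomaRt::useMart(), with the same arguments
# as the calls shown in this thread.
connect_ensembl <- function(host) {
  biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                   dataset = "mmusculus_gene_ensembl",
                   host = host)
}

# Usage (requires biomaRt and network access):
# ensembl <- first_working_host(
#   c("www.ensembl.org", "jul2015.archive.ensembl.org"), connect_ensembl)
```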

 

kevin.rue ▴ 350
@kevinrue-6757
University of Oxford

I am afraid this looks more like a problem on your side... For me the following works perfectly (?!):

ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl")

No idea why the difference between us... Do you have the latest version of R and the relevant packages? You haven't shown your sessionInfo() output.

Thanks for your question, and please do not forget to click the "Accept" button if the reply satisfies you.

 

PS: would you be so kind as to go to the page "GOexpress: use DESeq2 data object as input" and accept my answer with 2 votes? You created the thread, and seemed happy with the answer. You are the only user allowed to conclude that thread by accepting an answer. Many thanks!

@martinhoelzer-8847
Germany

Hi Kevin, 

the ensembl webpage is now accessible again via www.ensembl.org, but this was not the problem. The command 

ensembl = useMart(biomart = "ENSEMBL_MART_ENSEMBL",dataset="mmusculus_gene_ensembl")

still throws this error. My sessionInfo() output reports R version 3.2.2, biomaRt_2.24.1 and GOexpress_1.2.2:

> sessionInfo() 
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] pathview_1.8.0            KEGGgraph_1.26.0         
 [3] gage_2.18.0               GOexpress_1.2.2          
 [5] VennDiagram_1.6.16        futile.logger_1.4.1      
 [7] biomaRt_2.24.1            genefilter_1.50.0        
 [9] stringr_1.0.0             geneplotter_1.46.0       
[11] annotate_1.46.1           XML_3.98-1.3             
[13] lattice_0.20-33           plyr_1.8.3               
[15] LSD_3.0                   BiocInstaller_1.18.5     
[17] vsn_3.36.0                org.Hs.eg.db_3.1.2       
[19] xlsx_0.5.7                xlsxjars_0.6.1           
[21] rJava_0.9-7               ggplot2_1.0.1            
[23] GOstats_2.34.0            graph_1.46.0             
[25] Category_2.34.2           GO.db_3.1.2              
[27] AnnotationDbi_1.30.1      Matrix_1.2-2             
[29] Biobase_2.28.0            ReportingTools_2.8.0     
[31] RSQLite_1.0.0             DBI_0.3.1                
[33] knitr_1.11                gplots_2.17.0            
[35] RColorBrewer_1.1-2        DESeq2_1.8.1             
[37] RcppArmadillo_0.5.600.2.0 Rcpp_0.12.1              
[39] GenomicRanges_1.20.6      GenomeInfoDb_1.4.3       
[41] IRanges_2.2.7             S4Vectors_0.6.5          
[43] BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] colorspace_1.2-6          hwriter_1.3.2            
 [3] biovizBase_1.16.0         XVector_0.8.0            
 [5] dichromat_2.0-0           affyio_1.36.0            
 [7] splines_3.2.2             R.methodsS3_1.7.0        
 [9] ggbio_1.16.1              Formula_1.2-1            
[11] Rsamtools_1.20.4          cluster_2.0.3            
[13] png_0.1-7                 R.oo_1.19.0              
[15] httr_1.0.0                limma_3.24.15            
[17] acepack_1.3-3.3           tools_3.2.2              
[19] gtable_0.1.2              affy_1.46.1              
[21] reshape2_1.4.1            Biostrings_2.36.4        
[23] gdata_2.17.0              preprocessCore_1.30.0    
[25] rtracklayer_1.28.10       proto_0.3-10             
[27] gtools_3.5.0              edgeR_3.10.2             
[29] zlibbioc_1.14.0           MASS_7.3-44              
[31] scales_0.3.0              BSgenome_1.36.3          
[33] VariantAnnotation_1.14.13 RBGL_1.44.0              
[35] lambda.r_1.1.7            curl_0.9.3               
[37] gridExtra_2.0.0           rpart_4.1-10             
[39] reshape_0.8.5             latticeExtra_0.6-26      
[41] stringi_0.5-5             randomForest_4.6-10      
[43] GenomicFeatures_1.20.5    caTools_1.17.1           
[45] BiocParallel_1.2.21       bitops_1.0-6             
[47] GenomicAlignments_1.4.1   labeling_0.3             
[49] GSEABase_1.30.2           AnnotationForge_1.10.1   
[51] GGally_0.5.0              magrittr_1.5             
[53] R6_2.1.1                  Hmisc_3.17-0             
[55] foreign_0.8-66            survival_2.38-3          
[57] KEGGREST_1.8.0            RCurl_1.95-4.7           
[59] nnet_7.3-11               futile.options_1.0.0     
[61] KernSmooth_2.23-15        OrganismDbi_1.10.0       
[63] PFAM.db_3.1.2             locfit_1.5-9.1           
[65] Rgraphviz_2.12.0          digest_0.6.8             
[67] xtable_1.7-4              R.utils_2.1.0            
[69] munsell_0.4.2