Unimart not available in 2015?

I've used biomaRt to map Ensembl IDs to Entrez IDs and UniProt accession IDs. Recently, BioMart made some changes, and it seems that Unimart is no longer available. If I run this code (which worked before), I get:

library(biomaRt)
mart <- useMart(biomart = "unimart", dataset = "uniprot", verbose = TRUE)

Space required after the Public Identifier
SystemLiteral " or ' expected
SYSTEM or PUBLIC, the URI is missing
Opening and ending tag mismatch: hr line 7 and body
Opening and ending tag mismatch: body line 4 and html
Premature end of data in tag html line 2
Error: 1: Space required after the Public Identifier
2: SystemLiteral " or ' expected
3: SYSTEM or PUBLIC, the URI is missing
4: Opening and ending tag mismatch: hr line 7 and body
5: Opening and ending tag mismatch: body line 4 and html
6: Premature end of data in tag html line 2

I tried changing the host option, as I did for Ensembl:

listMarts(host = "www.ensembl.org")   # works

listMarts(host = "www.uniprot.org")   # does NOT work

I read in the past that Unimart support was later updated to work again, and I wonder whether the same will happen now. Is there a fix for this, or am I missing something?

EDIT:

One thing I forgot to mention: I could also get protein names, gene names and gene symbols from Unimart, and I need those too.

Thanks,

j

Posted by john, 10.2 years ago
Dear John,

Unimart is now hosted on the EBI website, so I would expect the following to work:

listMarts(host="www.ebi.ac.uk/uniprot")

Sadly, it doesn't: there seems to be something wrong with the registry. I will email them.

Please note that the UniProt team is retiring the UniProt mart:

We at UniProt are always committed to improving our level of service and openly communicating changes with our users. 

Based on recent user surveys and service evaluations, we have decided that our UniProt Biomart service will be retired later this year. The October 2015 data release will be the final update for the UniProt Biomart; however, the service will remain available until December 2015.

For those of you who rely on the UniProt Biomart for tasks such as ID mapping, bulk retrieval of entries, or programmatic access to entry annotations, we have alternative services that we hope satisfy your needs. Please visit our YouTube channels and help pages for tutorials and more information about these services.

UniProt ID Mapping Service

YouTube ID Mapping Tutorial

UniProt Programmatic Access Help Pages   

Regards,

Uniprot Team

Reply by Thomas Maurel, 10.2 years ago

Dear John,

I was a bit hasty; the following works:

> listMarts(host="www.ebi.ac.uk", path="/uniprot/biomart/martservice")
               biomart                  version
1              unimart         UNIPROT (EBI UK)
2 ENSEMBL_MART_ENSEMBL ENSEMBL GENES 80(EBI UK)
3                pride           PRIDE (EBI UK)

Hope this helps,

Thomas

Reply by Thomas Maurel, 10.2 years ago

Thanks, Thomas, this is perfect! How did you know where to find the "host" site?

Reply by john, 10.1 years ago

Dear John,

You can find a list of the marts and their hosts on the following page: http://www.biomart.org/notice.html

If you want more information about a mart, take its martview URL, e.g. "http://www.ebi.ac.uk/uniprot/biomart/martview", and replace "martview" with "martservice?type=registry". This gives "http://www.ebi.ac.uk/uniprot/biomart/martservice?type=registry", which will help you find the path and mart name.
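The URL substitution described above can be sketched in base R. `registry_url()` is a hypothetical helper name (it is not part of biomaRt), and it assumes the martview URL ends in `martview`, optionally followed by a slash. Note that the UniProt mart itself was scheduled for retirement after December 2015, so the resulting URL is only an illustration of the pattern.

```r
# Hypothetical helper (not part of biomaRt): derive a BioMart registry URL
# from a "martview" URL by swapping the trailing "martview" segment.
registry_url <- function(martview_url) {
  # Assumes the URL ends in "martview" or "martview/"
  sub("martview/?$", "martservice?type=registry", martview_url)
}

registry_url("http://www.ebi.ac.uk/uniprot/biomart/martview")
# [1] "http://www.ebi.ac.uk/uniprot/biomart/martservice?type=registry"
```

Opening the returned URL in a browser (or reading it with `readLines()`) shows the registry, from which you can pick the host, path and mart name to pass to `listMarts()`.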

Regards,

Thomas

Reply by Thomas Maurel, 10.1 years ago

Answer (posted 2015-11-24)

I don't know about the UniProt mart, but you could alternatively use the UniProt.ws package.

> library(UniProt.ws)
> up <- UniProt.ws(taxId=9606)
> keytypes(up)
 [1] "AARHUS/GHENT-2DPAGE"        "AGD"                       
 [3] "ALLERGOME"                  "ARACHNOSERVER"             
 [5] "BIOCYC"                     "CGD"                       
 [7] "CLEANEX"                    "CONOSERVER"                
 [9] "CYGD"                       "DICTYBASE"                 
[11] "DIP"                        "DISPROT"                   
[13] "DMDM"                       "DNASU"                     
[15] "DRUGBANK"                   "ECHOBASE"                  
[17] "ECO2DBASE"                  "ECOGENE"                   
[19] "EGGNOG"                     "EMBL/GENBANK/DDBJ"         
[21] "EMBL/GENBANK/DDBJ_CDS"      "ENSEMBL"                   
[23] "ENSEMBL_GENOMES"            "ENSEMBL_GENOMES PROTEIN"   
[25] "ENSEMBL_GENOMES TRANSCRIPT" "ENSEMBL_PROTEIN"           
[27] "ENSEMBL_TRANSCRIPT"         "ENTREZ_GENE"               
[29] "EUHCVDB"                    "EUPATHDB"                  
[31] "FLYBASE"                    "GENECARDS"                 
[33] "GENEFARM"                   "GENETREE"                  
[35] "GENOLIST"                   "GENOMERNAI"                
[37] "GERMONLINE"                 "GI_NUMBER*"                
[39] "HGNC"                       "H-INVDB"                   
[41] "HOGENOM"                    "HOVERGEN"                  
[43] "HPA"                        "HSSP"                      
[45] "KEGG"                       "KO"                        
[47] "LEGIOLIST"                  "LEPROMA"                   
[49] "MAIZEGDB"                   "MEROPS"                    
[51] "MGI"                        "MIM"                       
[53] "MINT"                       "NEXTBIO"                   
[55] "NEXTPROT"                   "OMA"                       
[57] "ORPHANET"                   "ORTHODB"                   
[59] "PATRIC"                     "PDB"                       
[61] "PEROXIBASE"                 "PHARMGKB"                  
[63] "PHOSSITE"                   "PIR"                       
[65] "POMBASE"                    "PPTASEDB"                  
[67] "PROTCLUSTDB"                "PSEUDOCAP"                 
[69] "REACTOME"                   "REBASE"                    
[71] "REFSEQ_NUCLEOTIDE"          "REFSEQ_PROTEIN"            
[73] "RGD"                        "SGD"                       
[75] "TAIR"                       "TCDB"                      
[77] "TIGR"                       "TUBERCULIST"               
[79] "UCSC"                       "UNIGENE"                   
[81] "UNIPARC"                    "UNIPATHWAY"                
[83] "UNIPROTKB"                  "UNIREF100"                 
[85] "UNIREF50"                   "UNIREF90"                  
[87] "VECTORBASE"                 "WORLD-2DPAGE"              
[89] "WORMBASE"                   "WORMBASE_PROTEIN"          
[91] "WORMBASE_TRANSCRIPT"        "XENBASE"                   
[93] "ZFIN"                      
> columns(up)
  [1] "3D"                         "AARHUS/GHENT-2DPAGE"       
  [3] "AGD"                        "ALLERGOME"                 
  [5] "ARACHNOSERVER"              "BIOCYC"                    
  [7] "CGD"                        "CITATION"                  
  [9] "CLEANEX"                    "CLUSTERS"                  
 [11] "COMMENTS"                   "CONOSERVER"                
 [13] "CYGD"                       "DATABASE(PDB)"             
 [15] "DATABASE(PFAM)"             "DICTYBASE"                 
 [17] "DIP"                        "DISPROT"                   
 [19] "DMDM"                       "DNASU"                     
 [21] "DOMAIN"                     "DOMAINS"                   
 [23] "DRUGBANK"                   "EC"                        
 [25] "ECHOBASE"                   "ECO2DBASE"                 
 [27] "ECOGENE"                    "EGGNOG"                    
 [29] "EMBL/GENBANK/DDBJ"          "EMBL/GENBANK/DDBJ_CDS"     
 [31] "ENSEMBL"                    "ENSEMBL_GENOMES"           
 [33] "ENSEMBL_GENOMES PROTEIN"    "ENSEMBL_GENOMES TRANSCRIPT"
 [35] "ENSEMBL_PROTEIN"            "ENSEMBL_TRANSCRIPT"        
 [37] "ENTREZ_GENE"                "ENTRY-NAME"                
 [39] "EUHCVDB"                    "EUPATHDB"                  
 [41] "EXISTENCE"                  "FAMILIES"                  
 [43] "FEATURES"                   "FLYBASE"                   
 [45] "GENECARDS"                  "GENEFARM"                  
 [47] "GENES"                      "GENETREE"                  
 [49] "GENOLIST"                   "GENOMERNAI"                
 [51] "GERMONLINE"                 "GI_NUMBER*"                
 [53] "GO"                         "GO-ID"                     
 [55] "HGNC"                       "H-INVDB"                   
 [57] "HOGENOM"                    "HOVERGEN"                  
 [59] "HPA"                        "HSSP"                      
 [61] "ID"                         "INTERACTOR"                
 [63] "INTERPRO"                   "KEGG"                      
 [65] "KEYWORD-ID"                 "KEYWORDS"                  
 [67] "KO"                         "LAST-MODIFIED"             
 [69] "LEGIOLIST"                  "LENGTH"                    
 [71] "LEPROMA"                    "MAIZEGDB"                  
 [73] "MEROPS"                     "MGI"                       
 [75] "MIM"                        "MINT"                      
 [77] "NEXTBIO"                    "NEXTPROT"                  
 [79] "OMA"                        "ORGANISM"                  
 [81] "ORGANISM-ID"                "ORPHANET"                  
 [83] "ORTHODB"                    "PATHWAY"                   
 [85] "PATRIC"                     "PDB"                       
 [87] "PEROXIBASE"                 "PHARMGKB"                  
 [89] "PHOSSITE"                   "PIR"                       
 [91] "POMBASE"                    "PPTASEDB"                  
 [93] "PROTCLUSTDB"                "PROTEIN-NAMES"             
 [95] "PSEUDOCAP"                  "REACTOME"                  
 [97] "REBASE"                     "REFSEQ_NUCLEOTIDE"         
 [99] "REFSEQ_PROTEIN"             "REVIEWED"                  
[101] "RGD"                        "SCORE"                     
[103] "SEQUENCE"                   "SGD"                       
[105] "SUBCELLULAR-LOCATIONS"      "TAIR"                      
[107] "TAXON"                      "TCDB"                      
[109] "TIGR"                       "TOOLS"                     
[111] "TUBERCULIST"                "UCSC"                      
[113] "UNIGENE"                    "UNIPARC"                   
[115] "UNIPATHWAY"                 "UNIPROTKB"                 
[117] "UNIREF100"                  "UNIREF50"                  
[119] "UNIREF90"                   "VECTORBASE"                
[121] "VERSION"                    "VIRUS-HOSTS"               
[123] "WORLD-2DPAGE"               "WORMBASE"                  
[125] "WORMBASE_PROTEIN"           "WORMBASE_TRANSCRIPT"       
[127] "XENBASE"                    "ZFIN"  
                    
> select(up, c("1","2","5"), "UNIPROTKB","ENTREZ_GENE")
Getting mapping data for 1 ... and ACC
'select()' returned 1:many mapping between keys and columns
  ENTREZ_GENE UNIPROTKB
1           1    P04217
2           1    V9HWD8
3           2    P01023
4           5      <NA>

> select(up, c("ENSG00000139618"), "UNIPROTKB","ENSEMBL")
Getting mapping data for ENSG00000139618 ... and ACC
'select()' returned 1:many mapping between keys and columns
          ENSEMBL UNIPROTKB
1 ENSG00000139618    H0YD86
2 ENSG00000139618    H0YE37
3 ENSG00000139618    P51587
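Both `select()` calls report a 1:many mapping, so one key can come back on several rows, and an unmapped key (Entrez ID 5 above) comes back as `NA`. If you need exactly one row per key, a minimal base-R sketch is to collapse the accessions for each key into a single string. The data frame below is typed in from the ENTREZ_GENE output shown above rather than fetched from UniProt, and `collapse_ids()` is a hypothetical helper name.

```r
# Rows copied from the select(up, c("1","2","5"), "UNIPROTKB", "ENTREZ_GENE") output above
res <- data.frame(
  ENTREZ_GENE = c("1", "1", "2", "5"),
  UNIPROTKB   = c("P04217", "V9HWD8", "P01023", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join all non-NA values for one key with ";",
# returning NA when the key has no mapping at all
collapse_ids <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = ";")
}

# One entry per Entrez ID, named by the key
collapsed <- vapply(split(res$UNIPROTKB, res$ENTREZ_GENE), collapse_ids, character(1))
collapsed
```

The same approach applies to the Ensembl result by splitting on the ENSEMBL column instead. Whether you keep every accession or choose one (for example, only reviewed entries via the REVIEWED column) depends on your downstream analysis.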