Uniprot.ws getting Function [CC]
schmid10 • 0
@schmid10-15569

Hi,

I am using Uniprot.ws, but I can't get the full information I would like to have.

I would like to retrieve a short description of the protein, which appears as "Function [CC]" in the results table when I use UniProt's ID mapping.

Are there options other than using ID mapping from UniProt directly?

Thanks a lot.

uniprot.ws columns • 1.6k views
@james-w-macdonald-5106

The UniProt.ws package only provides a subset of the data you can get directly from the UniProt website. You can see what's available by loading the package and then calling the columns function:

> ws <- UniProt.ws()
> columns(ws)
  [1] "3D"                         "AARHUS/GHENT-2DPAGE"       
  [3] "AGD"                        "ALLERGOME"                 
  [5] "ARACHNOSERVER"              "BIOCYC"                    
  [7] "CGD"                        "CITATION"                  
  [9] "CLEANEX"                    "CLUSTERS"                  
 [11] "COMMENTS"                   "CONOSERVER"                
 [13] "CYGD"                       "DATABASE(PDB)"             
 [15] "DATABASE(PFAM)"             "DICTYBASE"                 
 [17] "DIP"                        "DISPROT"                   
 [19] "DMDM"                       "DNASU"                     
 [21] "DOMAIN"                     "DOMAINS"                   
 [23] "DRUGBANK"                   "EC"                        
 [25] "ECHOBASE"                   "ECO2DBASE"                 
 [27] "ECOGENE"                    "EGGNOG"                    
 [29] "EMBL/GENBANK/DDBJ"          "EMBL/GENBANK/DDBJ_CDS"     
 [31] "ENSEMBL"                    "ENSEMBL_GENOMES"           
 [33] "ENSEMBL_GENOMES PROTEIN"    "ENSEMBL_GENOMES TRANSCRIPT"
 [35] "ENSEMBL_PROTEIN"            "ENSEMBL_TRANSCRIPT"        
 [37] "ENTREZ_GENE"                "ENTRY-NAME"                
 [39] "EUHCVDB"                    "EUPATHDB"                  
 [41] "EXISTENCE"                  "FAMILIES"                  
 [43] "FEATURES"                   "FLYBASE"                   
 [45] "GENECARDS"                  "GENEFARM"                  
 [47] "GENES"                      "GENETREE"                  
 [49] "GENOLIST"                   "GENOMERNAI"                
 [51] "GERMONLINE"                 "GI_NUMBER*"                
 [53] "GO"                         "GO-ID"                     
 [55] "HGNC"                       "H-INVDB"                   
 [57] "HOGENOM"                    "HPA"                       
 [59] "HSSP"                       "ID"                        
 [61] "INTERACTOR"                 "INTERPRO"                  
 [63] "KEGG"                       "KEYWORD-ID"                
 [65] "KEYWORDS"                   "KO"                        
 [67] "LAST-MODIFIED"              "LEGIOLIST"                 
 [69] "LENGTH"                     "LEPROMA"                   
 [71] "MAIZEGDB"                   "MEROPS"                    
 [73] "MGI"                        "MIM"                       
 [75] "MINT"                       "NEXTBIO"                   
 [77] "NEXTPROT"                   "OMA"                       
 [79] "ORGANISM"                   "ORGANISM-ID"               
 [81] "ORPHANET"                   "ORTHODB"                   
 [83] "PATHWAY"                    "PATRIC"                    
 [85] "PDB"                        "PEROXIBASE"                
 [87] "PHARMGKB"                   "PHOSSITE"                  
 [89] "PIR"                        "POMBASE"                   
 [91] "PPTASEDB"                   "PROTCLUSTDB"               
 [93] "PROTEIN-NAMES"              "PSEUDOCAP"                 
 [95] "REACTOME"                   "REBASE"                    
 <snip>

The link provided in the comment by l.nilse will provide you with the name you should be looking for, which in this case is FUNCTION. I don't believe any of the data under the Function header are available through UniProt.ws.
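One way to confirm this before calling select() is to compare the fields you want against what columns() reports. The sketch below uses a hand-copied subset of the listing above so that it runs offline; in a live session you would presumably use columns(ws) instead. The names `available`, `wanted` and `not_available` are only illustrative.

```r
# Subset of the columns() listing shown above, copied by hand so this runs offline.
# Assumption: in a live session you would use  available <- columns(ws)
available <- c("COMMENTS", "EXISTENCE", "FAMILIES", "FEATURES",
               "FLYBASE", "GENECARDS", "GENEFARM", "GENES")

# Fields we would like to pass to select()
wanted <- c("COMMENTS", "FUNCTION")

# Anything listed here is not offered by this version of the package
not_available <- setdiff(wanted, available)
print(not_available)
```

An empty result would mean every requested field can be passed to select() as is.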

Thanks, James. I filed a feature request: https://github.com/Bioconductor/UniProt.ws/issues/1

I'll take a look at adding that. Right now you can return the comments column, but it just tells you what type of comments there are, without returning the individual columns themselves:

> ws <- UniProt.ws()
> select(ws, "P23434", c("COMMENTS"), "UNIPROTKB")
Getting extra data for P23434
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB
1    P23434
                                                                                                                            COMMENTS
1 Cofactor (1); Function (1); Involvement in disease (1); Sequence similarities (1); Subcellular location (1); Subunit structure (1)
> 
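Even without the comment text itself, the COMMENTS summary can tell you whether an entry has a Function annotation at all. Here is a minimal base-R sketch that parses the string returned above into counts per comment type. It assumes the "Type (n); Type (n)" layout seen in this one result, and the variable names are my own, not part of UniProt.ws.

```r
# COMMENTS value copied from the select() output above
comments <- "Cofactor (1); Function (1); Involvement in disease (1); Sequence similarities (1); Subcellular location (1); Subunit structure (1)"

# Split on "; " to get one entry per comment type
entries <- strsplit(comments, "; ", fixed = TRUE)[[1]]

# Separate the type name from the count in parentheses
types  <- sub(" \\([0-9]+\\)$", "", entries)
counts <- as.integer(sub("^.* \\(([0-9]+)\\)$", "\\1", entries))
comment_counts <- setNames(counts, types)

# TRUE if the entry carries at least one Function comment
has_function <- "Function" %in% names(comment_counts)

print(comment_counts)
print(has_function)
```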
@james-w-macdonald-5106

OK, fixed now. This is in the devel version of Bioconductor, as the release version is frozen. The updates will propagate through the build system in the next couple of days.

> select(ws, "P23434", "FUNCTION","UNIPROTKB")
Getting extra data for P23434
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB
1    P23434
                                                                                                                                                                                                                     FUNCTION
1 FUNCTION: The glycine cleavage system catalyzes the degradation of glycine. The H protein (GCSH) shuttles the methylamine group of glycine from the P protein (GLDC) to the T protein (GCST). {ECO:0000269|PubMed:1671321}.
> sessionInfo()
R Under development (unstable) (2018-02-01 r74194)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-devel/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] UniProt.ws_2.19.2   BiocGenerics_0.25.3 RCurl_1.95-4.10    
[4] bitops_1.0-6        RSQLite_2.1.0      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16         AnnotationDbi_1.41.4 magrittr_1.5        
 [4] bindr_0.1.1          rappdirs_0.3.1       IRanges_2.13.28     
 [7] bit_1.1-12           R6_2.2.2             rlang_0.2.0         
[10] httr_1.3.1           blob_1.1.1           dplyr_0.7.4         
[13] tools_3.5.0          Biobase_2.39.2       DBI_0.8             
[16] dbplyr_1.2.1         bit64_0.9-7          digest_0.6.15       
[19] assertthat_0.2.0     tibble_1.4.2         bindrcpp_0.2.2      
[22] S4Vectors_0.17.41    glue_1.2.0           memoise_1.1.0       
[25] BiocFileCache_1.3.42 pillar_1.2.1         compiler_3.5.0      
[28] stats4_3.5.0         pkgconfig_2.0.1     
> 
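If you only want the short description, the FUNCTION value returned above still carries the "FUNCTION: " label and a trailing evidence tag. A rough base-R sketch for separating the two follows. It assumes the tag format seen in this single entry ({ECO:...} followed by a period); other entries may be laid out differently, so treat the patterns as a starting point.

```r
# FUNCTION value copied from the select() output above
raw <- "FUNCTION: The glycine cleavage system catalyzes the degradation of glycine. The H protein (GCSH) shuttles the methylamine group of glycine from the P protein (GLDC) to the T protein (GCST). {ECO:0000269|PubMed:1671321}."

# Drop the leading "FUNCTION: " label
txt <- sub("^FUNCTION: ", "", raw)

# Collect evidence tags such as {ECO:0000269|PubMed:1671321}
evidence <- regmatches(txt, gregexpr("\\{ECO:[^}]*\\}", txt))[[1]]

# Remove each tag together with the space before it and the period after it
description <- trimws(gsub(" ?\\{ECO:[^}]*\\}\\.?", "", txt))

print(description)
print(evidence)
```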
Great. Thanks for the fix, James. We will test and then close the GitHub issue.

schmid10 • 0
@schmid10-15569

Hi everyone,

Thank you very much for all your help and for adding the "Function" column to the package.

Regards
