I am using UniProt.ws, but I can't get all the information I would like to have.
I would like to retrieve a short description of the protein, which is called "Function [CC]" in the resulting table when I use UniProt's ID mapping.
Are there other possibilities than using ID mapping from UniProt directly?
The UniProt.ws package only provides a subset of the data you can get directly from the UniProt website. You can see what's available by loading the package and then calling columns() on a UniProt.ws object.
The link provided in the comment by l.nilse will provide you with the name you should be looking for, which in this case is FUNCTION. I don't believe any of the data under the Function header are available through UniProt.ws.
I'll take a look at adding that. Right now you can return the comments column, but it just tells you what type of comments there are, without returning the individual columns themselves:
> ws <- UniProt.ws()
> select(ws, "P23434", c("COMMENTS"), "UNIPROTKB")
Getting extra data for P23434
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB
1    P23434
  COMMENTS
1 Cofactor (1); Function (1); Involvement in disease (1); Sequence similarities (1); Subcellular location (1); Subunit structure (1)
>
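If you only need to know which comment types exist for an entry, the COMMENTS string can be split with base R. This is a minimal sketch that assumes the value always has the "Type (n); Type (n)" layout shown in the output above; the variable names are made up for illustration and nothing here is part of UniProt.ws.

```r
# COMMENTS value copied from the select() output above
comments <- "Cofactor (1); Function (1); Involvement in disease (1); Sequence similarities (1); Subcellular location (1); Subunit structure (1)"

# Split on "; " to get one element per comment type
parts <- strsplit(comments, "; ", fixed = TRUE)[[1]]

# Separate the type name from the count in parentheses
comment_types  <- sub(" \\([0-9]+\\)$", "", parts)
comment_counts <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", parts))
names(comment_counts) <- comment_types

# Check whether a Function comment is present for this entry
has_function <- "Function" %in% comment_types
```

This only tells you that a Function comment exists, not what it says.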
OK, fixed now. This is in the devel version of Bioconductor, as the release version is frozen. The updates will propagate through the build system in the next couple of days.
> select(ws, "P23434", "FUNCTION","UNIPROTKB")
Getting extra data for P23434
'select()' returned 1:1 mapping between keys and columns
  UNIPROTKB
1    P23434
  FUNCTION
1 FUNCTION: The glycine cleavage system catalyzes the degradation of glycine. The H protein (GCSH) shuttles the methylamine group of glycine from the P protein (GLDC) to the T protein (GCST). {ECO:0000269|PubMed:1671321}.
> sessionInfo()
R Under development (unstable) (2018-02-01 r74194)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /data/oldR/R-devel/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-devel/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] UniProt.ws_2.19.2 BiocGenerics_0.25.3 RCurl_1.95-4.10
[4] bitops_1.0-6 RSQLite_2.1.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 AnnotationDbi_1.41.4 magrittr_1.5
[4] bindr_0.1.1 rappdirs_0.3.1 IRanges_2.13.28
[7] bit_1.1-12 R6_2.2.2 rlang_0.2.0
[10] httr_1.3.1 blob_1.1.1 dplyr_0.7.4
[13] tools_3.5.0 Biobase_2.39.2 DBI_0.8
[16] dbplyr_1.2.1 bit64_0.9-7 digest_0.6.15
[19] assertthat_0.2.0 tibble_1.4.2 bindrcpp_0.2.2
[22] S4Vectors_0.17.41 glue_1.2.0 memoise_1.1.0
[25] BiocFileCache_1.3.42 pillar_1.2.1 compiler_3.5.0
[28] stats4_3.5.0 pkgconfig_2.0.1
>
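The FUNCTION value comes back with a "FUNCTION: " prefix and a trailing evidence tag. If you just want the plain description, something like the following base R sketch should work. It assumes the evidence tags always look like the {ECO:...} block in the output above, which I have not checked against other entries; the variable names are only for illustration.

```r
# FUNCTION value copied from the select() output above
fun_raw <- "FUNCTION: The glycine cleavage system catalyzes the degradation of glycine. The H protein (GCSH) shuttles the methylamine group of glycine from the P protein (GLDC) to the T protein (GCST). {ECO:0000269|PubMed:1671321}."

# Drop the leading "FUNCTION: " label
fun_clean <- sub("^FUNCTION: ", "", fun_raw)

# Drop evidence tags of the form {ECO:...}, plus the period that follows them
fun_clean <- gsub(" ?\\{ECO:[^}]*\\}\\.?", "", fun_clean, perl = TRUE)

# Remove any leftover leading or trailing whitespace
fun_clean <- trimws(fun_clean)
```

With a data frame returned by select(), the same two substitutions can be applied to the whole FUNCTION column, since sub() and gsub() are vectorized.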
See column Function [CC] here.
https://www.uniprot.org/help/uniprotkb_column_names