Question: UniProt.ws non-functional or abandoned?
26 days ago by
balin0
balin0 wrote:

The current UniProt.ws appears dysfunctional in my hands. When running through the example in ?UniProt.ws::UniProt.ws, connecting to the database is excruciatingly slow and the final example lookup fails.

My own use case also does not work.

Is the package functioning for anybody? Is it abandoned? Are there alternatives?

Joh

modified 26 days ago by James W. MacDonald • written 26 days ago by balin0

sessionInfo() and explicit example please. There are definitely problems with UniProt.ws but it builds and checks nightly so the 'code' works at some level ... just looking for something more specific from your end.

Here's my experimentation (modified from ?UniProt.ws::UniProt.ws):

library(UniProt.ws)
system.time(up <- UniProt.ws(taxId=9606))
user  system elapsed
0.278   0.057  47.151

... this is insanely long.

Additionally (also from the main example):

res <- select(up,
              keys = c("22627","22629"),
              columns = c("PDB","UNIGENE","SEQUENCE"),
              keytype = "ENTREZ_GENE")

fails after another massive chunk of time with:

Getting mapping data for 22627 ... and ACC
Error in .select(x, keys, columns, keytype) :
No data is available for the keys provided.

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2.2      UniProt.ws_2.22.0   BiocGenerics_0.28.0 RCurl_1.95-4.11     bitops_1.0-6        RSQLite_2.1.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0           compiler_3.5.1       pillar_1.3.0         dbplyr_1.2.2         bindr_0.1.1          tools_3.5.1
[7] digest_0.6.18        bit_1.1-14           BiocFileCache_1.6.0  memoise_1.1.0        tibble_1.4.2         pkgconfig_2.0.2
[13] rlang_0.3.0.1        DBI_1.0.0            rstudioapi_0.8       curl_3.2             yaml_2.2.0           dplyr_0.7.8
[19] httr_1.3.1           S4Vectors_0.20.1     IRanges_2.16.0       rappdirs_0.3.1       stats4_3.5.1         bit64_0.9-7
[25] tidyselect_0.2.5     glue_1.3.0           Biobase_2.42.0       R6_2.3.0             AnnotationDbi_1.44.0 purrr_0.2.5
[31] blob_1.1.1           magrittr_1.5         assertthat_0.2.0     crayon_1.3.4

As a preliminary observation, note that the 'user' and 'system' times are much smaller than the 'elapsed' time. Basically, the local system is doing nothing while it waits for the UniProt web service to reply. The problem seems to be on the UniProt end, specifically in responding to the request for

https://www.uniprot.org/uniprot/?query=organism:9606&format=tab&columns=id

(Pasting this into a browser returns some results quickly, but as you scroll down the page you'll see more results appear incrementally...) The problem seems to be on the UniProt end; have you contacted them?
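The same timing pattern can be reproduced without touching the network. This is only a minimal sketch: it assumes Sys.sleep() is a fair stand-in for a process idling while a remote server responds, and the names timing, cpu and wall are made up for illustration.

```r
# Sys.sleep() stands in for a slow remote service: the R process waits
# without doing any real work on the CPU.
timing <- system.time(Sys.sleep(1))

# CPU time actually consumed by this R process (user + system)
cpu <- timing[["user.self"]] + timing[["sys.self"]]

# Wall-clock time that passed while the call was running
wall <- timing[["elapsed"]]

print(timing)

# TRUE when the call spent most of its time waiting rather than computing
wall > cpu
```

When 'elapsed' dwarfs 'user' plus 'system' like this, the time went into waiting (here a sleep, in the thread a web request) rather than into local computation.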

26 days ago by
United States
James W. MacDonald wrote:

As Martin notes, most of the time spent is on the UniProt side. The original call to UniProt.ws is basically asking the UniProt webserver to spit back all the UniProt IDs for a given species. That does take time because you are asking for a lot of stuff, and it takes a long time for the UniProt server to spit out all the data and for it to be read back into R. There are probably faster ways to do this, but the person who was mainly responsible for the package is gone now, and it's not a high-usage package, and it does work, so there you go.

As for the lack of return data in your additional example, that's due to user error on your part. The example for this function starts out using human (taxId 9606) data, and then switches to mouse:

  ## set the taxId to something else
taxId(up) <- 10090
up

It then runs a query on two murine Gene IDs. Getting nothing back when you query murine Gene IDs against the human database is expected.