Question: UniProt.ws non-functional or abandoned?
balin wrote (26 days ago):

The current UniProt.ws appears dysfunctional in my hands. When running through the example in ?UniProt.ws::UniProt.ws, connecting to the database is excruciatingly slow and the final example lookup fails.

My own use case also does not work.

Is the package functioning for anybody? Is it abandoned? Are there alternatives?

Joh


Please post your sessionInfo() and an explicit example. There are definitely problems with UniProt.ws, but it builds and checks nightly, so the 'code' works at some level... I'm just looking for something more specific from your end.

(comment by Martin Morgan, 26 days ago)

Here's my experimentation (modified from ?UniProt.ws::UniProt.ws):

library(UniProt.ws)
system.time(up <- UniProt.ws(taxId=9606))
user  system elapsed
0.278   0.057  47.151

... this is insanely long.

Additionally (also from the main example):

res <- select(up, 
              keys = c("22627","22629"), 
              columns = c("PDB","UNIGENE","SEQUENCE"),
              keytype = "ENTREZ_GENE")

fails after another massive chunk of time with:

Getting mapping data for 22627 ... and ACC
Error in .select(x, keys, columns, keytype) : 
No data is available for the keys provided.

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2      UniProt.ws_2.22.0   BiocGenerics_0.28.0 RCurl_1.95-4.11     bitops_1.0-6        RSQLite_2.1.1      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0           compiler_3.5.1       pillar_1.3.0         dbplyr_1.2.2         bindr_0.1.1          tools_3.5.1         
 [7] digest_0.6.18        bit_1.1-14           BiocFileCache_1.6.0  memoise_1.1.0        tibble_1.4.2         pkgconfig_2.0.2     
[13] rlang_0.3.0.1        DBI_1.0.0            rstudioapi_0.8       curl_3.2             yaml_2.2.0           dplyr_0.7.8         
[19] httr_1.3.1           S4Vectors_0.20.1     IRanges_2.16.0       rappdirs_0.3.1       stats4_3.5.1         bit64_0.9-7         
[25] tidyselect_0.2.5     glue_1.3.0           Biobase_2.42.0       R6_2.3.0             AnnotationDbi_1.44.0 purrr_0.2.5         
[31] blob_1.1.1           magrittr_1.5         assertthat_0.2.0     crayon_1.3.4

(reply by balin, 26 days ago)

As a preliminary observation, note that the 'user' and 'system' times are much smaller than the 'elapsed' time. Basically, the local system is doing nothing while it waits for the UniProt web service to reply, so the problem seems to be on the UniProt end, specifically in responding to the request for

https://www.uniprot.org/uniprot/?query=organism:9606&format=tab&columns=id

Pasting this into a browser returns some results quickly, but as you scroll down the page you'll see more results appear incrementally. Have you contacted UniProt about this?

(reply by Martin Morgan, 26 days ago)
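To put numbers on Martin's point, here is a base-R sketch that computes the share of the wall-clock time R spent on its own CPU work, using the timings balin posted. The helper name `cpu_fraction` is hypothetical (not part of UniProt.ws), and it treats the printed 'user' and 'system' values as this process's own CPU time, which is an assumption.

```r
## Hypothetical helper (not part of UniProt.ws): fraction of wall-clock time
## that this R process spent on CPU work. A value close to 0 means R was
## mostly idle, waiting on I/O (here, the UniProt web service).
cpu_fraction <- function(timing) {
  cpu <- timing[["user.self"]] + timing[["sys.self"]]
  cpu / timing[["elapsed"]]
}

## Timings posted above for system.time(up <- UniProt.ws(taxId=9606)),
## assuming the printed user/system values are the process's own CPU time
posted <- c(user.self = 0.278, sys.self = 0.057, elapsed = 47.151)
round(cpu_fraction(posted), 4)
## [1] 0.0071

## The same check works on a live proc_time object, e.g.
## cpu_fraction(system.time(up <- UniProt.ws(taxId = 9606)))
```

Well under 1% of the 47 seconds was local CPU time, which is consistent with R sitting idle while the server sends results.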
James W. MacDonald wrote (26 days ago):

As Martin notes, most of the time is spent on the UniProt side. The initial call to UniProt.ws basically asks the UniProt web server to return all the UniProt IDs for a given species. That takes time because you are asking for a lot of data: the UniProt server takes a long time to send it all, and R then has to read it back in. There are probably faster ways to do this, but the person mainly responsible for the package is gone now, it's not a high-usage package, and it does work, so there you go.

As for the lack of returned data in your additional example, that's due to user error on your part. The example for this function starts out with human data (taxId 9606) and then switches to mouse (taxId 10090):

## set the taxId to something else
taxId(up) <- 10090
up

It then queries two murine Entrez Gene IDs. Getting nothing back when you query murine Gene IDs against a human database is expected.

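For completeness, here is a minimal sketch of the corrected lookup, assuming UniProt.ws 2.22.x as in the sessionInfo above. It queries the example's murine Entrez Gene IDs against a mouse object rather than a human one. The wrapper name `lookup_mouse_example` is hypothetical, and the result still depends on the UniProt web service responding, so no output is shown.

```r
## Sketch, assuming UniProt.ws 2.22.x. 22627 and 22629 are the murine
## Entrez Gene IDs from the ?UniProt.ws example, so they must be queried
## against a mouse (taxId 10090) object, not a human (9606) one.
## Hypothetical wrapper so the slow web-service calls only run when called.
lookup_mouse_example <- function() {
  library(UniProt.ws)
  ## Build the object for mouse directly; alternatively keep a human object
  ## and switch it with taxId(up) <- 10090, as the help-page example does.
  up <- UniProt.ws(taxId = 10090)
  select(up,
         keys = c("22627", "22629"),
         columns = c("PDB", "UNIGENE", "SEQUENCE"),
         keytype = "ENTREZ_GENE")
}

## res <- lookup_mouse_example()
```

Creating the object for mouse directly should also avoid downloading the full human ID list first, since, as Martin's URL shows, the constructor requests all IDs for the given organism.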