UniProt.ws non-functional or abandoned?
balin • 0
@balin-15829
Last seen 6.0 years ago

The current UniProt.ws appears dysfunctional in my hands. When running through the example of UniProt.ws::UniProt.ws, connecting to the database is excruciatingly slow and the final example lookup fails.

My own use case also does not work.

Is the package functioning for anybody? Is it abandoned? Are there alternatives?

 

Joh

uniprot.ws uniprot uniprot accessions • 1.4k views

sessionInfo() and explicit example please. There are definitely problems with UniProt.ws but it builds and checks nightly so the 'code' works at some level ... just looking for something more specific from your end.


Here's my experimentation (modified from ?UniProt.ws::UniProt.ws):

library(UniProt.ws)
system.time(up <- UniProt.ws(taxId=9606))
user  system elapsed
0.278   0.057  47.151

... this is insanely long.

Additionally (also from the main example):

res <- select(up, 
              keys = c("22627","22629"), 
              columns = c("PDB","UNIGENE","SEQUENCE"),
              keytype = "ENTREZ_GENE")

fails after another massive chunk of time with:

Getting mapping data for 22627 ... and ACC
Error in .select(x, keys, columns, keytype) : 
No data is available for the keys provided.

 

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2      UniProt.ws_2.22.0   BiocGenerics_0.28.0 RCurl_1.95-4.11     bitops_1.0-6        RSQLite_2.1.1      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0           compiler_3.5.1       pillar_1.3.0         dbplyr_1.2.2         bindr_0.1.1          tools_3.5.1         
 [7] digest_0.6.18        bit_1.1-14           BiocFileCache_1.6.0  memoise_1.1.0        tibble_1.4.2         pkgconfig_2.0.2     
[13] rlang_0.3.0.1        DBI_1.0.0            rstudioapi_0.8       curl_3.2             yaml_2.2.0           dplyr_0.7.8         
[19] httr_1.3.1           S4Vectors_0.20.1     IRanges_2.16.0       rappdirs_0.3.1       stats4_3.5.1         bit64_0.9-7         
[25] tidyselect_0.2.5     glue_1.3.0           Biobase_2.42.0       R6_2.3.0             AnnotationDbi_1.44.0 purrr_0.2.5         
[31] blob_1.1.1           magrittr_1.5         assertthat_0.2.0     crayon_1.3.4

 


As a preliminary observation, note that the 'user' and 'system' times are much smaller than the 'elapsed' time. Basically, the local system is doing nothing while it waits for the UniProt web service to reply, so the problem appears to be on the UniProt end, specifically in responding to the request for

https://www.uniprot.org/uniprot/?query=organism:9606&format=tab&columns=id

(pasting this into a browser returns some results quickly, but as you scroll down the page you'll see more results appear incrementally...). Have you contacted them?
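
The gap between CPU time and elapsed time can be reproduced without any network access. Here is a minimal sketch that uses Sys.sleep() as a stand-in for waiting on a remote server; the one-second delay is an arbitrary illustration, not a measurement of UniProt:

```r
# Sys.sleep() stands in for a slow remote server: R idles while the clock runs.
timing <- system.time(Sys.sleep(1))

cpu  <- timing[["user.self"]] + timing[["sys.self"]]  # time R spent computing
wall <- timing[["elapsed"]]                            # wall-clock time

# CPU time stays far below the elapsed time, the same pattern as in the
# system.time(up <- UniProt.ws(taxId=9606)) output above.
print(c(cpu = cpu, wall = wall))
```

When 'elapsed' dwarfs 'user' plus 'system' like this, the R process is mostly waiting rather than working, so tuning local code or hardware is unlikely to help.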

 

@james-w-macdonald-5106
Last seen 1 day ago
United States

As Martin notes, most of the time is spent on the UniProt side. The original call to UniProt.ws() basically asks the UniProt web server to spit back all the UniProt IDs for a given species. That takes a long time because it is a lot of data for the UniProt server to send and for R to read back in. There are probably faster ways to do this, but the person who was mainly responsible for the package is gone now, it's not a high-usage package, and it does work, so there you go.

As for the lack of returned data in your additional example, that's due to user error on your part. The example for this function starts out using human (taxId 9606) data, and then switches to mouse:

     ## set the taxId to something else
     taxId(up) <- 10090
     up

It then runs a query on two murine Gene IDs (22627 and 22629). Getting nothing back from a query that uses murine Gene IDs against the human data is expected.
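
Putting the pieces of the help-page example together, the lookup might look like the sketch below. It assumes UniProt.ws 2.22.0 as listed in the sessionInfo() above (later releases may use different key types and column names), needs network access, and will be slow for the reasons already discussed:

```r
library(UniProt.ws)

# Build the object for human, as in the help page (slow: waits on UniProt).
up <- UniProt.ws(taxId = 9606)

# Switch to mouse before querying mouse Gene IDs.
taxId(up) <- 10090

# Same query as in the question, now run against the mouse data.
res <- select(up,
              keys = c("22627", "22629"),
              columns = c("PDB", "UNIGENE", "SEQUENCE"),
              keytype = "ENTREZ_GENE")
head(res)
```

If only mouse data is needed, constructing the object directly with UniProt.ws(taxId = 10090) should avoid fetching the human IDs first.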
