Hi,
I am using the UniProt.ws package to retrieve proteomes, giving the taxon ID of an organism as input. I get the IDs with keys() and then use select() to retrieve the information I need for each ID. The problem is that for the human proteome the select() step is very time-consuming. For example:
library(UniProt.ws)
up <- UniProt.ws(9606)
egs <- keys(up, "UNIPROTKB")
length(egs)
[1] 192814
# Almost 200k UniProtKB IDs map to Homo sapiens
# The message I get when I run the select() line is:
res <- select(up, keys = egs, columns = c("ORGANISM", "UNIPROTKB", "REVIEWED", "LENGTH", "SEQUENCE"), keytype = "UNIPROTKB")
Uniprot limits queries with a large amount of keys. It's recommended that the select method be invoked with fewer than 100 keys or the query may fail.
Getting extra data for Q8N7X0, Q5T1N1, Q92667... (400 total)
Getting extra data for Q14094, Q8TBY9, Q8WUH1... (400 total)
Getting extra data for B4DZS4, Q9Y4R8, A0A087X1G2... (400 total)
Getting extra data for Q4KMQ1, Q12815, O94811... (400 total)
Getting extra data for A4D0V7, Q14894, Q13324... (400 total)
Getting extra data for O14618, Q8IZV2, O76039... (400 total)
Getting extra data for P29966, Q8IVH8, A1Z1Q3... (400 total)
Getting extra data for A0A0C4DH26, Q9BQK8, A0A0B4J2D9... (400 total)
Timing stopped at: 1.107 0.109 74.67
# I stopped it after the first minute
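The warning above recommends calling select() with fewer than 100 keys. Below is a minimal sketch of doing that explicitly. It assumes UniProt.ws is loaded and that `up` and `egs` exist as in the code above. `chunk_keys()` and `select_in_batches()` are hypothetical helpers written for illustration, not UniProt.ws functions. Splitting the keys this way is not expected to shorten the total download. The output above suggests select() already sends the keys in groups of 400. The main benefit is that each request stays within the recommended size and progress is reported per chunk.

```r
# Hypothetical helpers (not part of UniProt.ws). Assumes library(UniProt.ws)
# has been loaded and `up` was created as in the post.

# Split a vector of keys into consecutive chunks of at most `size` keys,
# preserving the original order.
chunk_keys <- function(keys, size = 100) {
  split(keys, ceiling(seq_along(keys) / size))
}

# Call select() once per chunk and stack the resulting data frames.
select_in_batches <- function(x, keys, columns, keytype, size = 100) {
  chunks <- chunk_keys(keys, size)
  results <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    results[[i]] <- select(x, keys = chunks[[i]], columns = columns,
                           keytype = keytype)
    message("Finished chunk ", i, " of ", length(chunks))
  }
  do.call(rbind, results)
}

# Example (not run):
# res <- select_in_batches(up, egs,
#   columns = c("ORGANISM", "UNIPROTKB", "REVIEWED", "LENGTH", "SEQUENCE"),
#   keytype = "UNIPROTKB")
```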
About 192k UniProtKB IDs are associated with Homo sapiens. It takes roughly one minute to retrieve ~3k IDs (8 batches of 400 in ~75 s), so downloading all entries would take more than an hour. Is there a way to speed up this process? For species with many thousands of entries, retrieving them all takes a long time. I ask because I intend to do this often, for many organisms.
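Because the same proteomes would be retrieved repeatedly, one way to avoid paying the retrieval cost more than once is to cache each result on disk, keyed by taxon ID. The sketch below uses base R's saveRDS()/readRDS(). `cached_proteome()`, its `fetch` argument and the `uniprot_cache` directory are hypothetical names. `fetch` is assumed to be a user-supplied function that runs the keys()/select() steps from the post for one taxon ID. This does not make the first download faster; it only skips repeat downloads.

```r
# Hypothetical helper: run `fetch(taxId)` once per taxon ID and reuse
# the saved copy on later calls.
cached_proteome <- function(taxId, fetch, cache_dir = "uniprot_cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0("taxon_", taxId, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: no network access
  }
  res <- fetch(taxId)       # cache miss: retrieve and store
  saveRDS(res, path)
  res
}

# Example (not run): fetch the human proteome once, reuse it afterwards.
# human <- cached_proteome(9606, function(id) {
#   up <- UniProt.ws(taxId = id)
#   select(up, keys = keys(up, "UNIPROTKB"),
#          columns = c("ORGANISM", "UNIPROTKB", "REVIEWED", "LENGTH", "SEQUENCE"),
#          keytype = "UNIPROTKB")
# })
```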
Thanks a lot,
Javier