Question: UniProt.ws version 2.10.2
DHS (USA/Stanford) wrote, 3.8 years ago:

Dear Bioconductor community, I am using UniProt.ws version 2.10.2 to retrieve features for my protein hit lists. I am particularly interested in the SUBCELLULAR LOCATION information.

While the information is available on uniprot.org (e.g. http://www.uniprot.org/uniprot/P35579#subcellular_location), I cannot retrieve any SUBCELLULAR LOCATION information via the UniProt.ws package. I only get NA, while other columns return the correct information:

> res <- select(up, "P35579", "REACTOME", "UNIPROTKB")
Getting mapping data for P35579 ... and REACTOME_ID
'select()' returned 1:many mapping between keys and columns
> res
  UNIPROTKB      REACTOME
1    P35579 R-HSA-5627117
2    P35579 R-HSA-5625900
3    P35579 R-HSA-5625740
4    P35579 R-HSA-5627123
5    P35579  R-HSA-416572
6    P35579 R-HSA-3928663
7    P35579 R-HSA-2029482

but

> res <- select(up, "P35579", "SUBCELLULAR-LOCATIONS", "UNIPROTKB")
Getting extra data for P35579 NA NA etc
'select()' returned 1:1 mapping between keys and columns
> res
  UNIPROTKB SUBCELLULAR-LOCATIONS
1    P35579                  <NA>

So I wonder whether the SUBCELLULAR LOCATION information is actually up to date in the package, or whether this data can be accessed some other way?

Best,
D
Answer: UniProt.ws version 2.10.2
DHS (USA/Stanford) wrote, 3.8 years ago:

Dear James,

Thanks so much for the info and the code, it works beautifully. Since I am dealing with around 5000 UniProt IDs per dataset, is there a way to make that hack permanent?

The best way to do that is to get the sources for UniProt.ws and make the changes, then install. That's usually a bit more than most are willing or able to do. The alternative is to wait for the bug to be fixed, and then get the updated package. That should be happening this week, so the easiest thing for you to do would be to wait for the fix to appear.

Reply written 3.8 years ago by James W. MacDonald

This has been fixed in devel (2.11.4) and release (2.10.3). Both should be available via biocLite() tomorrow by noon PST. 

Let us know if you run into other problems.

Valerie

Reply written 3.8 years ago by Valerie Obenchain
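
To tell whether an installed copy already contains the fix, the version numbers quoted above (2.10.3 for release, 2.11.4 for devel) can be compared against the installed version. This is a minimal sketch in base R; the helper name `has_fix` is hypothetical and not part of UniProt.ws, and it only knows about the two version lines mentioned in this thread.

```r
# Hypothetical helper: does a given UniProt.ws version include the fix?
# Release line: fixed from 2.10.3. Devel line: fixed from 2.11.4.
has_fix <- function(version) {
  v <- package_version(version)
  if (v >= "2.11.0") {
    v >= "2.11.4"   # devel builds 2.11.0 to 2.11.3 predate the fix
  } else {
    v >= "2.10.3"   # release builds before 2.10.3 predate the fix
  }
}

print(has_fix("2.10.2"))  # the version used in the question
print(has_fix("2.10.3"))

# With the package installed, the check would be:
# has_fix(as.character(packageVersion("UniProt.ws")))
```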
Answer: UniProt.ws version 2.10.2
James W. MacDonald (United States) wrote, 3.8 years ago:

That's a bug in UniProt.ws. By default it is using this URI:

http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,subcellular locations

when in fact it is supposed to be using this one:

http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,comment(SUBCELLULAR LOCATION)

There are actually several different columns that UniProt.ws won't get, due to malformed URIs. Hopefully this can be resolved by the next release.
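
As an illustration of the difference between the two URIs, here is a minimal base R sketch that builds the corrected query string. The helper `build_uniprot_url` and the lookup vector `column_fixes` are hypothetical names, not part of UniProt.ws; only the one column pair quoted above is included, and the percent-encoding step is an added assumption about what an HTTP client would want. The endpoint is the one shown in this thread and may no longer respond.

```r
# Column name sent by UniProt.ws -> column name UniProt expected.
# Only the pair quoted in this thread is listed here.
column_fixes <- c("subcellular locations" = "comment(SUBCELLULAR LOCATION)")

# Hypothetical helper: build the query URL, swapping in the corrected column.
build_uniprot_url <- function(accession, column) {
  if (column %in% names(column_fixes)) {
    column <- column_fixes[[column]]
  }
  raw <- paste0("http://www.uniprot.org/uniprot/?query=", accession,
                "&format=tab&columns=id,", column)
  # Percent-encode unsafe characters such as the space in the column name.
  utils::URLencode(raw)
}

url <- build_uniprot_url("P35579", "subcellular locations")
print(url)
```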

The list of column names for UniProt can be found here. You could sort of hack your way through to get the correct results, if you are willing to do some work.

> debug(UniProt.ws:::.select)

> select(up, "P35579", "SUBCELLULAR-LOCATIONS", "UNIPROTKB")

Browse[2]> debug(.getSomeUniprotGoodies)

Then hit Enter until you see this:

debug: url <- "http://www.uniprot.org/uniprot/?query="
Browse[3]>
debug: fullUrl <- paste0(url, qstring, "&format=tab&columns=id,", cstring)
Browse[3]>
debug: dat <- .tryReadResult(fullUrl)

And you can see the fullUrl:

Browse[3]> fullUrl
[1] "http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,subcellular locations"

Then fix the URI:

Browse[3]> fullUrl <- sub("subcellular locations","comment(SUBCELLULAR LOCATION)", fullUrl)
Browse[3]> fullUrl
[1] "http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,comment(SUBCELLULAR LOCATION)"

Then hit Enter until you get here (like twice, I think).

debug: colnames(dat) <- sub("\\.\\d", "", colnames(dat))
Browse[3]>
debug: dat <- dat[dat[, 1] %in% query, , drop = FALSE]
Browse[3]>  colnames(dat)
[1] "Entry"                     "Subcellular.location..CC."

You now have to fix the column names, because the extra ..CC. suffix will mess things up.

Browse[3]> colnames(dat) <- sub("\\.\\.CC\\.", "", colnames(dat))
Browse[3]> colnames(dat)
[1] "Entry"                "Subcellular.location"

Then hit c and Enter twice to get the debugger to just finish out.

Browse[3]> c
exiting from: FUN(qs[[i]], ...)
debug: colnames(dat)[1] <- "ACC+ID"
Browse[2]> c
'select()' returned 1:1 mapping between keys and columns
exiting from: .select(x, keys, columns, keytype)
  UNIPROTKB
1    P35579
                                                                                                                                                                                                                        SUBCELLULAR-LOCATIONS
1 SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton {ECO:0000250}. Cytoplasm, cell cortex {ECO:0000250}. Note=Colocalizes with actin filaments at lamellipodia margins and at the leading edge of migrating cells. {ECO:0000269|PubMed:20052411}.
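
The two manual repairs from the debugger session can also be replayed outside the debugger on plain values, which makes it easier to see what each `sub()` call does. This is only a sketch in base R using the strings printed above; it does not contact UniProt or patch the package. The last line rests on an assumption the thread does not confirm: that the tab-separated header reads `Subcellular location [CC]` and is turned into a syntactic name when the table is read.

```r
# Values exactly as printed in the debugger session.
full_url  <- "http://www.uniprot.org/uniprot/?query=P35579&format=tab&columns=id,subcellular locations"
col_names <- c("Entry", "Subcellular.location..CC.")

# Repair 1: replace the malformed column name in the URL.
fixed_url <- sub("subcellular locations", "comment(SUBCELLULAR LOCATION)", full_url)

# Repair 2: strip the literal "..CC." from the column names.
fixed_names <- sub("\\.\\.CC\\.", "", col_names)

print(fixed_url)
print(fixed_names)

# Assumed origin of the suffix: make.names() replaces the space and the
# square brackets of the presumed header with dots.
print(make.names("Subcellular location [CC]"))
```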

 
