Converting UniProt to SGD IDs using UniProt.ws
Joseph Barry
Dana-Farber Cancer Institute, Boston, U…

I would like to use UniProt.ws to convert UniProt IDs to SGD IDs for Saccharomyces cerevisiae, but so far my attempts have resulted in the following error:

Error in .select(x, keys, columns, keytype) :
  No data is available for the keys provided.

Here is a minimal example in which I attempt to convert "I2HB52". The expected answer is "YBR056W-A" (see http://www.uniprot.org/uniprot/I2HB52).

library(UniProt.ws)
taxId(UniProt.ws) <- 4932   # species-level taxon ID for Saccharomyces cerevisiae
species(UniProt.ws)         # confirm which species is selected
res <- select(x=UniProt.ws, keys="I2HB52", columns="SGD", keytype="UNIPROTKB")

Is this particular conversion currently possible using UniProt.ws? Other choices for 'columns' result in the same error.

Thanks in advance,

Joseph Barry

Session Info:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] UniProt.ws_2.6.0 RCurl_1.95-4.5   bitops_1.0-6     RSQLite_1.0.0   
[5] DBI_0.3.1       

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.1 Biobase_2.26.0       BiocGenerics_0.12.1
[4] GenomeInfoDb_1.2.4   IRanges_2.0.1        parallel_3.1.2      
[7] S4Vectors_0.4.0      stats4_3.1.2         tools_3.1.2       

 

 

@james-w-macdonald-5106
United States

The entry you linked belongs to a particular strain of yeast (ATCC 204508 / S288c, taxon ID 559292), but you are selecting the taxon ID for 'regular' Saccharomyces cerevisiae (4932).

> library(UniProt.ws)
Loading required package: RSQLite
Loading required package: DBI
Loading required package: RCurl
Loading required package: bitops
> availableUniprotSpecies(pattern="cerevisiae")
   taxon ID                                                      Species name
1     11008                                Saccharomyces cerevisiae virus L-A
2     42478                               Saccharomyces cerevisiae virus L-BC
3     12450                          Saccharomyces cerevisiae killer virus M1
4    285006                         Saccharomyces cerevisiae (strain RM11-1a)
5    574961                          Saccharomyces cerevisiae (strain JAY291)
6    545124                        Saccharomyces cerevisiae (strain AWRI1631)
7    307796                          Saccharomyces cerevisiae (strain YJM789)
8    643680 Saccharomyces cerevisiae (strain Lalvin EC1118 / Prise de mousse)
9    764097                         Saccharomyces cerevisiae (strain AWRI796)
10   764102                        Saccharomyces cerevisiae (strain FostersB)
11   721032      Saccharomyces cerevisiae (strain Kyokai no. 7 / NBRC 101557)
12   764098                     Saccharomyces cerevisiae (strain Lalvin QA23)
13   764101                        Saccharomyces cerevisiae (strain FostersO)
14   559292             Saccharomyces cerevisiae (strain ATCC 204508 / S288c)
15   764099                          Saccharomyces cerevisiae (strain VIN 13)
16     4932                                          Saccharomyces cerevisiae
17   764100                   Saccharomyces cerevisiae (strain Zymaflore VL3)
> taxId(UniProt.ws) <- 559292
> res <- select(x=UniProt.ws, keys="I2HB52", columns="SGD", keytype="UNIPROTKB")
Getting mapping data for I2HB52 ... and SGD_ID
> res
  UNIPROTKB        SGD
1    I2HB52 S000028736
> taxId(UniProt.ws) <- 4932
> res <- select(x=UniProt.ws, keys="I2HB52", columns="SGD", keytype="UNIPROTKB")
Error in .select(x, keys, columns, keytype) :
  No data is available for the keys provided.
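Rather than scanning the species table by eye, the strain-level taxon ID can be picked out of that table with base R. This is a minimal sketch: `find_taxon` is a hypothetical helper, and it assumes the table is a data frame whose first column holds taxon IDs and whose second column holds species names, as in the availableUniprotSpecies() listing above. The example rows are copied from that listing so the snippet runs offline.

```r
# Hypothetical helper: return the taxon ID(s) whose species name contains
# `pattern`. Assumes column 1 = taxon IDs and column 2 = species names,
# matching the availableUniprotSpecies() output shown above.
find_taxon <- function(species_df, pattern) {
  hits <- grepl(pattern, species_df[[2]], fixed = TRUE)
  species_df[[1]][hits]
}

# Three rows copied from the listing above, for a self-contained example.
species_df <- data.frame(
  taxonId = c(285006, 559292, 4932),
  name = c("Saccharomyces cerevisiae (strain RM11-1a)",
           "Saccharomyces cerevisiae (strain ATCC 204508 / S288c)",
           "Saccharomyces cerevisiae"),
  stringsAsFactors = FALSE
)

find_taxon(species_df, "S288c")
# [1] 559292
```

Against the live service one would pass the result of availableUniprotSpecies(pattern="cerevisiae") instead of the hand-built table, then assign the returned value with taxId(UniProt.ws) <- ... as in the transcript.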

 

Joseph Barry

Hi James,

Great, thanks a lot. Works like a charm now.

Best, Joseph

Joseph Barry

On a related note, I found somewhat strange NA behaviour: if "ENSEMBL" is included in the "columns" vector and that mapping returns NA, all the other columns are returned as NA as well. I doubt this is the behaviour most users would want.

> taxId(UniProt.ws) <- 559292
> species(UniProt.ws)
[1] "Saccharomyces cerevisiae (strain ATCC 204508 / S288c)"
> res <- select(x=UniProt.ws, keys="I2HB52", columns=c("SGD", "SEQUENCE"), keytype="UNIPROTKB")
Getting mapping data for I2HB52 ... and SGD_ID
Getting extra data for I2HB52 NA NA etc
> print(res)
  UNIPROTKB        SGD
1    I2HB52 S000028736
                                                            SEQUENCE
1 MRHQYYQPQPMYYQPQPQPIYIQQGPPPPRNDCCCCCNCGDCCSAIANVLCCLCLIDLCCSCAGGM
> res <- select(x=UniProt.ws, keys="I2HB52", columns=c("SGD", "SEQUENCE", "ENSEMBL"), keytype="UNIPROTKB")
Getting mapping data for I2HB52 ... and ENSEMBL_ID
Getting mapping data for I2HB52 ... and SGD_ID
Getting extra data for I2HB52 NA NA etc
> print(res)
  UNIPROTKB  SGD SEQUENCE ENSEMBL
1    I2HB52 <NA>     <NA>    <NA>


 


I made a small change to the devel version of UniProt.ws, and it now works:

> res <- select(x=UniProt.ws, keys="I2HB52", columns=c("SGD", "SEQUENCE", "ENSEMBL"), keytype="UNIPROTKB")
Getting mapping data for I2HB52 ... and ENSEMBL_ID
Getting mapping data for I2HB52 ... and SGD_ID
Getting extra data for I2HB52 NA NA etc
> res
  UNIPROTKB        SGD
1    I2HB52 S000028736
                                                            SEQUENCE ENSEMBL
1 MRHQYYQPQPMYYQPQPQPIYIQQGPPPPRNDCCCCCNCGDCCSAIANVLCCLCLIDLCCSCAGGM    <NA>

I'll check with Marc Carlson about adding this change to the package.
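The fix, described at the end of this thread, switches the merge inside .getUPMappdata() from joinType="left" to joinType="all". Why that matters can be illustrated with base R's merge(). This is a sketch under the assumption that .mergeList() behaves roughly like folding merge() over the list of per-column mapping results; it is not the package's internal code, and the column names are hypothetical. When the first mapping (here ENSEMBL) comes back empty, a left join keyed on it discards the SGD row, while a full join keeps it and fills only the missing ENSEMBL value with NA.

```r
# Hypothetical per-column mapping results for key "I2HB52":
# the ENSEMBL mapping found nothing, the SGD mapping found one row.
ensembl <- data.frame(UNIPROTKB = character(0), ENSEMBL = character(0),
                      stringsAsFactors = FALSE)
sgd <- data.frame(UNIPROTKB = "I2HB52", SGD = "S000028736",
                  stringsAsFactors = FALSE)
mappings <- list(ensembl, sgd)

# Left join: keep every row of x (the accumulated table) and only those,
# so an empty first table yields an empty result.
left <- Reduce(function(x, y) merge(x, y, by = "UNIPROTKB", all.x = TRUE),
               mappings)

# Full join: keep rows from both sides, filling unmatched columns with NA.
full <- Reduce(function(x, y) merge(x, y, by = "UNIPROTKB", all = TRUE),
               mappings)

nrow(left)  # 0
full        # one row: UNIPROTKB "I2HB52", ENSEMBL NA, SGD "S000028736"
```

Under this assumption, the order of the list also matters for a left join: had the SGD mapping come first, its row would have survived, which is consistent with the SGD + SEQUENCE query above working while adding ENSEMBL did not.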

Marc Carlson
United States

That is indeed a safe-looking change.

I have checked in the proposed change. Thanks for the bug fix!

You people are the best,

Marc


Joseph-

Note that Marc has checked the change into the devel repository, and it should propagate to the download server in the next day or so. If you want to use the updated version, you will need to use a devel version of R and BioC.

A hypothetical alternative, if you don't want to use the devel version (hypothetical to me, that is, as I don't use MacOS for any real work), is to download the source tarball, unpack it, and then open the file UniProt.ws/R/methods-select.R in your favorite editor.

Scroll down to the function .getUPMappdata():

.getUPMappdata <- function(colMappers, keys){
  ## get a list of mapping results (as data.frames)
  res <- lapply(colMappers, FUN=mapUniprot, from="ACC+ID", query=keys)
  ## Then merge all these mappings together based on the UniProt accession.
  .mergeList(res, joinType="left")
}

And change that last line to read

.mergeList(res, joinType="all")

Then save the file. Since this package doesn't have any compiled code, I believe you can then start R, change the working directory to the one containing the unpacked UniProt.ws folder, and run

install.packages("UniProt.ws", type = "source", repos = NULL)

and then you should be good to go.
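If you would rather not edit the file by hand, the one-line change can be scripted in R before running install.packages(). This is a minimal sketch: `patch_join_type` is a hypothetical helper, and it assumes the definition line and the `.mergeList(res, joinType="left")` line look exactly as shown above and that the function ends at the first following line beginning with "}". It uses only base R functions available in R 3.1.x.

```r
# Hypothetical helper: in a local copy of methods-select.R, change
# joinType="left" to joinType="all" inside .getUPMappdata() only.
patch_join_type <- function(path) {
  src <- readLines(path)
  # Locate the start of the function definition.
  start <- grep("^\\.getUPMappdata <- function", src)
  if (length(start) != 1L) stop("could not find .getUPMappdata() in ", path)
  # Assume the function ends at the first later line that begins with "}".
  ends <- grep("^\\}", src)
  end <- ends[ends > start][1]
  if (is.na(end)) stop("could not find the end of .getUPMappdata()")
  # Lines inside the function that still use the left join.
  hits <- grep('joinType="left"', src, fixed = TRUE)
  target <- hits[hits > start & hits < end]
  if (length(target) != 1L) {
    stop('expected exactly one joinType="left" inside .getUPMappdata()')
  }
  src[target] <- sub('joinType="left"', 'joinType="all"', src[target],
                     fixed = TRUE)
  writeLines(src, path)
  invisible(target)  # line number that was changed
}
```

Usage, from the directory containing the unpacked UniProt.ws folder: run patch_join_type("UniProt.ws/R/methods-select.R"), then the install.packages() call above. Other .mergeList() calls elsewhere in the file are deliberately left untouched.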
