I would like to annotate a set of UniProt IDs within BioC/R. To this end I am using the UniProt.ws package. However, when using R/BioC I am not able to retrieve the full information that apparently is available in the UniProt database.
To be specific: among other things, I would like to retrieve the preferred gene name, which is labeled "Gene names (primary)" in the result table of UniProt's web-based ID mapping interface.
As an example, when 'manually' retrieving annotation info for the (rat) UniProt ID "Q6MGA6" (through the UniProt website), I got these results:
As can be seen, this ID links to multiple (13) gene names/synonyms (7th column), but the primary gene name is Psmb9 (5th column).
Question: which column should I select to retrieve the primary gene name when using the UniProt.ws package?
--> From the URL I deduced that the 'column' to select for this query is named "genes(PREFERRED)" (it appears as "genes%28PREFERRED%29" in the URL, in percent-encoded form). However, this column is not present/accessible when using the UniProt.ws package.
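To double-check that reading of the URL, here is a small base-R sketch that decodes the percent-encoded fragment. The variable names are only for illustration; `URLdecode()` comes from the utils package that ships with R.

```r
# Percent-encoded column name as it appears in the UniProt website URL
encoded <- "genes%28PREFERRED%29"

# %28 and %29 are the percent-encoded forms of "(" and ")"
decoded <- utils::URLdecode(encoded)
print(decoded)
```

The decoded string is the name I tried to pass to `columns=` further down.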
Any hints would be greatly appreciated.
Thanks,
Guido
> library(UniProt.ws)
> #set taxonomy ID for Rn
> taxId(UniProt.ws) <- 10116
>
> # check
> species(UniProt.ws)
[1] "Rattus norvegicus"
> # check which columns (annotation info) can be retrieved
> # In total there are 125 annotation columns available, but "genes(PREFERRED)" isn't one of these...
>
> head(columns(UniProt.ws))
[1] "UNIPROTKB" "UNIPARC" "UNIREF50"
[4] "UNIREF90" "UNIREF100" "EMBL/GENBANK/DDBJ"
>
> IDkeys <- c("Q6MGA6", "A0A023IMI6")
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("RGD"), keytype="UNIPROTKB") # This works!
Getting mapping data for Q6MGA6 ... and RGD_ID
> annotation
UNIPROTKB RGD
1 Q6MGA6 3427
2 A0A023IMI6 <NA>
>
> # but this not, although "GENEID" is listed as column type...
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("GENEID"), keytype="UNIPROTKB")
Getting mapping data for Q6MGA6 ... and P_ENTREZGENEID
Error in `[.data.frame`(tab, , oriTabCols) : undefined columns selected
>
> columns(UniProt.ws)[37]
[1] "GENEID"
>
> ## this works
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("ENTREZ_GENE"), keytype="UNIPROTKB")
Getting mapping data for Q6MGA6 ... and P_ENTREZGENEID
> annotation
UNIPROTKB ENTREZ_GENE
1 Q6MGA6 24967
2 A0A023IMI6 24968
>
> # but this not....
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("genes(PREFERRED)"), keytype="UNIPROTKB")
Error in .select(x, keys, columns, keytype) :
columns argument MUST match a value returned by columns method
>
> sessionInfo()
R version 3.1.2 Patched (2015-02-03 r67717)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] UniProt.ws_2.6.0 RCurl_1.95-4.5 bitops_1.0-6 RSQLite_1.0.0
[5] DBI_0.3.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.1 Biobase_2.26.0 BiocGenerics_0.12.1
[4] GenomeInfoDb_1.2.4 IRanges_2.0.1 parallel_3.1.2
[7] S4Vectors_0.4.0 stats4_3.1.2
>

Thanks, I didn't think of using the org.Rn.eg.db package, but this indeed works fine for my current case.
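For anyone finding this later, the org.Rn.eg.db route looks roughly like the sketch below. It assumes that AnnotationDbi and org.Rn.eg.db are installed, that the package offers a "UNIPROT" keytype, and that "SYMBOL" is the column holding the gene symbol; the `requireNamespace()` guard and the `known` helper variable are my own additions so that the snippet does not stop with an error when a package or an accession is missing.

```r
# UniProt accessions from the session above
IDkeys <- c("Q6MGA6", "A0A023IMI6")

annotation <- NULL
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Rn.eg.db", quietly = TRUE)) {
  db <- org.Rn.eg.db::org.Rn.eg.db

  # Keep only accessions the package knows about; select() stops with an
  # error when none of the supplied keys are valid for the keytype.
  known <- intersect(IDkeys, AnnotationDbi::keys(db, keytype = "UNIPROT"))

  if (length(known) > 0) {
    # Map UniProt accessions to gene symbol and Entrez Gene ID
    annotation <- AnnotationDbi::select(db,
                                        keys = known,
                                        columns = c("SYMBOL", "ENTREZID"),
                                        keytype = "UNIPROT")
  }
}
annotation
```

Accessions that are absent from org.Rn.eg.db are silently dropped here, so it is worth comparing `known` against `IDkeys` afterwards.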
As a side note, I had the impression that keys and columns would only show rat-specific annotation info, and not all available info, including columns that are specific to certain other species.