I would like to annotate a set of UniProt IDs within BioC/R. To this end I am using the UniProt.ws package. However, when using R/BioC I am not able to retrieve all the information that is apparently available in the UniProt database.
To be specific: among other things, I would like to retrieve the preferred gene name, which is labeled "Gene names (primary)" in the resulting table when using UniProt's web-based ID mapping interface.
As an example, when 'manually' retrieving annotation info for the (rat) UniProt ID "Q6MGA6" through the UniProt website, I got the following results:
As can be seen, this ID links to multiple (13) gene names/synonyms (7th column), but the primary gene name is Psmb9 (5th column).
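If a UniProt.ws column that returns the combined "Gene names" string is available (possibly one called "GENES"; this is an assumption to check against columns(UniProt.ws)), one rough workaround is to keep only the first name in that string. This relies on UniProt listing the primary name first, followed by synonyms, ordered locus names and ORF names. The base-R sketch below assumes that ordering; the helper name first_gene_name is hypothetical, and for entries without a primary name it simply returns whichever name comes first.

```r
# Heuristic sketch (hypothetical helper): return the first name from a UniProt
# "Gene names" string, assuming UniProt lists the primary name first.
# Entries without a primary name will yield a synonym or ORF name instead.
first_gene_name <- function(gene_names) {
  # strip leading/trailing whitespace, then split on runs of whitespace
  trimmed <- gsub("^[[:space:]]+|[[:space:]]+$", "", gene_names)
  tokens <- strsplit(trimmed, "[[:space:]]+")
  vapply(tokens, function(tok) {
    # empty or missing input maps to NA
    if (length(tok) == 0 || is.na(tok[1]) || tok[1] == "") NA_character_ else tok[1]
  }, character(1))
}

# illustrative input with made-up names; NA stands for an ID without gene names
first_gene_name(c("GENE1 ALIAS1 ALIAS2", NA))
```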
Question: which column should I select to retrieve the primary gene name when using the UniProt.ws package?
--> From the URL, I deduced that the 'column' to select for this query is labelled "genes(PREFERRED)" (URL-encoded as "genes%28PREFERRED%29"). However, this column is not present in, or accessible through, the UniProt.ws package.
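For comparison, the current UniProt REST API (which postdates UniProt.ws 2.6.0 and this post) exposes the primary gene name as the return field gene_primary, which appears to be the counterpart of genes(PREFERRED). A minimal base-R sketch, assuming the rest.uniprot.org search endpoint and its accession:... query syntax; the helper name uniprot_primary_gene_url is hypothetical:

```r
# Sketch (hypothetical helper): build a query URL for the UniProt REST search
# endpoint that asks for each accession plus its primary gene name as TSV.
uniprot_primary_gene_url <- function(accessions) {
  # e.g. "accession:Q6MGA6 OR accession:A0A023IMI6"
  query <- paste0("accession:", accessions, collapse = " OR ")
  paste0("https://rest.uniprot.org/uniprotkb/search?query=",
         utils::URLencode(query, reserved = TRUE),  # ':' -> %3A, ' ' -> %20
         "&fields=accession,gene_primary&format=tsv")
}

IDkeys <- c("Q6MGA6", "A0A023IMI6")
query_url <- uniprot_primary_gene_url(IDkeys)

# Requires network access; the TSV has one column per requested field.
# annotation <- read.delim(query_url, check.names = FALSE)
```

The search endpoint returns results in pages (25 by default), so longer ID lists would need a larger size parameter or paging.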
Any hints would be greatly appreciated.
Thanks,
Guido
```
> library(UniProt.ws)
> # set taxonomy ID for Rn
> taxId(UniProt.ws) <- 10116
>
> # check
> species(UniProt.ws)
[1] "Rattus norvegicus"
> # check which columns (annotation info) can be retrieved
> # In total there are 125 annotation columns available, but "genes(PREFERRED)" isn't one of these...
>
> head(columns(UniProt.ws))
[1] "UNIPROTKB"         "UNIPARC"           "UNIREF50"
[4] "UNIREF90"          "UNIREF100"         "EMBL/GENBANK/DDBJ"
>
> IDkeys <- c("Q6MGA6", "A0A023IMI6")
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("RGD"), keytype="UNIPROTKB") # This works!
Getting mapping data for Q6MGA6 ... and RGD_ID
> annotation
   UNIPROTKB  RGD
1     Q6MGA6 3427
2 A0A023IMI6 <NA>
>
> # but this does not, although "GENEID" is listed as a column type...
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("GENEID"), keytype="UNIPROTKB")
Getting mapping data for Q6MGA6 ... and P_ENTREZGENEID
Error in `[.data.frame`(tab, , oriTabCols) : undefined columns selected
>
> columns(UniProt.ws)[37]
[1] "GENEID"
>
> ## this works
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("ENTREZ_GENE"), keytype="UNIPROTKB")
Getting mapping data for Q6MGA6 ... and P_ENTREZGENEID
> annotation
   UNIPROTKB ENTREZ_GENE
1     Q6MGA6       24967
2 A0A023IMI6       24968
>
> # but this does not....
> annotation <- select(x=UniProt.ws, keys=IDkeys, columns=c("genes(PREFERRED)"), keytype="UNIPROTKB")
Error in .select(x, keys, columns, keytype) :
  columns argument MUST match a value returned by columns method
>
> sessionInfo()
R version 3.1.2 Patched (2015-02-03 r67717)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] UniProt.ws_2.6.0 RCurl_1.95-4.5   bitops_1.0-6     RSQLite_1.0.0
[5] DBI_0.3.1

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.1 Biobase_2.26.0       BiocGenerics_0.12.1
[4] GenomeInfoDb_1.2.4   IRanges_2.0.1        parallel_3.1.2
[7] S4Vectors_0.4.0      stats4_3.1.2
>
```
Thanks, I didn't think of using the org.Rn.eg.db package, but this indeed works fine for my current case.
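For reference, the org.Rn.eg.db lookup can be sketched roughly as follows, assuming the Bioconductor packages AnnotationDbi and org.Rn.eg.db are installed. Note that the SYMBOL column in org.Rn.eg.db comes from NCBI Gene rather than from UniProt's "Gene names (primary)" field, so the two could occasionally differ.

```r
# Sketch: map rat UniProt accessions to Entrez Gene IDs and gene symbols
# using the org.Rn.eg.db annotation package (Bioconductor).
library(AnnotationDbi)
library(org.Rn.eg.db)

IDkeys <- c("Q6MGA6", "A0A023IMI6")

# keytype = "UNIPROT" declares that the keys are UniProt accessions;
# one row is returned per key/column combination found in the database
annotation <- AnnotationDbi::select(org.Rn.eg.db,
                                    keys = IDkeys,
                                    columns = c("ENTREZID", "SYMBOL"),
                                    keytype = "UNIPROT")
annotation
```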
As a side note, I had the impression that the keys and columns methods would show only the annotation info available for rat, rather than all available info, including columns that apply only to certain other species.