I have a few RefSeq protein IDs, e.g. NP_853513.2 and NP_000517.2.
Is there a way to find the corresponding UniProt IDs in Bioconductor?
Sometimes you can do this using an OrgDb package.
> library(org.Hs.eg.db)
> select(org.Hs.eg.db, c("NP_853513.2", "NP_000517.2"), "UNIPROT", "REFSEQ")
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.
## Ugh. Let's strip off the trailing version numbers
> select(org.Hs.eg.db, gsub("\\[1-9]$", "", c("NP_853513.2", "NP_000517.2")), "UNIPROT", "REFSEQ")
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.
## Still no joy, but that's because I'm a dummy: "\\[1-9]$" escapes the bracket, so it only matches a literal "[1-9]". You actually have to strip the period AND the number
> select(org.Hs.eg.db, gsub("\\.[1-9]$", "", c("NP_853513.2", "NP_000517.2")), "UNIPROT", "REFSEQ")
'select()' returned 1:1 mapping between keys and columns
REFSEQ UNIPROT
1 NP_853513 Q7Z3Y7
2 NP_000517 P02533
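A slightly more robust pattern is `"\\.[0-9]+$"`: `[1-9]` only covers single-digit versions, so an accession ending in, say, `.10` would be left untouched and then fail the key check. Here's a minimal sketch; `strip_refseq_version()` is just a name I made up, and the `mapIds()` call only runs if org.Hs.eg.db is installed. Unlike `select()`, which can return several rows per key, `mapIds()` with `multiVals = "first"` gives exactly one UniProt ID per RefSeq ID.

```r
# Hypothetical helper: drop a trailing ".<version>" (any number of digits)
# so RefSeq accessions match the unversioned keys in the OrgDb.
strip_refseq_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

ids <- strip_refseq_version(c("NP_853513.2", "NP_000517.2"))
print(ids)

# Only runs if org.Hs.eg.db is installed; keeps the first UniProt ID per key.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  uniprot <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                   keys = ids, column = "UNIPROT",
                                   keytype = "REFSEQ", multiVals = "first")
  print(uniprot)
}
```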
Let's also try UniProt.ws.
> library(UniProt.ws)
Loading required package: RSQLite
Loading required package: RCurl
Warning messages:
1: package 'UniProt.ws' was built under R version 4.0.3
2: package 'RSQLite' was built under R version 4.0.3
3: package 'RCurl' was built under R version 4.0.3
> up <- UniProt.ws()
> select(up, c("NP_853513.2", "NP_000517.2"), "UNIPROTKB", "REFSEQ_PROTEIN")
Getting mapping data for NP_853513.2 ... and ACC
error while trying to retrieve data in chunk 1:
no results after 5 attempts; please try again later
continuing to try
Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) :
attempt to set 'colnames' on an object with less than two dimensions
## Huh. That's a drag
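The error message says to try again later, and UniProt.ws already retries internally ("no results after 5 attempts"). If the service is only down briefly, wrapping the call in an outer retry with a longer pause might get it through; if it's down for longer, nothing on our side will help. A base-R sketch; `with_retries()` is a hypothetical helper, not part of UniProt.ws:

```r
# Hypothetical helper: call f() up to `attempts` times, pausing `wait` seconds
# between failures, and raise an error with the last message if all fail.
with_retries <- function(f, attempts = 3, wait = 5) {
  last_err <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_err <- result
    if (i < attempts) Sys.sleep(wait)
  }
  stop("all ", attempts, " attempts failed; last error: ",
       conditionMessage(last_err), call. = FALSE)
}

# Usage against the failing query above (not run here; the column and keytype
# names are copied from that transcript and may differ between UniProt.ws
# versions):
# up  <- UniProt.ws()
# res <- with_retries(function() select(up, c("NP_853513.2", "NP_000517.2"),
#                                       "UNIPROTKB", "REFSEQ_PROTEIN"),
#                     attempts = 3, wait = 30)
```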
Let's try biomaRt
> library(biomaRt)
> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl")
> getBM(c("uniprot_gn_id","uniprotswissprot","refseq_peptide"), "refseq_peptide", c("NP_853513.2", "NP_000517.2"), mart)
[1] uniprot_gn_id uniprotswissprot refseq_peptide
<0 rows> (or 0-length row.names)
## Huh. Super annoying. Maybe it's the version numbers? Let's strip those off
> getBM(c("uniprot_gn_id","uniprotswissprot","refseq_peptide"), "refseq_peptide", gsub("\\.[1-9]$", "", c("NP_853513.2", "NP_000517.2")), mart)
uniprot_gn_id uniprotswissprot refseq_peptide
1 P02533 P02533 NP_000517
2 Q7Z3Y7 Q7Z3Y7 NP_853513
## BOOM! nailed it on the third try...
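One thing to watch: `getBM()` doesn't return rows in the order of the `values` you pass (above, NP_000517 comes back first), and it can return more than one row per RefSeq ID. A small base-R sketch to line the result back up with the input; `align_to_input()` is a hypothetical helper that keeps only the first matching row per ID, and `NP_000000` is a made-up accession standing in for an ID that didn't map.

```r
# Hypothetical helper: reorder a getBM() result so row i corresponds to ids[i].
# IDs with no match get a row of NAs; if getBM() returned several rows for
# one ID, only the first is kept.
align_to_input <- function(ids, res, key = "refseq_peptide") {
  out <- res[match(ids, res[[key]]), , drop = FALSE]
  out[[key]] <- ids          # fill the key column back in for unmatched IDs
  rownames(out) <- NULL
  out
}

# The getBM() result shown above, typed in by hand.
res <- data.frame(uniprot_gn_id    = c("P02533", "Q7Z3Y7"),
                  uniprotswissprot = c("P02533", "Q7Z3Y7"),
                  refseq_peptide   = c("NP_000517", "NP_853513"),
                  stringsAsFactors = FALSE)

# "NP_000000" is a made-up ID used to show the unmatched case.
aligned <- align_to_input(c("NP_853513", "NP_000517", "NP_000000"), res)
print(aligned)
```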
See edits. It turns out that for these two proteins both the OrgDb and biomaRt approaches work. Unfortunately, it appears UniProt.ws is having problems...
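Since more than one source can do the job, one option is to try them in turn and only send the IDs the first source couldn't map on to the next one. A minimal base-R sketch; `map_with_fallback()` and the stub lookups are hypothetical, and each lookup is assumed to take a character vector of unversioned IDs and return a character vector named by those IDs, with `NA` (or `""`) for anything it couldn't map. The stubs just return the pairs the OrgDb and biomaRt queries above produced.

```r
# Hypothetical helper: try each lookup in order, passing only the IDs that
# are still unmapped. Returns a character vector named by (unique) input ID.
map_with_fallback <- function(ids, lookups) {
  ids <- unique(ids)
  result <- setNames(rep(NA_character_, length(ids)), ids)
  for (lookup in lookups) {
    todo <- ids[is.na(result)]
    if (length(todo) == 0) break  # everything mapped; skip remaining sources
    found <- lookup(todo)
    hit <- names(found)[!is.na(found) & nzchar(found)]
    hit <- intersect(hit, todo)   # ignore anything the lookup wasn't asked for
    if (length(hit) > 0) result[hit] <- found[hit]
  }
  result
}

# Stubs standing in for the OrgDb and biomaRt queries above.
stub_orgdb   <- function(ids) setNames(ifelse(ids == "NP_853513", "Q7Z3Y7", NA_character_), ids)
stub_biomart <- function(ids) setNames(ifelse(ids == "NP_000517", "P02533", NA_character_), ids)

mapped <- map_with_fallback(c("NP_853513", "NP_000517"), list(stub_orgdb, stub_biomart))
print(mapped)

# A real OrgDb lookup might look like this (not run; mapIds() can stop with
# "None of the keys entered are valid keys" when nothing matches, hence tryCatch):
# orgdb_lookup <- function(ids) tryCatch(
#   AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db, keys = ids,
#                         column = "UNIPROT", keytype = "REFSEQ",
#                         multiVals = "first"),
#   error = function(e) setNames(rep(NA_character_, length(ids)), ids))
```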