Secondary accession lookup with UniProt.ws
bengelmann ▴ 10
@bengelmann-7033
Last seen 7.1 years ago
Chicago

Hello

I am having trouble retrieving FASTA sequences for some UniProt identifiers. In most cases this seems to be because the accession number is now a 'secondary accession number'. Is there a way to retrieve sequences using these secondary accession numbers with UniProt.ws?

Thanks

-Brett

uniprot.ws • 2.9k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States

I would like to help you with this, but could you please give us a specific example, as described in our posting guidelines (http://bioconductor.org/help/support/posting-guide/)?

Thanks!

 Marc


Hi Marc

Sorry for the brevity there. Hopefully this is a little better. Here is some example code:

library(UniProt.ws)
proteins <- c("Q15366", "B4DXP5", "B4DLC0", "F8W0G4")
sequences <- select(UniProt.ws, keys = proteins, columns = "SEQUENCE", keytype = "UNIPROTKB")

The "B4DXP5" accession returns NA for sequence:

sequences$UNIPROTKB[which(is.na(sequences$SEQUENCE))]
[1] "B4DXP5"

This seems to happen because the entry was 'rolled up' into the new primary accession number, "Q15366".

http://www.uniprot.org/uniprot/Q15366#entry_information

Is there a way for these secondary accession numbers to return the sequence information for the primary accession number? Perhaps with a warning message or a similar flag passed?
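To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not part of UniProt.ws: `resolve_accessions()` and the `secondary_to_primary` lookup table are hypothetical names, and the table would have to be built from UniProt data by some other means. Only the B4DXP5 to Q15366 pair comes from the example above.

```r
# Hypothetical lookup table: names are secondary accessions, values are the
# primary accession each one was merged into.
secondary_to_primary <- c(B4DXP5 = "Q15366")

# Swap each secondary accession for its primary and warn about every swap.
resolve_accessions <- function(keys, map = secondary_to_primary) {
  is_secondary <- keys %in% names(map)
  if (any(is_secondary)) {
    warning("Secondary accessions replaced by their primary: ",
            paste(keys[is_secondary], "->", map[keys[is_secondary]],
                  collapse = ", "))
  }
  resolved <- keys
  resolved[is_secondary] <- unname(map[keys[is_secondary]])
  resolved
}

proteins <- c("Q15366", "B4DXP5", "B4DLC0", "F8W0G4")
resolved <- resolve_accessions(proteins)  # warns about B4DXP5
```

The resolved vector (or `unique(resolved)`) could then be passed as `keys` to `select()`, while the warning records which inputs were rewritten.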

Many Thanks

-Brett

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Homo.sapiens_1.1.2                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0 org.Hs.eg.db_3.0.0                     
 [4] GO.db_3.0.0                             OrganismDbi_1.8.1                       GenomicFeatures_1.18.6                 
 [7] GenomicRanges_1.18.4                    AnnotationDbi_1.28.2                    GenomeInfoDb_1.2.4                     
[10] IRanges_2.0.1                           S4Vectors_0.4.0                         Biobase_2.26.0                         
[13] BiocGenerics_0.12.1                     UniProt.ws_2.6.2                        RCurl_1.95-4.5                         
[16] bitops_1.0-6                            RSQLite_1.0.0                           DBI_0.3.1                              
[19] BiocInstaller_1.16.2                   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.6           BBmisc_1.9              BiocParallel_1.0.3      biomaRt_2.22.0          Biostrings_2.34.1      
 [7] brew_1.0-6              checkmate_1.5.2         codetools_0.2-11        digest_0.6.8            fail_1.2                foreach_1.4.2          
[13] GenomicAlignments_1.2.2 graph_1.44.1            iterators_1.0.7         RBGL_1.42.0             Rsamtools_1.18.3        rtracklayer_1.26.3     
[19] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.3             XML_3.98-1.1            XVector_0.6.0           zlibbioc_1.12.0        

 

Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States

Hi Brett,

Thanks for your patience with me (we are doing a release, so other things keep jumping the queue on you).

As of right now, though, I can't find any evidence that B4DXP5 is a current UniProt.ws accession. It probably was at one point, and maybe it even was when you gave me the link above, but it doesn't seem to be anywhere on that page now.

Also, the keys method does not currently return "B4DXP5" as a valid key of type "UNIPROTKB":

k <- keys(UniProt.ws, "UNIPROTKB")
c("Q15366","B4DXP5","B4DLC0","F8W0G4") %in% k
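The same check can be labelled per accession so that it is obvious which key fails. This is only a sketch: the vector `k` below is a stand-in with assumed contents, not real output from `keys()`, so that the snippet runs without contacting the web service.

```r
# Stand-in for keys(UniProt.ws, "UNIPROTKB"); contents are assumed for
# illustration only.
k <- c("Q15366", "B4DLC0", "F8W0G4")

proteins <- c("Q15366", "B4DXP5", "B4DLC0", "F8W0G4")

# Named logical vector: TRUE where the accession is a recognised key.
is_valid <- setNames(proteins %in% k, proteins)

# Accessions that select() would not be able to match.
unrecognised <- proteins[!is_valid]
```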

Ultimately, UniProt.ws talks to the UniProt web service, so if they have deprecated this ID, it may already have been inactive while still appearing on their web site for a couple of days. Do you have another example of an ID that is currently valid and that you feel should work?

 Marc


 

This is bizarre - allow me to jump in.

'B4DXP5' is still a valid accession (see http://www.uniprot.org/uniprot/Q15366.txt):

ID   PCBP2_HUMAN             Reviewed;         365 AA.
AC   Q15366; A8K7X6; B4DXP5; F8VYL7; G3V0E8; I6L8F9; Q32Q82; Q59HD4;
AC   Q68Y55; Q6IPF4; Q6PKG5;
DT   29-MAY-2000, integrated into UniProtKB/Swiss-Prot.
DT   31-OCT-1996, sequence version 1.
DT   31-MAR-2015, entry version 151.
DE   RecName: Full=Poly(rC)-binding protein 2;
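In the flat-file format, the first accession on the AC lines is the primary one and the rest are secondary. A small sketch of pulling them apart in R, using only the lines quoted above as input (the variable names are mine, and nothing here is part of UniProt.ws):

```r
# Lines copied from the flat file quoted above.
flat_file <- c(
  "ID   PCBP2_HUMAN             Reviewed;         365 AA.",
  "AC   Q15366; A8K7X6; B4DXP5; F8VYL7; G3V0E8; I6L8F9; Q32Q82; Q59HD4;",
  "AC   Q68Y55; Q6IPF4; Q6PKG5;",
  "DT   29-MAY-2000, integrated into UniProtKB/Swiss-Prot."
)

# Keep the AC lines, drop the line code, then split on ";".
ac_lines <- grep("^AC ", flat_file, value = TRUE)
ac_text <- sub("^AC +", "", ac_lines)
accessions <- trimws(unlist(strsplit(ac_text, ";", fixed = TRUE)))
accessions <- accessions[nzchar(accessions)]

primary <- accessions[1]     # first accession listed
secondary <- accessions[-1]  # everything after it
```

With this input, `"B4DXP5" %in% secondary` is `TRUE` and `primary` is `"Q15366"`.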

 

Hans-Rudolf

 


Actually it's not. It was replaced by Q15366. The page you show just lists historical accession numbers that got scooped into Q15366.


...and that is the reason why it is called a 'secondary' accession. It is still a valid accession for UniProtKB, and you can use it to search UniProtKB, e.g.:

http://www.uniprot.org/uniprot/?query=accession:B4DXP5&format=fasta

It will give you, of course, "Q15366" (this is how UniProtKB deals with UniProtKB/TrEMBL entries that have been merged into a UniProtKB/Swiss-Prot entry).
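To illustrate, the primary accession can be read out of the FASTA header that such a query returns. This is a sketch only: the header string below is assembled from the entry details quoted in this thread in the usual `db|accession|entry name` layout, not fetched live, and the variable names are mine.

```r
# Illustrative header; not downloaded from UniProt.
header <- ">sp|Q15366|PCBP2_HUMAN Poly(rC)-binding protein 2"

queried <- "B4DXP5"

# Drop the leading ">" and split on "|"; the second field is the accession.
fields <- strsplit(sub("^>", "", header), "|", fixed = TRUE)[[1]]
returned <- fields[2]

# Report when the query was answered under a different (primary) accession.
if (!identical(returned, queried)) {
  message(queried, " resolved to primary accession ", returned)
}
```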

I am sorry, my intention was not to start a debate, and I don't know much about the inner workings of 'UniProt.ws' at all. I just wanted to support Brett's original question about "Secondary accession lookup with UniProt.ws".

Hans-Rudolf

 

 

 


I am not sure anybody calls it a secondary accession, least of all UniProt:

http://www.uniprot.org/uniprot/?query=accession%3AB4DXP5&sort=score

The term obsolete seems pretty unambiguous, doesn't it?


I am sorry, but I can't resist:

"while the others are referred to as 'Secondary accession numbers'"

see: http://www.uniprot.org/help/accession_numbers


Fair enough. I stand corrected.

That said, I don't believe UniProt.ws will be able to recognise any secondary accession numbers, because it scrapes this page:

http://www.uniprot.org/uniprot/?query=organism:9606&format=tab&columns=id

for IDs, and those are only the primary IDs.


Yes.  This is unfortunate, but unless UniProt sees fit to actually allow me to use the IDs to look up actual information, then it doesn't really matter what they are called.  :(

Also problematic is the fact that the UniProt site is a very large resource and, as a result, is not extremely fast. I suspect that adding in all the older IDs might cause a very significant slowdown for the service (depending on how many there are).
