Hello
I am having trouble retrieving FASTA sequences for some UniProt identifiers. In most cases this seems to be because the accession number is now a 'secondary accession number'. Is there a way to retrieve sequences using these secondary accession numbers with UniProt.ws?
Thanks
-Brett
Hi Marc
Sorry for the brevity there. Hopefully this is a little better. Here is some example code:
proteins <- c("Q15366", "B4DXP5", "B4DLC0", "F8W0G4")
sequences <- select(UniProt.ws, keys = proteins, columns = "SEQUENCE", keytype = "UNIPROTKB")
The "B4DXP5" accession returns NA for sequence:
sequences$UNIPROTKB[which(is.na(sequences$SEQUENCE))]
[1] "B4DXP5"
This seems to happen because the annotation for this entry has been 'rolled up' into the new primary accession number, "Q15366".
http://www.uniprot.org/uniprot/Q15366#entry_information
Is there a way for these secondary accession numbers to return the sequence information for the primary accession number? Perhaps with a warning message or a similar flag passed?
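To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written in base R only. The helper name fill_from_primary and the sec2prim lookup are hypothetical (they are not part of UniProt.ws), the secondary-to-primary mapping is supplied by hand, and the sequence in the toy data frame is a placeholder rather than real UniProt data. It also assumes the primary accession is present in the same result.

```r
# Hypothetical helper (NOT part of UniProt.ws): for rows whose SEQUENCE is NA,
# copy the sequence of the mapped primary accession and warn about it.
# `res`      : data frame shaped like the select() result (UNIPROTKB, SEQUENCE)
# `sec2prim` : hand-supplied named character vector, secondary -> primary
fill_from_primary <- function(res, sec2prim) {
  missing_idx <- which(is.na(res$SEQUENCE) & res$UNIPROTKB %in% names(sec2prim))
  for (i in missing_idx) {
    primary <- sec2prim[[res$UNIPROTKB[i]]]
    hit <- match(primary, res$UNIPROTKB)   # row holding the primary accession
    if (!is.na(hit) && !is.na(res$SEQUENCE[hit])) {
      res$SEQUENCE[i] <- res$SEQUENCE[hit]
      warning(res$UNIPROTKB[i], " is a secondary accession; using sequence of ",
              primary, call. = FALSE)
    }
  }
  res
}

# Toy input with a placeholder sequence (not real UniProt data)
res <- data.frame(
  UNIPROTKB = c("Q15366", "B4DXP5"),
  SEQUENCE  = c("MPLACEHOLDER", NA),
  stringsAsFactors = FALSE
)
filled <- suppressWarnings(fill_from_primary(res, c(B4DXP5 = "Q15366")))
```

Without suppressWarnings(), the call emits one warning per substituted accession, which is the kind of flag I was hoping for.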
Many Thanks
-Brett
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats4  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
 [1] Homo.sapiens_1.1.2    TxDb.Hsapiens.UCSC.hg19.knownGene_3.0.0  org.Hs.eg.db_3.0.0
 [4] GO.db_3.0.0           OrganismDbi_1.8.1                        GenomicFeatures_1.18.6
 [7] GenomicRanges_1.18.4  AnnotationDbi_1.28.2                     GenomeInfoDb_1.2.4
[10] IRanges_2.0.1         S4Vectors_0.4.0                          Biobase_2.26.0
[13] BiocGenerics_0.12.1   UniProt.ws_2.6.2                         RCurl_1.95-4.5
[16] bitops_1.0-6          RSQLite_1.0.0                            DBI_0.3.1
[19] BiocInstaller_1.16.2

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2  BatchJobs_1.6  BBmisc_1.9  BiocParallel_1.0.3  biomaRt_2.22.0  Biostrings_2.34.1
 [7] brew_1.0-6  checkmate_1.5.2  codetools_0.2-11  digest_0.6.8  fail_1.2  foreach_1.4.2
[13] GenomicAlignments_1.2.2  graph_1.44.1  iterators_1.0.7  RBGL_1.42.0  Rsamtools_1.18.3  rtracklayer_1.26.3
[19] sendmailR_1.2-1  stringr_0.6.2  tools_3.1.3  XML_3.98-1.1  XVector_0.6.0  zlibbioc_1.12.0