Retrieve isoform sequences for a protein from UniProt
sgupt46 • 0
Canada

Hi, I am trying to get the amino acid sequence for a protein isoform from UniProt but have not been able to find a solution. I can get the canonical sequence, but I cannot find a method to obtain an isoform sequence.

library(UniProt.ws)
up <- UniProt.ws(taxId=9606)
select(up, keys = "Q2V2M9", columns = c("SEQUENCE"), keytype = "UNIPROTKB")
# select(up, keys = "Q2V2M9-4", columns = c("SEQUENCE"), keytype = "UNIPROTKB")
# The isoform ID doesn't work because it is not a key in the UniProt.ws object.

Does anyone know if there is a way to get an isoform sequence?

UniProt.ws biomaRt • 1.5k views
Mike Smith ★ 6.5k
EMBL Heidelberg

I'm not sure how to approach this with UniProt.ws, but does this get you what you're looking for via biomaRt?

library(biomaRt)

ensembl <- useEnsembl(biomart = 'genes', dataset = 'hsapiens_gene_ensembl', mirror = "www")
getBM(attributes = c("peptide", "ensembl_gene_id", "ensembl_transcript_id", "uniprotswissprot"),
      filters = c("uniprot_gn_id"), 
      values = "Q2V2M9", 
      mart = ensembl) |>
  tibble::as_tibble()
#> # A tibble: 7 × 4
#>   peptide                     ensembl_gene_id ensembl_transcri… uniprotswissprot
#>   <chr>                       <chr>           <chr>             <chr>           
#> 1 MAMRPRSHPPVGAGTGGGPACVPVAE… ENSG00000134775 ENST00000591635   ""              
#> 2 RLFWNEVRPFDWPCKNNRRCREFLWS… ENSG00000134775 ENST00000592128   ""              
#> 3 MATLACRVQFLDDTDPFNSTNFPEPS… ENSG00000134775 ENST00000590592   "Q2V2M9"        
#> 4 XAINIGLTVLPPPRTIKIAILNFDEY… ENSG00000134775 ENST00000585579   ""              
#> 5 MATLACRVQFLDDTDPFNSTNFPEPS… ENSG00000134775 ENST00000257209   "Q2V2M9"        
#> 6 MATLACRVQFLDDTDPFNSTNFPEPS… ENSG00000134775 ENST00000359247   "Q2V2M9"        
#> 7 XFRLVVKTALKLLLVFVEYSESNAPL… ENSG00000134775 ENST00000592930   ""

That should be the peptide sequence for each transcript listed in Ensembl. I guess the three results that have an entry in the uniprotswissprot column are those that can be found on https://www.uniprot.org/uniprot/Q2V2M9. It's not clear to me where the fourth isoform listed on that page has gone.
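
If the Ensembl mapping does not cover a particular isoform, one possible alternative is to ask UniProt for the isoform's FASTA record directly. The following is only a minimal sketch in base R: it assumes the public REST endpoint at rest.uniprot.org accepts an isoform accession such as "Q2V2M9-4" in place of a canonical one, which has not been checked for this entry. The helper names `uniprot_fasta_url`, `parse_fasta` and `fetch_isoform` are made up for illustration and are not part of UniProt.ws or biomaRt.

```r
# Build the FASTA URL for a single UniProtKB accession.
# Assumption: the REST endpoint also accepts isoform accessions like "Q2V2M9-4".
uniprot_fasta_url <- function(accession) {
  paste0("https://rest.uniprot.org/uniprotkb/", accession, ".fasta")
}

# Parse FASTA lines into a named character vector:
# names are the header lines (without ">"), values are the joined sequences.
parse_fasta <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)            # record number for every line
  keep <- !is_header & record > 0        # sequence lines that follow a header
  seqs <- tapply(lines[keep], record[keep], paste, collapse = "")
  headers <- sub("^>", "", lines[is_header])
  setNames(as.vector(seqs), headers[as.integer(names(seqs))])
}

# Hypothetical convenience wrapper: download and parse one record.
# Requires network access, so it is defined here but not called.
fetch_isoform <- function(accession) {
  parse_fasta(readLines(uniprot_fasta_url(accession), warn = FALSE))
}

# Example call (not run):
# fetch_isoform("Q2V2M9-4")
```

The parsing step is kept separate from the download so that it can be checked offline on any FASTA text. If the request for an isoform accession fails, the accession may not exist or the endpoint may behave differently from what is assumed here.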
