NCBI protein to NCBI cDNA (Ensembl conversions would be great as well) in R
R_Page • 0
@f83463b2

Hello all, I was wondering whether it is possible to convert an NCBI protein, say CPB2 (acc. no. NP_001265470.1), to its respective cDNA value. To be frank, I am not entirely familiar with what cDNA is (or how one would find it on the NCBI/NLM page for CPB2), so any clarification would be appreciated. I am also wondering what the process would look like for Ensembl genes, and whether one could put an NCBI protein name or accession number into an R program and get the respective Ensembl cDNA value. Thank you for any help, Rob

Rstudio cDNA ensembldb NCBI • 487 views
@james-w-macdonald-5106

I don't know what you mean by 'cDNA value'. That could mean any number of things, particularly if you don't know what cDNA is. It is probably not really a thing in this context: cDNA stands for 'complementary DNA', which is what you get when you use reverse transcriptase to generate DNA from RNA.

In addition, CPB2 might be a protein symbol, or it might be an HGNC gene symbol. In this case it's both.

> library(org.Hs.eg.db)
## map CPB2 to NCBI ID
> select(org.Hs.eg.db, "CPB2", c("ENTREZID","SYMBOL"), "ALIAS")
'select()' returned 1:1 mapping between keys and columns
  ALIAS ENTREZID SYMBOL
1  CPB2     1361   CPB2

## add Ensembl
> select(org.Hs.eg.db, "CPB2", c("ENTREZID","SYMBOL","ENSEMBL"), "ALIAS")
'select()' returned 1:1 mapping between keys and columns
  ALIAS ENTREZID SYMBOL         ENSEMBL
1  CPB2     1361   CPB2 ENSG00000080618
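
Since the question starts from a RefSeq protein accession rather than a symbol, the same lookup can probably be driven by the accession itself. The following is a minimal sketch, assuming that the "REFSEQ" keytype of org.Hs.eg.db stores accessions without the version suffix and that this accession is present in the installed annotation build. `strip_version` is a hypothetical helper, not part of any package. Note that these mappings go through the gene ID, so the ENSEMBLTRANS column would list the transcripts of the gene, not only the one encoding this particular protein.

```r
# Hypothetical helper: drop a trailing ".1"-style version from an accession.
strip_version <- function(acc) sub("\\.[0-9]+$", "", acc)

acc <- strip_version("NP_001265470.1")
acc

# Only attempt the lookup if the annotation packages are installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  res <- tryCatch(
    AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                          keys = acc,
                          columns = c("ENTREZID", "SYMBOL", "ENSEMBL", "ENSEMBLTRANS"),
                          keytype = "REFSEQ"),
    # Assumption: select() errors if the accession is not a valid key
    # in this annotation build; return NULL in that case.
    error = function(e) NULL
  )
  print(res)
}
```

If the lookup returns NULL, the accession may simply be absent from the annotation release in use.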

Here I assume it's an HGNC gene symbol. But I don't believe that HGNC gene symbols and protein symbols (like from UniProt) are necessarily the same, so you could also use UniProt to map.

> library(UniProt.ws)
> ws <- UniProt.ws()
> z <- select(ws, "CPB2", c("gene_names","xref_geneid","xref_ensembl","organism_name"), "Gene_Name")

## check that we can get a tractable number of rows
> dim(subset(z, Organism == "Homo sapiens (Human)"))
[1] 4 6

## looks OK
> subset(z, Organism == "Homo sapiens (Human)")
    From      Entry Gene.Names GeneID                        Ensembl             Organism
2   CPB2     Q96IY4       CPB2  1361; ENST00000181383.10 [Q96IY4-1]; Homo sapiens (Human)
6   CPB2 A0A087WSY5       CPB2  1361;             ENST00000439329.5; Homo sapiens (Human)
161 CPB2 A0A6Q8PG06       CPB2   <NA>             ENST00000675730.1; Homo sapiens (Human)
162 CPB2 A0A6Q8PHS9       CPB2   <NA>             ENST00000674625.1; Homo sapiens (Human)

It does look like CPB2 is an HGNC symbol, not a protein ID though.

> z <- select(ws, "CPB2", c("gene_names","xref_geneid","xref_ensembl","organism_name", "id"), "Gene_Name")
> subset(z, Organism == "Homo sapiens (Human)")
    From      Entry Gene.Names GeneID                        Ensembl             Organism       Entry.Name
2   CPB2     Q96IY4       CPB2  1361; ENST00000181383.10 [Q96IY4-1]; Homo sapiens (Human)      CBPB2_HUMAN
6   CPB2 A0A087WSY5       CPB2  1361;             ENST00000439329.5; Homo sapiens (Human) A0A087WSY5_HUMAN
161 CPB2 A0A6Q8PG06       CPB2   <NA>             ENST00000675730.1; Homo sapiens (Human) A0A6Q8PG06_HUMAN
162 CPB2 A0A6Q8PHS9       CPB2   <NA>             ENST00000674625.1; Homo sapiens (Human) A0A6Q8PHS9_HUMAN

Apparently the protein ID (the UniProt entry name without the _HUMAN suffix) is CBPB2.
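
The Ensembl column in the output above packs transcript IDs together with isoform labels and trailing semicolons, so it usually needs a little cleanup before the IDs can be used elsewhere. A minimal base R sketch, using two strings copied from that output; `extract_enst` is a hypothetical helper name, and the pattern assumes IDs of the form ENST, digits, and an optional version:

```r
# Two values copied from the Ensembl column of the UniProt.ws output above.
ensembl_col <- c("ENST00000181383.10 [Q96IY4-1];", "ENST00000439329.5;")

# Hypothetical helper: extract Ensembl transcript IDs from each string,
# optionally dropping the ".N" version suffix.
extract_enst <- function(x, keep_version = TRUE) {
  pattern <- if (keep_version) "ENST[0-9]+(\\.[0-9]+)?" else "ENST[0-9]+"
  unlist(regmatches(x, gregexpr(pattern, x)))
}

extract_enst(ensembl_col)
extract_enst(ensembl_col, keep_version = FALSE)
```

The unversioned form is the one more likely to match keys in other annotation resources, which often store transcript IDs without the version.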


Thank you, this works well. As for cDNA, I figured out what I need to find, and have made a new post regarding it.
