How can I convert "Majority protein IDs" to "Gene names" (example shown below)?
1
0
tpm • 0
@tpm-23631
Last seen 11 weeks ago
Netherlands

Hello.

I would like to ask how I can convert "Majority protein IDs" to "Gene names", like so:

Majority.protein.IDs

A0AV96-2;B7Z8Z7;A0AV96;D6R9D6;D6RBS9
A0AVT1;A0AVT1-2
A1L0T0;M0R026
A1XBS5-5;E5RHK0;F8W7P5;E5RGD0;H0YC32;A1XBS5-2;A1XBS5-4;A1XBS5-3;A1XBS5
Q99798;A2A274
A2A2M0;Q9NQG5
Q9Y312;A2A2Q9
A2A2V2;P42696;Q5TCT4;P42696-2
A2A2Z9;Q9H560
A6PW58;A2A5X0;Q99755-2;Q99755-4;Q99755-3;A6PW57;Q99755
O15533-2;O15533;D3YTI9;A2AB90;O15533-3;C9JA35;O15533-4
A2IDC6;Q4TT38;Q13084
Q13887-2;A2TJX0;Q13887
A3KFJ0;O14965;Q5QPD4;A3KFJ1;Q5QPD2
A3KMH1-3;A3KMH1;A3KMH1-2


Gene names

RBM47
UBA6
ILVBL
FAM92A1
ACO2
RPRD1B
AAR2
RBM34
ANKRD18B;ANKRD19P
PIP5K1A
TAPBP
MRPL28
KLF5
AURKA
VWA8


My specific question is how I can convert each row of the list below to its appropriate gene name:

tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI
tr|A0A4S5AVI8|A0A4S5AVI8_ECOLI
tr|A0A4S5AXW9|A0A4S5AXW9_ECOLI;sp|P06715|GSHR_ECOLI
tr|A0A4S5AR26|A0A4S5AR26_ECOLI;sp|P25746|HFLD_ECOLI
tr|A0A4S5B017|A0A4S5B017_ECOLI
tr|A0A4S5B5Y8|A0A4S5B5Y8_ECOLI;sp|P0AEN8|FUCM_ECOLI
tr|A0A6D2XCX9|A0A6D2XCX9_ECOLI;sp|P0A8C4|YGFB_ECOLI
tr|A0A6D2XI58|A0A6D2XI58_ECOLI;sp|P0ABI8|CYOB_ECOLI
tr|A0A6D2X748|A0A6D2X748_ECOLI;sp|P0A972|CSPE_ECOLI
tr|A0A6D2W544|A0A6D2W544_ECOLI;sp|P30130|FIMD_ECOLI
tr|A0A6D2W7D6|A0A6D2W7D6_ECOLI;sp|P0AFP4|YBBO_ECOLI
tr|A0A4S4P6R5|A0A4S4P6R5_ECOLI
sp|P0AGE6|CHRR_ECOLI;tr|A0A6D2WS16|A0A6D2WS16_ECOLI
tr|A0A6D2WPV4|A0A6D2WPV4_ECOLI;sp|P0A8J4|YBED_ECOLI
tr|A0A4S5APJ5|A0A4S5APJ5_ECOLI;sp|P27306|STHA_ECOLI

MassSpectrometryData Proteome Database DEP
0
@james-w-macdonald-5106
Last seen 9 hours ago
United States

The easiest thing to do would be to get the UniProtKB and/or SwissProt IDs and use the mapping tool.

There is a REST API that you can use (which is what the UniProt.ws package uses), but for a simple thing like this it's probably easier to just use the mapping tool.

0

Thank you, James. I am not sure which option to choose under "Select Option" on the page you linked. Could you suggest the appropriate options to use on the site?

0

You have UniProtKB IDs and so far as I can tell you want Gene Names, so that's what I would use.

0

If, for example, I paste these rows into the site, I get an error. In particular, suppose I have 1000 rows: how can I convert them to gene names using the site you proposed?

tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI
tr|A0A4S5AVI8|A0A4S5AVI8_ECOLI
tr|A0A4S5AXW9|A0A4S5AXW9_ECOLI;sp|P06715|GSHR_ECOLI
tr|A0A4S5AR26|A0A4S5AR26_ECOLI;sp|P25746|HFLD_ECOLI
tr|A0A4S5B017|A0A4S5B017_ECOLI
tr|A0A4S5B5Y8|A0A4S5B5Y8_ECOLI;sp|P0AEN8|FUCM_ECOLI
tr|A0A6D2XCX9|A0A6D2XCX9_ECOLI;sp|P0A8C4|YGFB_ECOLI
tr|A0A6D2XI58|A0A6D2XI58_ECOLI;sp|P0ABI8|CYOB_ECOLI
tr|A0A6D2X748|A0A6D2X748_ECOLI;sp|P0A972|CSPE_ECOLI
tr|A0A6D2W544|A0A6D2W544_ECOLI;sp|P30130|FIMD_ECOLI
tr|A0A6D2W7D6|A0A6D2W7D6_ECOLI;sp|P0AFP4|YBBO_ECOLI
tr|A0A4S4P6R5|A0A4S4P6R5_ECOLI
sp|P0AGE6|CHRR_ECOLI;tr|A0A6D2WS16|A0A6D2WS16_ECOLI
tr|A0A6D2WPV4|A0A6D2WPV4_ECOLI;sp|P0A8J4|YBED_ECOLI
tr|A0A4S5APJ5|A0A4S5APJ5_ECOLI;sp|P27306|STHA_ECOLI

1

This thing:

tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI


Isn't an ID! It's a set of identifiers separated by vertical bars and a semi-colon, with the intent that the reader will understand that. The first thing there is

tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI


Which indicates that it's a TrEMBL ID (tr), the ID being A0A4V3YUP9, and the name(?) being A0A4V3YUP9_ECOLI, which is just the ID concatenated with the species. So if you were to go to uniprot.org and search on that ID you would get this.

The semi-colon separates the first annotation from the second, which is

sp|P67660|YHAJ_ECOLI


Which indicates that it's a SwissProt (sp) ID, the ID being P67660, and the name being YHAJ_ECOLI. Which you can see here.
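
To make that structure concrete, here is a minimal base R sketch that takes the one row discussed above apart into its pieces. The column names (`database`, `accession`, `entry_name`) are just labels chosen for illustration, not official UniProt terminology for this header format.

```r
# One row from the "Majority protein IDs" column (taken from the question).
row <- "tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI"

# Split on ";" to get the individual entries. fixed = TRUE treats the
# separator as a literal string rather than a regular expression.
entries <- strsplit(row, ";", fixed = TRUE)[[1]]

# Split each entry on "|" into its three fields.
fields <- strsplit(entries, "|", fixed = TRUE)

# Collect the fields into a small table, one line per entry.
parsed <- data.frame(
  database   = sapply(fields, `[`, 1),  # "tr" (TrEMBL) or "sp" (SwissProt)
  accession  = sapply(fields, `[`, 2),  # the ID to search with
  entry_name = sapply(fields, `[`, 3),  # e.g. YHAJ_ECOLI
  stringsAsFactors = FALSE
)
print(parsed)
```

The `accession` column holds the part you would actually search for on uniprot.org.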

So what I said you could do was to get an ID for each row (either the TrEMBL or SwissProt) and then paste it into the query box on the UniProt site I pointed you to. I made the assumption that you would understand that what I really meant was that you would have to extract the relevant ID from each row and use that, rather than just copy/pasting the whole thing, which as you have noted doesn't work.

How you would get the relevant ID is up to you. I would probably use some combination of strsplit and sapply, but I'm old school like that. I'll leave it up to you to figure out how to do that, which is the best way to learn anyway.
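
For readers who land here later, one possible strsplit/sapply combination is sketched below. It is not the only way to do it. The helper name `pick_accession` is hypothetical, and preferring the SwissProt (sp) accession over the TrEMBL (tr) one when both are present is an assumption made for this sketch; pick whichever suits your analysis.

```r
# A few rows from the question, used as example input.
ids <- c(
  "tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI",
  "tr|A0A4S5AVI8|A0A4S5AVI8_ECOLI",
  "sp|P0AGE6|CHRR_ECOLI;tr|A0A6D2WS16|A0A6D2WS16_ECOLI"
)

# Hypothetical helper: return a single accession for one row.
# Assumption: use the SwissProt ("sp") accession if the row has one,
# otherwise fall back to the first accession listed.
pick_accession <- function(row) {
  entries <- strsplit(row, ";", fixed = TRUE)[[1]]
  fields  <- strsplit(entries, "|", fixed = TRUE)
  db  <- sapply(fields, `[`, 1)
  acc <- sapply(fields, `[`, 2)
  if (any(db == "sp")) acc[db == "sp"][1] else acc[1]
}

# Apply the helper to every row; USE.NAMES = FALSE keeps the result unnamed.
accessions <- sapply(ids, pick_accession, USE.NAMES = FALSE)

# One accession per line, ready to paste into the mapping tool.
writeLines(accessions)
```

The same call works unchanged on a full column of 1000 rows, since sapply simply loops over whatever vector it is given.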

0

I see. Thank you very much for the feedback. I found that in R I can use str_split, which, as you suggested, is very useful. Much appreciated.