help with protein IPI annotation mappings
@kimpel-mark-w-727
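I need to map a list of rat International Protein Index accession ids to EntrezGene. The proteins have been identified using mass spectroscopy and thus do not necessarily correspond to any particular affy chipset. How would I do this in BioC? Can biomaRt handle this?

Thanks,
Mark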
@steffen-durinck-1780
Hi Mark,

I don't have an IPI id to test this, but biomaRt should be able to do what you want. Here's how:

    library(biomaRt)
    ipiID = c("your","list","of","IPI","IDs")
    ensembl = useMart("ensembl", dataset="rnorvegicus_gene_ensembl")
    getBM(attributes=c("ipi","entrezgene"), filters="ipi", values=ipiID, mart=ensembl)

You could probably make the result of the query a little cleaner by:

    getBM(attributes=c("ipi","entrezgene"), filters=c("ipi","with_ipi"), values=list(ids=ipiID,""), mart=ensembl)

best,
Steffen

Kimpel, Mark William wrote:
> I need to map a list of rat International Protein Index accession ids to
> EntrezGene. The proteins have been identified using mass spectroscopy
> and thus do not necessarily correspond to any particular affy chipset.
> How would I do this in BioC? Can biomaRt handle this?
>
> Thanks,
> Mark

--
Steffen Durinck, Ph.D.
Oncogenomics Section, Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address: Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877
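Once a query like the one above returns, the result usually needs a little tidying before use, since an IPI id can map to several Entrez Gene ids or to none. The following is only a sketch in base R: it assumes the result is a data frame whose columns are named after the requested attributes (`ipi` and `entrezgene`), and the toy data frame and the helper name `tidy_mapping` are invented for illustration, not part of biomaRt.

```r
# Toy stand-in for the data frame returned by getBM(); all values are invented.
bm <- data.frame(
  ipi = c("IPI00000001", "IPI00000001", "IPI00000002", "IPI00000003"),
  entrezgene = c(101L, 102L, NA, 103L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop unmapped rows and duplicates, then collect
# the Entrez Gene ids for each IPI id into a named list.
tidy_mapping <- function(bm) {
  bm <- bm[!is.na(bm$entrezgene), , drop = FALSE]
  bm <- unique(bm)
  split(bm$entrezgene, bm$ipi)
}

mapping <- tidy_mapping(bm)
```

Here `mapping[["IPI00000001"]]` holds both gene ids for the first protein, and IPI ids with no Entrez Gene match are simply absent from the list.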
Steffen,

Your code to convert IPI to Entrez Gene IDs worked like a charm. Now I have run into another problem. I have discovered that some of the IDs I need to map are GenBank IDs of the form (GI:XXXX). I have used listAttributes(ensembl) and cannot figure out which attributes, if any, correspond to the NCBI GI. A previous post on this list indicated that this should be possible, but I must be missing something.

Thanks,
Mark

On Friday, January 05, 2007 8:57 AM, Steffen Durinck wrote:
> I don't have an IPI id to test this but biomaRt should be able to do
> what you want. [...]
Hi Mark,

I quickly scanned the attributes and filters, and it looks like you currently cannot use GenBank accession numbers with Ensembl. To be sure, you could ask the Ensembl helpdesk (helpdesk at ensembl.org) whether GenBank accession numbers are in their database and what the name of the corresponding filter is. If they don't have GenBank ids, you could ask whether they could be included in future releases. Whatever information Ensembl makes available is retrievable through the biomaRt package, and questions or suggestions about the data present in Ensembl are best addressed to their helpdesk. Make sure you let them know you are using the BioMart version of Ensembl.

Cheers,
Steffen

Kimpel, Mark William wrote:
> Your code to convert IPI to entrezgene ID's worked like charm. Now I
> have run into another problem. I have discovered that some of the ID's I
> need to map are GenBank ID's of the form (GI:XXXX). [...]
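Scanning the attribute list by eye is error-prone, so a keyword search over it can help. The sketch below assumes only that listAttributes() returns a data frame with `name` and `description` columns; the toy table and the helper name `find_attributes` are invented for illustration, and the two placeholder rows do not correspond to real Ensembl attributes.

```r
# Toy stand-in for listAttributes(ensembl); the last two rows are placeholders.
attrs <- data.frame(
  name = c("ipi", "entrezgene", "example_attr_a", "example_attr_b"),
  description = c("IPI ID", "EntrezGene ID",
                  "Example attribute A", "Example attribute B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive search of both columns.
find_attributes <- function(attr_table, pattern) {
  hit <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hit, , drop = FALSE]
}

entrez_hits  <- find_attributes(attrs, "entrez")
genbank_hits <- find_attributes(attrs, "genbank")
```

With a real mart one would pass `listAttributes(ensembl)` (and, for filters, `listFilters(ensembl)`) instead of the toy table; an empty result for a keyword such as "genbank" is the situation described in the reply above.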
On Monday 08 January 2007 10:22, Steffen Durinck wrote:
> I quickly scanned the attributes and filters and it looks like you
> currently can not use genbank accession numbers with Ensembl. [...]
>
> Kimpel, Mark William wrote:
> > I have discovered that some of the ID's I need to map are GenBank
> > ID's of the form (GI:XXXX). I have used listAttributes(ensembl) and
> > cannot figure out which, if any correspond to the NCBI GI. [...]

This can be accomplished pretty easily with eutils from NCBI. If you have a GI number (without the 'GI:') such as 47078294 (which corresponds to RefSeq NM_000022, just for example), you can use eLink to get the reference to the Entrez Gene database, if you like, by doing:

    readLines(url('http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=gene&id=47078294'))

This will return XML, and the <id>100</id> tag holds the Gene ID for that GI number. I show only the readLines approach here, but you could also use the XML package to parse the output. If you loop over your GI numbers, you can retrieve them all. Be sure to leave a little time between queries so that you don't set off any alarms at NCBI about too many queries in too little time.

Hope that helps.
Sean
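The loop with a pause between queries could be sketched as follows. This is a rough illustration, not a tested client: the function names (`elink_url`, `parse_gene_ids`, `lookup_gis`) are invented here, the regular expression assumes the linked Gene ID appears as an `<Id>` element directly inside a `<Link>` element, and the one-second default delay is an arbitrary assumption. The `fetch` argument exists so the parsing can be tried on canned text without contacting NCBI.

```r
# Build the eLink URL given in the thread for one GI number (no "GI:" prefix).
elink_url <- function(gi) {
  paste0("http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi",
         "?dbfrom=nucleotide&db=gene&id=", gi)
}

# Pull linked Gene IDs out of the returned XML lines.
# Assumes each linked id looks like <Link><Id>100</Id></Link>.
parse_gene_ids <- function(xml_lines) {
  xml <- paste(xml_lines, collapse = "")
  m <- gregexpr("<Link>\\s*<Id>[0-9]+</Id>", xml,
                ignore.case = TRUE, perl = TRUE)
  hits <- regmatches(xml, m)[[1]]
  gsub("[^0-9]", "", hits)
}

# Query each GI in turn, sleeping between requests to stay polite.
lookup_gis <- function(gis,
                       fetch = function(u) readLines(url(u), warn = FALSE),
                       delay = 1) {
  out <- vector("list", length(gis))
  names(out) <- gis
  for (i in seq_along(gis)) {
    out[[i]] <- parse_gene_ids(fetch(elink_url(gis[i])))
    if (i < length(gis)) Sys.sleep(delay)
  }
  out
}
```

Calling `lookup_gis(c("47078294"))` would then return a named list of Gene IDs per GI number; for anything beyond a quick check, parsing with the XML package as suggested above is likely more robust than a regular expression.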
Sorry I've come in a bit late on this topic.

eLink is a nice choice. You can also get the tab-delimited flat file of the IPI cross-reference database at: ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/

Its columns are:

1# Database from which master entry of this IPI entry has been taken. One of either SP (UniProtKB/Swiss-Prot), TR (UniProtKB/TrEMBL), ENSEMBL (Ensembl), ENSEMBL_HAVANA (Ensembl Havana subset), REFSEQ_STATUS (where STATUS corresponds to the RefSeq entry revision status), VEGA (Vega), TAIR (TAIR Protein data set) or HINV (H-Invitational Database).
2# UniProtKB accession number or Vega ID or Ensembl ID or RefSeq ID or TAIR Protein ID or H-InvDB ID.
3# International Protein Index identifier.
4# Supplementary UniProtKB/Swiss-Prot entries associated with this IPI entry.
5# Supplementary UniProtKB/TrEMBL entries associated with this IPI entry.
6# Supplementary Ensembl entries associated with this IPI entry. Havana curated transcripts preceded by the key HAVANA: (e.g. HAVANA:ENSP00000237305;ENSP00000356824;).
7# Supplementary list of RefSeq STATUS:ID couples (separated by a semi-colon ';') associated with this IPI entry (RefSeq entry revision status details).
8# Supplementary TAIR Protein entries associated with this IPI entry.
9# Supplementary H-Inv Protein entries associated with this IPI entry.
10# Protein identifiers (cross reference to EMBL/Genbank/DDBJ nucleotide databases).
11# List of HGNC number, HGNC official gene symbol couples (separated by a semi-colon ';') associated with this IPI entry.
12# List of NCBI Entrez Gene gene number, Entrez Gene Default Gene Symbol couples (separated by a semi-colon ';') associated with this IPI entry.
13# UNIPARC identifier associated with the sequence of this IPI entry.
14# UniGene identifiers associated with this IPI entry.
15# CCDS identifiers associated with this IPI entry.
16# RefSeq GI protein identifiers associated with this IPI entry.
17# Supplementary Vega entries associated with this IPI entry.
...

See http://www.ebi.ac.uk/IPI/xrefs.html

Columns 3 and 7 would probably suit you and would be easy to read into R. Actually, you should probably choose columns 3 and 7 when column 1 is REFSEQ_*. (Note that you can also get the MySQL dump of this database, which is even better if you know some SQL.) There might be only a few missing (no REFSEQ) that you can get with eLink as Sean suggests.

Cheers
Paul Leo

On Tuesday, 9 January 2007 1:48 AM, Sean Davis wrote:
> This can be accomplished with eutils from NCBI pretty easily. [...]
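Reading the cross-reference file and keeping columns 3 and 7 for REFSEQ_* rows might look roughly like this in base R. It is a sketch under several assumptions: that the file has no header row, that column 7 holds STATUS:ID couples separated by ';' as described in the column list, and that the function name `ipi_refseq_pairs` and the two toy rows (with made-up identifiers) are purely illustrative.

```r
# Two invented rows in the tab-delimited layout described above
# (only the first 7 columns are shown; real files have more).
toy_lines <- c(
  "REFSEQ_REVIEWED\tNP_000000\tIPI00000001\t\t\t\tPROVISIONAL:NP_000001;REVIEWED:NP_000002;",
  "SP\tP00000\tIPI00000002\t\t\t\t"
)

# For a downloaded file, replace text = toy_lines with the file path.
xrefs <- read.delim(text = toy_lines, header = FALSE, quote = "",
                    colClasses = "character")

# Hypothetical helper: one row per (IPI id, RefSeq id) pair,
# restricted to entries whose master database is REFSEQ_*.
ipi_refseq_pairs <- function(xrefs) {
  keep <- grepl("^REFSEQ", xrefs[[1]])
  sub_df <- xrefs[keep, , drop = FALSE]
  couples <- strsplit(sub_df[[7]], ";", fixed = TRUE)
  data.frame(
    ipi = rep(sub_df[[3]], lengths(couples)),
    refseq = sub(".*:", "", unlist(couples)),  # strip the STATUS: prefix
    stringsAsFactors = FALSE
  )
}

pairs <- ipi_refseq_pairs(xrefs)
```

The same pattern applied to column 12 instead of column 7 would give IPI to Entrez Gene pairs directly, if that column is populated for the rat file.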
Sean and Paul,

Thanks for your help, it will work.

Mark

On Monday, January 08, 2007 6:54 PM, Paul Leo wrote:
> Elink is a nice choice, you can also get the tab delimited flat file of
> the IPI cross-reference database [...]