Human, Mouse and Rat homologs
David Lyon ▴ 340
@david-lyon-4016
Last seen 3.1 years ago
United States
If I had a file containing a list of human IDs, either

1) RefSeq IDs:

"probe_id" "accession"
"1" "8039748" "NM_130786"
"2" "8039748" "NP_570602"
"3" "7960947" "NM_000014"
"4" "7960947" "NP_000005"
"5" "8144857" "NM_000662"
"6" "8144857" "NM_001160170"

or

2) Ensembl genes:

"probe_id" "ensembl_id"
"1" "8039748" "ENSG00000121410"
"2" "7960947" "ENSG00000175899"
"3" "8144857" "ENSG00000171428"
"4" "8144866" "ENSG00000156006"
"5" "7976496" "ENSG00000196136"
"6" "8083415" "ENSG00000114771"

which R package converts the list of IDs to their mouse homologs, and can someone type the exact command?

Thank you for your consideration.
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…
Dear David,

one of the possible solutions is via the BioMart interface to the Ensembl databases. Please check the getLDS function in the biomaRt package, which is described in that package's vignette.

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
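As a rough illustration of the getLDS route suggested above, a linked query between the human and mouse datasets might look like the sketch below. The wrapper name human_to_mouse_lds is hypothetical, the dataset names assume the current Ensembl mart, and the function needs biomaRt installed plus a working connection to Ensembl when it is called. Linked queries have been reported to fail against some Ensembl releases, so treat this as a starting point rather than a guaranteed recipe.

```r
# Hypothetical helper: map human Ensembl gene IDs to mouse Ensembl gene IDs
# with a linked-dataset query. Nothing is contacted until the function is called.
human_to_mouse_lds <- function(human_ids) {
  # Assumed dataset names; check biomaRt::listDatasets() if they have changed.
  human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  mouse <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")

  biomaRt::getLDS(
    attributes  = "ensembl_gene_id",   # what to return from the human dataset
    filters     = "ensembl_gene_id",   # restrict the query to the supplied IDs
    values      = human_ids,
    mart        = human,
    attributesL = "ensembl_gene_id",   # what to return from the linked mouse dataset
    martL       = mouse
  )
}

# Example call (requires network access):
# human_to_mouse_lds(c("ENSG00000121410", "ENSG00000175899"))
```

Filtering on the supplied IDs keeps the query small; the result has one row per human/mouse pair, so a human gene with several mouse homologs appears on several rows.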
Let's say your data is in a data frame called "d"; then the code might be:

> d
  probe_id      ensembl_id
1  8039748 ENSG00000121410
2  7960947 ENSG00000175899
3  8144857 ENSG00000171428
4  8144866 ENSG00000156006
5  7976496 ENSG00000196136
6  8083415 ENSG00000114771
> library(biomaRt)
> mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
> h2m <- getBM(attributes=c("ensembl_gene_id","mouse_ensembl_gene"), mart=mart)
> my.h2m <- merge(d, h2m, by.x="ensembl_id", by.y="ensembl_gene_id", sort=FALSE)
> my.h2m
        ensembl_id probe_id mouse_ensembl_gene
1  ENSG00000121410  8039748 ENSMUSG00000022347
2  ENSG00000175899  7960947 ENSMUSG00000030111
3  ENSG00000171428  8144857 ENSMUSG00000025588
4  ENSG00000171428  8144857 ENSMUSG00000051147
5  ENSG00000171428  8144857 ENSMUSG00000056426
6  ENSG00000156006  8144866 ENSMUSG00000051147
7  ENSG00000156006  8144866 ENSMUSG00000056426
8  ENSG00000156006  8144866 ENSMUSG00000025588
9  ENSG00000196136  7976496 ENSMUSG00000066363
10 ENSG00000196136  7976496 ENSMUSG00000041536
11 ENSG00000196136  7976496 ENSMUSG00000066364
12 ENSG00000196136  7976496 ENSMUSG00000058207
13 ENSG00000196136  7976496 ENSMUSG00000079012
14 ENSG00000196136  7976496 ENSMUSG00000079013
15 ENSG00000196136  7976496 ENSMUSG00000021091
16 ENSG00000196136  7976496 ENSMUSG00000066361
17 ENSG00000196136  7976496 ENSMUSG00000041449
18 ENSG00000196136  7976496 ENSMUSG00000041481
19 ENSG00000114771  8083415 ENSMUSG00000027761
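The merge in the answer above returns one row per human/mouse pair, so a probe whose gene has several mouse homologs is repeated. The offline sketch below reproduces that behaviour on three of the genes from the output above, using a hard-coded stand-in for the getBM result (no Ensembl connection is made), and then keeps only the genes with a single homolog. The names n_homologs and one_to_one are illustrative. As far as I know, recent Ensembl releases call the mouse homolog attribute mmusculus_homolog_ensembl_gene rather than mouse_ensembl_gene, so it is worth checking listAttributes(mart) before running a live query.

```r
# Probe-to-gene table, as in the question (first three probes only).
d <- data.frame(
  probe_id   = c(8039748L, 7960947L, 8144857L),
  ensembl_id = c("ENSG00000121410", "ENSG00000175899", "ENSG00000171428"),
  stringsAsFactors = FALSE
)

# Stand-in for the getBM() result, copied from the output shown in the thread.
h2m <- data.frame(
  ensembl_gene_id    = c("ENSG00000121410", "ENSG00000175899",
                         "ENSG00000171428", "ENSG00000171428", "ENSG00000171428"),
  mouse_ensembl_gene = c("ENSMUSG00000022347", "ENSMUSG00000030111",
                         "ENSMUSG00000025588", "ENSMUSG00000051147",
                         "ENSMUSG00000056426"),
  stringsAsFactors = FALSE
)

# One output row per human/mouse pair: genes with several homologs are repeated.
my.h2m <- merge(d, h2m, by.x = "ensembl_id", by.y = "ensembl_gene_id", sort = FALSE)

# Count homologs per human gene and keep the unambiguous (one-to-one) mappings.
n_homologs <- table(my.h2m$ensembl_id)
one_to_one <- my.h2m[my.h2m$ensembl_id %in% names(n_homologs)[n_homologs == 1], ]
```

Whether to drop, keep, or collapse the one-to-many rows depends on the downstream analysis; the filter here is only one possible choice.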
Hi Michael and Wolfgang,

Thanks for your help, I think I can get what I need. I am new to R, so I apologize for the simple questions.

A couple of last questions on this thread, if I may. If I wanted all the gene IDs from Ensembl, it would be this:

entrez = getBM("ensembl_gene_id", mart=human)
> entrez

But how can I get the following working, so that it returns all the gene IDs and transcript IDs?

entrez = getBM("ensembl_gene_id","ensembl_transcript_id", mart=human)

My final question: is there a wildcard '*' that essentially returns all the attributes for human Ensembl, e.g.:

entrez = getBM(*, mart=human)

Thanks again!
Hi David,

To get all ensembl_gene and ensembl_transcript IDs you would need to do:

ids = getBM(c("ensembl_gene_id","ensembl_transcript_id"), mart=human)

Your other question, getting all attributes, is unreasonable. There are more than a hundred attributes for human; this would be a huge amount of data, and I'm sure you don't need all of it. Also, biomaRt and the BioMart system won't allow you to query for all attributes at once. It is better to select the attributes you really need. If you really need everything, you'd be better off downloading a MySQL dump of Ensembl human from the Ensembl website.

Cheers,
Steffen
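To see why the two-string call in the question does not work, recall that R matches unnamed arguments by position, and the second argument of getBM is filters, not a second attribute. The offline sketch below uses a hypothetical stand-in, fake_getBM, that copies only the leading arguments of getBM (attributes, filters, values, mart) to show where each string ends up. The helper find_attributes is also hypothetical; it wraps listAttributes, which is the usual way to browse the available attribute names instead of a wildcard, and it needs a real mart object (the thread's mart=human is never defined; it is presumably the result of a useMart call).

```r
# Hypothetical stand-in that mimics only the leading arguments of getBM().
fake_getBM <- function(attributes, filters = "", values = "", mart) {
  list(attributes = attributes, filters = filters)
}

# Two separate strings: the second one is matched to 'filters' by position.
wrong <- fake_getBM("ensembl_gene_id", "ensembl_transcript_id", mart = NULL)

# One character vector built with c(): both names reach 'attributes'.
right <- fake_getBM(c("ensembl_gene_id", "ensembl_transcript_id"), mart = NULL)

# Hypothetical helper: search the attribute names of a mart by pattern.
# Requires biomaRt and a mart created with useMart(); not called here.
find_attributes <- function(mart, pattern) {
  a <- biomaRt::listAttributes(mart)
  a[grepl(pattern, a$name), ]
}

# Example (requires network access):
# human <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# find_attributes(human, "transcript")
```

Browsing the attribute table first and then passing a short vector of names to getBM keeps each query small, which is in line with the advice above to request only what is needed.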
Hi Steffen,

Thank you very much, that works nicely. Thanks also to everyone for their help and time.