biomaRt manual

Weiwei Shi ★ 1.2k

Hi,

I have a simple question about using biomaRt, but I could not find a proper manual to follow.

I have a list of probes with Affymetrix IDs and I want to convert them into Entrez Gene IDs. Is there a manual or example I can follow?

Thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

affy convert biomaRt • 1.6k views

updated 17.1 years ago by Kasper Daniel Hansen ★ 6.5k • written 17.1 years ago by Weiwei Shi ★ 1.2k

Kasper Daniel Hansen ★ 6.5k

Like most Bioconductor packages, biomaRt comes with a vignette (actually two). Read it :)

To do so, you can use one of several methods.

Method a)
R> vignette()
will give you a list of vignettes on your system; alternatively, you can try
R> vignette(package = "biomaRt")
Doing
R> vignette("NAME")
will open one.

Method b)
Use the Biobase vignette browser:
R> library(Biobase)
R> openVignette()
This differs from vignette() in that
* only vignettes from loaded packages are shown (so you need to do R> library(biomaRt) first)
* you get a selection menu

Method c)
Download it from the package webpage on www.bioconductor.org

Kasper

On Mar 28, 2007, at 9:06 PM, Weiwei Shi wrote:
> Hi,
> I have a simple question on using biomaRt but I did not find a proper
> manual to follow.
>
> I have a list of probes with affy ids and I want to convert them into
> entrezgene id. Is there a manual or example I can follow?
>
> Thanks.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
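The vignette lookup described in this reply can be tried in one short session. This is a minimal sketch, assuming biomaRt is already installed; the requireNamespace() guard and the browseVignettes() alternative are additions for illustration and are not part of the original reply.

```r
# List the vignettes shipped with biomaRt (assumes the package is installed).
if (requireNamespace("biomaRt", quietly = TRUE)) {
  # Method a): vignette() restricted to one package.
  vigs <- vignette(package = "biomaRt")
  print(vigs$results[, c("Item", "Title"), drop = FALSE])

  # Open one by the name shown in the "Item" column, for example:
  # vignette("NAME", package = "biomaRt")

  # Alternative not mentioned in the reply: an HTML index in the browser.
  # browseVignettes("biomaRt")
} else {
  message("biomaRt is not installed; install it from Bioconductor first.")
}
```

The "Item" column holds the names that vignette("NAME") expects.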

written 17.1 years ago by Kasper Daniel Hansen ★ 6.5k

Thanks, Kasper. I did use method b) and found the vignettes. However, it seems that one is not very detailed.

On 3/29/07, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
> Method b)
> Use Biobase vignette browser:
> R> library(Biobase)
> R> openVignette()

--
Weiwei Shi, Ph.D

written 17.1 years ago by Weiwei Shi ★ 1.2k

On Thursday 29 March 2007 06:14, Weiwei Shi wrote:
> Thanks, Kasper. But I did use method B and found the vignettes.
> However, it seems that one is not written in very detailed.

Hi, Weiwei.

I'm not sure which vignette you are referring to (or whether you mean both), but you really have to read them and try the examples (cut-and-paste the code, if needed). If you do that with the vignette called "prettyOutput.pdf", I think you will find that it answers your original question exactly.

If, after reading the vignettes and the help pages for all of those functions, you still have questions, feel free to post what you have tried (actual code pasted in from your session, plus any error messages) and the output of sessionInfo().

If you find the examples, documentation, or vignettes confusing or incomplete, I can say with a high degree of certainty that all of the Bioconductor authors appreciate contributions that would improve their documentation, so feel free to contribute.

Hope that helps.

Sean

written 17.1 years ago by Sean Davis 21k

Sorry :) When I was composing the following email, I did not realize there were already a couple of replies. I read the manual carefully, but I still have some questions. For example:

> getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a", values=head(ids2), mart=human)
  affy_hg_u95a entrezgene
1     31308_at         NA
2     31310_at       2741
3     31312_at       9312
> head(ids2)
[1] "31307_at"   "31308_at"   "31309_r_at" "31310_at"   "31311_at"
[6] "31312_at"
> getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a", values="31307_at", mart=human)
NULL

I am confused by "NULL" and "NA". What is the difference between them?

Another question is how to make more than 8000 queries faster; I have read some of the previous posts on this.

Thanks.

Weiwei

written 17.1 years ago by Weiwei Shi ★ 1.2k

Hi Weiwei,

Weiwei Shi wrote:
> I am confused by "NULL" and "NA". I am wondering about the difference b/w them.

Steffen Durinck will know better, but I believe NULL means that Ensembl doesn't think that probeset maps to anything (i.e., there is nothing available), and NA means that there is no Entrez Gene ID for that probeset.

For instance, if you pull the Entrez Gene ID for 31307_at from the hgu95aENTREZID environment, it lists 9594, but if you search Entrez Gene for that ID it says it has been discontinued.

> Another question is how to make >8000 queries faster though I read
> some from previous posts.

In my experience the MySQL interface is much faster for large numbers of queries.

Best,

Jim

--
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
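The distinction between NULL and NA can be checked in plain R without contacting Ensembl. This is a minimal sketch: the data frame is typed in by hand from the output quoted in this thread, and describe_result() is a hypothetical helper, not part of biomaRt. Other biomaRt versions may return an empty data frame rather than NULL when nothing matches, so the helper treats both cases the same way.

```r
# Result typed in from the thread: three of the six probes came back.
res <- data.frame(
  affy_hg_u95a = c("31308_at", "31310_at", "31312_at"),
  entrezgene   = c(NA, 2741L, 9312L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise what a query result tells us.
describe_result <- function(x) {
  if (is.null(x) || NROW(x) == 0) {
    return("no rows: the mart has no record for the queried probe(s)")
  }
  n_na <- sum(is.na(x$entrezgene))
  sprintf("%d row(s), %d without an Entrez Gene ID (NA)", NROW(x), n_na)
}

describe_result(NULL)  # the single-probe query for 31307_at returned NULL
describe_result(res)

# Probes that were queried but are absent from the result altogether.
queried <- c("31307_at", "31308_at", "31309_r_at",
             "31310_at", "31311_at", "31312_at")
missing_probes <- setdiff(queried, res$affy_hg_u95a)
missing_probes
```

In other words, a probe can be missing from the result entirely, or it can be present with NA in the entrezgene column; the two cases need separate checks.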

written 17.1 years ago by James W. MacDonald 65k

On Thursday 29 March 2007 07:28, James W. MacDonald wrote:
> Weiwei Shi wrote:
> > Another question is how to make >8000 queries faster though I read
> > some from previous posts.

Make sure that you really need to make 8000 queries. It is much faster to make one or a few large queries than to make many small ones.

Sean
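One way to follow this advice is to pass the whole ID vector, or a few large chunks of it, to the query function instead of looping over single IDs. This is a minimal sketch in base R: query_in_chunks() and fake_query() are hypothetical names, and fake_query() only stands in for a real getBM(..., values = ids, ...) call so that the example runs offline. The chunk size of 5000 is arbitrary.

```r
# Hypothetical helper: run `query_fun` on a few large chunks of `ids`
# and stack the results, instead of issuing one query per ID.
query_in_chunks <- function(ids, query_fun, chunk_size = 5000) {
  if (length(ids) == 0) return(NULL)
  groups  <- ceiling(seq_along(ids) / chunk_size)
  chunks  <- split(ids, groups)
  results <- lapply(chunks, query_fun)
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Stand-in for the real query; in practice this would wrap a getBM() call
# that receives the whole chunk through its `values` argument.
n_calls <- 0
fake_query <- function(ids) {
  n_calls <<- n_calls + 1   # count how many queries were issued
  data.frame(affy_hg_u95a = ids, entrezgene = NA_integer_,
             stringsAsFactors = FALSE)
}

ids <- sprintf("%d_at", 1:12558)   # same number of IDs as in the thread
res <- query_in_chunks(ids, fake_query)
nrow(res)   # one row per ID in this fake example
n_calls     # number of queries issued, far fewer than length(ids)
```

With a real mart, the simplest form of this is a single call with values set to the full vector; chunking is only worth adding if one request turns out to be too large.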

written 17.1 years ago by Sean Davis 21k

Here is another question:

> length(unique(ids2))
[1] 12558
> length(ids2)
[1] 12558
> head(ids2)
[1] "31307_at"   "31308_at"   "31309_r_at" "31310_at"   "31311_at"
[6] "31312_at"
> t1 <- getBM(attributes=c("affy_hg_u95a", "entrezgene"), filters="affy_hg_u95a", values=(ids2), mart=human)
> dim(t1)
[1] 26360     2
> t1[1:20,]
   affy_hg_u95a entrezgene
1      32864_at       6736
2      32864_at       6736
3      41214_at       6192
4      41214_at       6192
5      31534_at       7544
6      31534_at       7544
7      36367_at      83259
8      36367_at      83259
9      36367_at      83259
10     36367_at      83259
11      1199_at         NA
12   35929_s_at      64591
13   35929_s_at      64591
14   35929_s_at         NA

Please look at rows 12-14. Why are there so many duplicates? And why are rows 12-14 inconsistent with each other?

Thanks for the prompt replies from all you "hardworking" people. I am in China now, and it should be about 6 am in the US.

Cheers,

Weiwei

written 17.1 years ago by Weiwei Shi ★ 1.2k

Hi Weiwei,

Weiwei Shi wrote:
> Please look at line 12-14.
> Why are there so many duplications? Why is there some inconsistency
> between line12-14?

Again, Steffen Durinck would know better why there are duplicates. I think he told me once, but my memory doesn't work like it used to ;-D

Anyway, if you use output = "list", you will get a list with unique ids:

getBM(attributes=c("affy_hg_u95a", "entrezgene"),filters="affy_hg_u95a", values=(ids), mart=mart, output="list")
$affy_hg_u95a
$affy_hg_u95a$`31307_at`
[1] NA

$affy_hg_u95a$`31308_at`
[1] "31308_at"

$affy_hg_u95a$`31309_r_at`
[1] NA

$affy_hg_u95a$`31310_at`
[1] "31310_at"

$affy_hg_u95a$`31311_at`
[1] NA

$affy_hg_u95a$`31312_at`
[1] "31312_at"


$entrezgene
$entrezgene$`31307_at`
[1] NA

$entrezgene$`31308_at`
[1] NA

$entrezgene$`31309_r_at`
[1] NA

$entrezgene$`31310_at`
[1] 2741

$entrezgene$`31311_at`
[1] NA

$entrezgene$`31312_at`
[1] 9312

Best,

Jim
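If only the distinct probe-to-gene pairs are needed, the repeated rows can also be collapsed after the query in base R. This is a minimal sketch: the data frame is typed in by hand from the 14 rows Weiwei posted, and whether to drop the NA entries depends on the analysis. Note also that the output = "list" argument belongs to the biomaRt version used in this thread and may not be available in other releases, which is an assumption worth checking against the installed help page.

```r
# First 14 rows of the query result, typed in from the thread.
t1 <- data.frame(
  affy_hg_u95a = c("32864_at", "32864_at", "41214_at", "41214_at",
                   "31534_at", "31534_at", "36367_at", "36367_at",
                   "36367_at", "36367_at", "1199_at", "35929_s_at",
                   "35929_s_at", "35929_s_at"),
  entrezgene   = c(6736, 6736, 6192, 6192, 7544, 7544, 83259, 83259,
                   83259, 83259, NA, 64591, 64591, NA),
  stringsAsFactors = FALSE
)

# Keep each distinct (probe, gene) pair once.
pairs <- unique(t1)
nrow(pairs)

# One list element per probe, holding its distinct Entrez Gene IDs.
by_probe <- split(pairs$entrezgene, pairs$affy_hg_u95a)
by_probe[["35929_s_at"]]   # a real ID and an NA for the same probe

# Optionally drop NA where a probe also has a real ID.
cleaned <- lapply(by_probe, function(x) {
  if (all(is.na(x))) x else x[!is.na(x)]
})
cleaned[["35929_s_at"]]
```

A probe such as 1199_at, whose only value is NA, keeps that NA so it is not silently lost from the list.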

written 17.1 years ago by James W. MacDonald 65k

Hi Weiwei,

There are duplicates because Ensembl maps everything at the transcript level. If you add ensembl_transcript_id to your query, you will get a better understanding of this.

For example:

getBM(attributes=c("affy_hg_u95a", "entrezgene","ensembl_transcript_id"), filters="affy_hg_u95a", values="32864_at", mart=human)

gives:

  affy_hg_u95a entrezgene ensembl_transcript_id
1     32864_at       6736       ENST00000383070
2     32864_at       6736       ENST00000327563

That is, the same Entrez Gene ID with different transcript identifiers.

Note that if you have questions about the content of the Ensembl database or web service, you can also contact them directly at helpdesk at ensembl.org. The biomaRt package only provides an interface between their web services and R, and as such we have little control over the data their web service returns.

Best,

Steffen

Weiwei Shi wrote:
> Please look at line 12-14.
> Why are there so many duplications? Why is there some inconsistency
> between line12-14?

--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address: Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877
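The effect of the transcript-level mapping can be reproduced offline. This is a minimal sketch: the data frame is typed in by hand from the two rows Steffen shows, and the object names are made up for illustration.

```r
# Two rows returned for probe 32864_at, typed in from the reply.
tx <- data.frame(
  affy_hg_u95a          = c("32864_at", "32864_at"),
  entrezgene            = c(6736, 6736),
  ensembl_transcript_id = c("ENST00000383070", "ENST00000327563"),
  stringsAsFactors = FALSE
)

# With the transcript column, the two rows are distinct.
anyDuplicated(tx) == 0

# Without it, both rows carry the same probe and gene,
# which is what shows up as "duplicates" in the two-column query.
gene_level <- tx[, c("affy_hg_u95a", "entrezgene")]
nrow(unique(gene_level))

# Number of transcripts behind each probe-to-gene pair.
tx_counts <- aggregate(ensembl_transcript_id ~ affy_hg_u95a + entrezgene,
                       data = tx, FUN = length)
tx_counts
```

Counting transcripts per pair in this way is a quick check that repeated rows come from multiple transcripts rather than from a problem in the query.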

written 17.1 years ago by Steffen Durinck ▴ 580

Weiwei Shi wrote:
> Thanks, Kasper. But I did use method B and found the vignettes.
> However, it seems that one is not written in very detailed.

I would propose method d) then: have a look at the source code to get an idea of which functions and methods are available. This is free, open-source code, and unfortunately the documentation is sometimes incomplete.

You could also search the web for tutorials involving biomaRt.

--
Bye,
Marc Saric
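A quick way to see what a package offers, short of reading the full source, is to list its exported functions from within R. This is a minimal sketch, assuming biomaRt is installed; the requireNamespace() guard is added here for illustration.

```r
# List what biomaRt exports and inspect one function's arguments.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  exported <- sort(getNamespaceExports("biomaRt"))
  print(exported)

  # Show the argument list of the query function used in this thread.
  print(args(biomaRt::getBM))

  # Typing a function name without parentheses prints its source, e.g.
  # biomaRt::getBM
} else {
  message("biomaRt is not installed; install it from Bioconductor first.")
}
```

help(package = "biomaRt") gives the same list together with the one-line description of each help page.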

written 17.1 years ago by Marc Saric ▴ 70