How to map probeset_ids to gene symbols or other annotation information?

Peng Yu ▴ 940

@peng-yu-3586

Last seen 9.6 years ago

Hi,

I have run the following 'run.R' script, which generated the file 'genes.txt'. My question is how to map the probeset_ids to the gene names or other information that is available in http://www.affymetrix.com/analysis/downloads/na29/wtgene/MoGene-1_0-st-v1.na29.mm9.probeset.csv.zip. What package should I use to read the '.csv' file?

Regards,
Peng

    $ cat run.R
    library(oligo)
    data <- read.celfiles(list.celfiles())
    eset <- rma(data)
    write.exprs(eset, file="genes.txt", sep="\t")

    $ head genes.txt
    koA-mth_HZ_5238_MST1_19389.cel koB-mth_HZ_5238_MST1_19390.cel koC-mth_HZ_5238_MST1_19391.cel koD-mth_HZ_5238_MST1_19392.cel wt1-mth_HZ_5238_MST1_19385.cel wt2-mth_HZ_5238_MST1_19386.cel wt3-mth_HZ_5238_MST1_19387.cel wt4-mth_HZ_5238_MST1_19388.cel
    10344615 7.07210987006919 7.01089258722033 7.2642627000072 6.92980486555595 7.72857978063884 6.91124431275741 7.45776182961327 7.21025349865986
    10344617 3.02519545040591 3.08697023169755 3.03203234085828 3.09846420636071 3.12487891156704 3.10727683101607 3.0544609560487 3.03353963677405
    10344619 3.20294677833793 3.20612630466463 3.17655303153672 3.13210443165341 3.1378507207366 3.21452663497659 3.31345050224224 3.09287042099817
    10344621 4.70984671316916 4.68863215464979 4.43705857330756 4.59970839525133 4.66911715996711 4.80422412543456 4.57334787499862 4.60736276830484
    10344623 7.79927399492793 7.78057650451938 7.72710416870418 7.68525205462879 7.66271776323834 7.65761154201622 7.67860029345257 7.80684426781102
    10344625 8.43869623252839 9.23986002214653 9.01482181726202 8.8450593076064 8.59194370149885 9.08344656110017 9.07468813004613 8.92291936928794
    10344626 10.0590964382247 9.75778614016683 9.66874458340189 9.91560261746937 9.97497585580347 9.90593250683953 9.72513220186519 10.0570156812405
    10344627 7.45353674141328 7.85528510695415 7.1239938834144 7.48673272391552 8.2401362665769 7.24092300626232 7.4348487408975 7.8999935331867
    10344628 10.1181530678991 10.2050144957479 10.0821326432175 10.2014962484731 10.3549307008668 9.97359523972773 9.82152593658235 10.0714458425003

ADD COMMENT • link updated 14.7 years ago by Sean Davis 21k • written 14.7 years ago by Peng Yu ▴ 940

Sean Davis 21k

@sean-davis-490

Last seen 3 months ago

United States

On Sat, Aug 8, 2009 at 6:31 PM, Peng Yu wrote:
> What package I should use to read the '.csv' file?

Hi, Peng.

You will probably want to do some reading before posting. There is an entire manual on input/output with R available from CRAN that you should read.

http://watson.nci.nih.gov/cran_mirror/manuals.html

Let us know if you have questions about reading .csv files after you have read that manual.

As a recommended alternative, have a look at the package:

http://bioconductor.org/packages/2.4/data/annotation/html/mogene10stprobeset.db.html

Again, a simple search of the Bioconductor metadata packages would have turned this up.

Sean

ADD COMMENT • link 14.7 years ago Sean Davis 21k

On Sun, Aug 9, 2009 at 7:01 AM, Sean Davis wrote:
> There is an entire manual on input/output with R available from CRAN that you should read.
>
> http://watson.nci.nih.gov/cran_mirror/manuals.html

I don't have time to read the whole manual. Which manual in particular should I focus on? "R Data Import/Export"? Would you please point me to the section that is most important for my task?

> As a recommended alternative, have a look at the package:
>
> http://bioconductor.org/packages/2.4/data/annotation/html/mogene10stprobeset.db.html

This package has only a reference manual, which makes it hard for a newbie to figure out how to use it. I would appreciate it if you could provide some examples of how to annotate my data.

Regards,
Peng

ADD REPLY • link 14.7 years ago Peng Yu ▴ 940

On Sun, Aug 9, 2009 at 11:53 AM, Peng Yu wrote:
> I don't have time to read the whole manual. Which manual in particular should I focus on? "R Data Import/Export"? Would you please point me to the section that is most important for my task?

Hi, Peng.

I don't mean to sound rude, but everyone on this list is quite busy. You will need to make time to do some of your own research, unfortunately. As an exercise and an answer to your question, check out the Table of Contents of R Data Import/Export. If there is still a question about which section is most appropriate, feel free to post back to the list the code you have tried, any error messages, and the output of sessionInfo(). And, yes, you will benefit from at least skimming the entire manual--you will learn quite a bit.

> This package has only a reference manual, which makes it hard for a newbie to figure out how to use it. I would appreciate it if you could provide some examples of how to annotate my data.

Good point. The annotation packages are generally based on information given here:

http://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html

While the examples in the vignette use a different chip type, it should be possible to extrapolate to the mogene arrays.

Sean

ADD REPLY • link 14.7 years ago Sean Davis 21k

On Sun, Aug 9, 2009 at 12:03 PM, Sean Davis wrote:
> And, yes, you will benefit from at least skimming the entire manual--you will learn quite a bit.

Hi Sean,

I have been skimming the manual. One thing I am not sure about is whether I should spend a few days learning all the material you mentioned when I could use another language that I am more familiar with and solve the problem quickly. I would like to solve my question today if possible. However, I completely understand that I should read all the manuals that you mentioned in the long run.

I have thought of using Perl to solve my problem. But I think that it is still better to figure out a way to do so in R as well. The code in Perl would not be long, so I think the code in R would not be long, either. It doesn't seem that it would take an experienced R user a long time to figure out the R commands to map all the probeset_ids to gene names or Ensembl IDs, does it?

I know that I could use read.csv("MoGene-1_0-st-v1.na29.mm9.probeset.csv") to read the file, which gives a data frame. But how do I extract the useful columns from the data frame? How do I construct a mapping between the entries in one column and the entries in another column? I should use read.table("genes.txt") to read "genes.txt", right? How do I replace its first column with the appropriate gene names or Ensembl IDs using the mapping?

It seems that MoGene-1_0-st-v1.na29.mm9.probeset.csv should have enough annotation information for my problem. Why do I need "mogene10stprobeset.db"?

Regards,
Peng
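For readers with the same question, here is a minimal sketch of the CSV route in base R only. The toy table stands in for the Affymetrix file: the column names probeset_id and gene_symbol and the gene names are placeholders, so check names() on the real file first. It also assumes that the real file begins with comment lines starting with '#', which is why comment.char is set.

```r
# Toy stand-in for the Affymetrix annotation CSV (placeholder columns and names).
csv_text <- c(
  "#%create_date=placeholder",
  "probeset_id,gene_symbol",
  "10344615,GeneA",
  "10344617,GeneB",
  "10344621,GeneC"
)
annot <- read.csv(text = csv_text, comment.char = "#",
                  stringsAsFactors = FALSE)

# Toy stand-in for genes.txt: one row per probeset, one column per array.
expr <- data.frame(
  probeset_id = c(10344615, 10344617, 10344619),
  wt1 = c(7.73, 3.12, 3.14),
  ko1 = c(7.07, 3.03, 3.20)
)

# match() gives, for each expression row, the row of annot with the same ID
# (NA when the probeset has no entry in the annotation table).
idx <- match(expr$probeset_id, annot$probeset_id)
expr$gene_symbol <- annot$gene_symbol[idx]
print(expr)
```

Probesets without an annotation entry end up with NA in the new column, so nothing is silently dropped from the expression table.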

ADD REPLY • link 14.7 years ago Peng Yu ▴ 940

On Aug 9, 2009, at 13:06, Peng Yu wrote:
> I know that I could use read.csv("MoGene-1_0-st-v1.na29.mm9.probeset.csv") to read the file, which gives a data frame. But how to extract the useful columns from the data frame? How to construct a mapping between the entry in one column to the entry in another column?
>
> It seems that MoGene-1_0-st-v1.na29.mm9.probeset.csv should have enough annotation information for my problem. Why do I need "mogene10stprobeset.db"?

Peng,

Let me quote Wolfgang Huber: "the purpose of this mailing list is not for other people to do your homework for you". I don't think anyone is very inclined to help you if you don't spend some time yourself reading about the language. Some of the questions you ask above are things you ought to know after spending 10 minutes with "An Introduction to R".

I believe in using the right tools for the job, and if you think you can do your stuff in a few hours using Perl, I think you should use Perl. If you want access to some of the power and time-saving features of R, you need to devote some time to learning it. But you cannot expect to do even simple stuff in a new language without spending some initial time on it.

Kasper

ADD REPLY • link 14.7 years ago Kasper Daniel Hansen ★ 6.5k

On Sun, Aug 9, 2009 at 4:46 PM, Kasper Daniel Hansen wrote:
> Let me quote Wolfgang Huber: "the purpose of this mailing list is not for other people to do your homework for you".

Hi Kasper,

I don't think that I want somebody to do the homework for me. One thing that I find frustrating about reading R documentation is that the useful information is often scattered in different places, which makes it hard for a new user to piece together. One example is mogene10stprobeset.db, whose documentation doesn't mention AnnotationDbi. I feel that learning from examples, complemented by reading the R documentation, is a more efficient way.

BTW, do you know why "mogene10stprobeset.db" is needed if I already have MoGene-1_0-st-v1.na29.mm9.probeset.csv?

Regards,
Peng

ADD REPLY • link 14.7 years ago Peng Yu ▴ 940

On Sun, Aug 9, 2009 at 6:48 PM, Peng Yu wrote:
> One thing that I feel frustrated about reading R documentation is that the useful information is often scattered in different places, which is not easy for a new user to piece them together. One example is mogene10stprobeset.db, whose document doesn't mention AnnotationDbi. I feel that learning from example complementing with reading R documentation is a more efficient way.

We can agree that documentation can always be improved. That said, the documentation that exists is generally good and worthy of attention. If there are shortcomings in the documentation, please let us know, as we can try to improve the project based on constructive feedback.

> BTW, Do you know why "mogene10stprobeset.db" is needed if I have MoGene-1_0-st-v1.na29.mm9.probeset.csv already?

The .csv file has most of the information, yes. However, as the AnnotationDbi vignette shows, there are some advantages to using the .db annotation packages. In particular, the .db packages contain not only data but also methods for working with the data efficiently, and the data are both self-describing and standardized across annotation packages.

Sean
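As an illustration of the "data plus methods" point, a hedged sketch of a lookup through the annotation package follows. It assumes mogene10stprobeset.db is installed and uses only the mogene10stprobesetENTREZID map named elsewhere in this thread; whether the IDs in genes.txt are keys of that map is also an assumption. The lookup is skipped when the package is missing.

```r
# Probeset IDs as they appear in the first column of genes.txt
ids <- c("10344615", "10344617", "10344619")

# Assumption: the Bioconductor package mogene10stprobeset.db is installed.
if (requireNamespace("mogene10stprobeset.db", quietly = TRUE)) {
  library(mogene10stprobeset.db)
  # mget() returns one list element per key; ifnotfound = NA marks keys
  # that are not in the map instead of raising an error.
  entrez <- mget(ids, mogene10stprobesetENTREZID, ifnotfound = NA)
  # Keep the first Entrez Gene ID reported for each probeset
  annot <- data.frame(
    probeset_id = ids,
    entrez_id = sapply(entrez, function(v) as.character(v[1])),
    stringsAsFactors = FALSE
  )
  print(annot)
}
```

Other maps in the package can be found with ls("package:mogene10stprobeset.db") once it is loaded.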

ADD REPLY • link 14.7 years ago Sean Davis 21k

Hi Peng,

There is in fact a lot of documentation inside each package if you know how to look for it. One form is the manual pages. The objects that have them can be listed like this:

    ls("package:mogene10stprobeset.db")

You can then read a manual page by typing ? followed by the name of the object you want to know about, like this:

    ?mogene10stprobesetENTREZID

Finally, almost every Bioconductor package has some sort of vignette associated with it. In the case of the annotation packages, there are three vignettes that come with AnnotationDbi (which will always be loaded before any annotation package, so they will always be there if you look). You can load a vignette by using the openVignette() command like this:

    openVignette()

Then just pick the number of the vignette that you would like to read. Reading the vignette will give a much more comprehensive overview of the purpose of the package, with even more examples than the manual pages. Both of these resources are critical if you want to be able to use R. I would recommend that you look at these in addition to reading the R manual that was mentioned before.

With respect to the annotation packages, they are not simply a repeat of what is in the csv files from Affymetrix. In fact, we don't actually even know where Affymetrix gets the data in those files from, nor do we use most of the data in those files in building the annotation packages. Instead we go directly to the source whenever possible and get most of our information from places like NCBI, the EBI, etc. The only information that we get from Affymetrix is the basic probe-to-gene mapping data (in the form of probe to Entrez Gene, GenBank accession, etc.), which we then map onto the information from primary sources such as NCBI in order to tie the other data to the probes. You are free of course to use whichever information source you prefer, but please be advised that they are probably not equivalent.

Marc
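The same two kinds of documentation can also be listed programmatically with base R. The sketch below uses a hypothetical helper, list_docs(), which is not part of any package, and runs it on "stats" only because that package ships with every R installation; for an annotation package, pass its name instead.

```r
# Hypothetical helper (not part of any package): collect the vignette names
# and the exported object names of an installed package.
list_docs <- function(pkg) {
  v <- vignette(package = pkg)$results   # matrix with one row per vignette
  list(
    vignettes = unname(v[, "Item"]),
    objects   = sort(getNamespaceExports(pkg))
  )
}

docs <- list_docs("stats")   # "stats" ships with R, so this runs anywhere
head(docs$objects)           # each of these has a manual page: ?name
docs$vignettes               # may be empty when a package has no vignettes
```

A single vignette can then be opened with vignette("name", package = "pkg"), which is the base R counterpart of picking a number in openVignette().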

ADD REPLY • link 14.7 years ago Marc Carlson ★ 7.2k

On Mon, Aug 10, 2009 at 11:52 AM, Marc Carlson wrote:
> You are free of course to use whichever information source you prefer, but please be advised that they are probably not equivalent.

Hi Marc,

I ran the example shown in ?mogene10stprobesetENTREZID. It doesn't give a very meaningful error message (see the end of this message). Do you know what the problem might be?

I also ran the following code. But I don't quite understand what the word 'vignette' means. In particular, what does it mean in R? Is a 'vignette' a kind of package documentation? Another problem is how to choose the most relevant vignette when 10 vignettes are listed.

    > library(mogene10stprobeset.db)
    > openVignette()
    Please select a vignette:

     1: AnnotationDbi - AnnotationDbi
     2: AnnotationDbi - Creating probe packages
     3: AnnotationDbi - SQLForge
     4: Biobase - An introduction to Biobase and ExpressionSets
     5: Biobase - Bioconductor Overview
     6: Biobase - esApply Introduction
     7: Biobase - Notes for eSet developers
     8: Biobase - Notes for writing introductory 'how to' documents
     9: Biobase - quick views of eSet instances
    10: DBI - A Common Database Interface (DBI)

Based on your last advice, most of the time it is better to use the annotation package rather than the Affymetrix csv files, right?

Regards,
Peng

    $ Rscript run.R
    > library(mogene10stprobeset.db)
    Loading required package: methods
    Loading required package: AnnotationDbi
    Loading required package: Biobase

    Welcome to Bioconductor

      Vignettes contain introductory material. To view, type
      'openVignette()'. To cite Bioconductor, see
      'citation("Biobase")' and for packages 'citation(pkgname)'.

    Loading required package: DBI
    > x <- mogene10stprobesetENTREZID
    > # Get the probe identifiers that are mapped to an ENTREZ Gene ID
    > mapped_probes <- mappedkeys(x)
    > # Convert to a list
    > xx <- as.list(x[mapped_probes])
    Error in sqliteExecStatement(con, statement, bind.data) :
      RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
    Calls: as.list ... dbGetQuery -> sqliteQuickSQL -> sqliteExecStatement -> .Call
    Execution halted

ADD REPLY • link 14.7 years ago Peng Yu ▴ 940

Hi Peng,

It seems that I have to apologize for giving you a poor example. The error you got here arises because this particular package is an extreme case. I will find a way to patch that for the next release, but I seriously doubt that you will see it in anything other than the toy example provided in the manual page here. It is ultimately caused by the huge amount of data in the mogene10stprobeset database, so much data in fact that we will have to change the way that we query the underlying database when we display the results. So thanks for reminding me that I need to patch this up. :)

However, in practical usage, it is quite doubtful that you will ever run into this error unless you are in the habit of routinely looking up more than 150,000 keys at once, so you should not let this issue scare you away. You can make the example run by simply reducing the number of mapped keys sought to something less than every single possible key at once. You can do it like this:

    library(mogene10stprobeset.db)
    x <- mogene10stprobesetENTREZID
    # Get the probe identifiers that are mapped to an ENTREZ Gene ID
    # (note that this keeps only the first 150,000 keys)
    mapped_probes <- mappedkeys(x)[1:150000]
    # Convert to a list
    xx <- as.list(x[mapped_probes])
    if(length(xx) > 0) {
      # Get the ENTREZID for the first five probes
      xx[1:5]
      # Get the first one
      xx[[1]]
    }

To answer your other questions, a vignette is a document that gives an overview, with examples, of what a package is for. It differs from the manual pages, which are more terse and usually describe lower-level information such as the arguments a method takes.
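The 150,000-key workaround above can be generalized by looking keys up in chunks and joining the results, so that no keys are left out. This is only a sketch: lookup_in_chunks() is a hypothetical helper, and the toy list stands in for an annotation map so that the example runs without any Bioconductor package.

```r
# Hypothetical helper: apply a lookup function to keys in chunks of at most
# chunk_size keys, then join the per-chunk results into one list.
lookup_in_chunks <- function(keys, lookup, chunk_size = 150000) {
  # Assign each key a chunk number: 1, 1, ..., 2, 2, ...
  chunks <- split(keys, ceiling(seq_along(keys) / chunk_size))
  # Look up each chunk separately and concatenate the resulting lists
  do.call(c, unname(lapply(chunks, lookup)))
}

# Toy stand-in for an annotation map and for as.list(x[keys])
toy_map <- list(p1 = "101", p2 = "102", p3 = "103", p4 = "104", p5 = "105")
toy_lookup <- function(keys) toy_map[keys]

res <- lookup_in_chunks(names(toy_map), toy_lookup, chunk_size = 2)
str(res)
```

With the real package, the call would presumably look like lookup_in_chunks(mappedkeys(x), function(k) as.list(x[k])), using the same mappedkeys() and as.list() calls as the example above.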
If you want an even more generalized description about what some common bioconductor packages do you can look at our common workflows here: http://www.bioconductor.org/docs/workflows/index.html Most vignettes will be named in a way that clearly indicates what package they refer to, but you can always tell because their source code will always be included in the package source and their pdf files will always be on the website pages that correspond to the packages. You can see some examples of those in here: http://www.bioconductor.org/download/ Finally, I will always prefer for you to use the annotation packages, since that is why we provide them. We spend a lot of effort maintaining them and making sure that they are useful and updated twice a year. All the data in there is synchronized each release so you can safely cross compare things like GO terms with GO IDs that are associated with the probes you are using etc. And also, the packages are versioned and synchronized to go with a particular release of bioconductor, which should aid you in keeping your results reproducible. If you are using biocLite() then all of this should be "matched up" for you. Marc Peng Yu wrote: > On Mon, Aug 10, 2009 at 11:52 AM, Marc Carlson<mcarlson at="" fhcrc.org=""> wrote: > >> Hi Peng, >> >> There is in fact a lot of documentation inside of each package if you >> know how to look for it. One form is in the form of manual pages which >> can be listed like this example: >> >> ls("package:mogene10stprobeset.db") >> >> And then you can read the manual pages by typing ? followed by the name >> of the object you want to know about like this example: >> >> ?mogene10stprobesetENTREZID >> >> Finally, almost every bioconductor package has some sort vignette that >> is associated with it. In the case of the annotation packages, there >> are three vignettes loaded with AnnotationDbi (which will always be >> loaded before any annotation package, so they will always be there if >> you look). 
You can load a vignette by using the openVignette() command >> like this: >> >> openVignette() >> >> And then just pick the number for the vignette that you would like to >> read. Reading the vignette will give a much more comprehensive overview >> of the purpose of the package with even more examples than the manual >> pages. Both of these resources are critical if you want to be able to >> use R. I would recommend that you look at these in addition to reading >> that R user manual that was mentioned before. >> >> With respect to the annotation packages, they are not simply a repeat of >> what is in the csv files from Affymetrix. In fact, we don't actually >> even know where Affymetrix gets the data in those files from, nor do we >> use most of that data in those files in building the annotation >> packages. Instead we go direct to the source whenever possible and get >> most of our information from places like NCBI, the EBI etc. The only >> information that we get from Affymetrix is the basic probe to gene >> mapping data (in the form of probe to entrez gene, genbank accession >> etc.) which we then map onto the information from primary sources such >> as NCBI etc. in order to tie the other data to the probes. You are free >> of course to use whichever information source you prefer, but please be >> advised that they are probably not equivalent. >> > > Hi Marc, > > I run the following example shown in ?mogene10stprobesetENTREZID. It > doesn't provide very meaningful error message (at the end of this > message). Do you what the problem might be? > > I also run the following code. But I don't quite understand what the > word 'vignette' means. Especially, what does it mean in R? Is > 'vignette' a package documentation? Another problem is how to wisely > choose the most relevant vignette if it shows 10 vignette? 
> > >> library(mogene10stprobeset.db) >> openVignette() >> > Please select a vignette: > > 1: AnnotationDbi - AnnotationDbi > 2: AnnotationDbi - Creating probe packages > 3: AnnotationDbi - SQLForge > 4: Biobase - An introduction to Biobase and ExpressionSets > 5: Biobase - Bioconductor Overview > 6: Biobase - esApply Introduction > 7: Biobase - Notes for eSet developers > 8: Biobase - Notes for writing introductory 'how to' documents > 9: Biobase - quick views of eSet instances > 10: DBI - A Common Database Interface (DBI) > > Based on your last advice, most of the time, it is better to use the > annotation package rather than the affymetrix csv files, right? > > Regards, > Peng > > $ Rscript run.R > >> library(mogene10stprobeset.db) >> > Loading required package: methods > Loading required package: AnnotationDbi > Loading required package: Biobase > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > Loading required package: DBI > >> x <- mogene10stprobesetENTREZID >> # Get the probe identifiers that are mapped to an ENTREZ Gene ID >> mapped_probes <- mappedkeys(x) >> # Convert to a list >> xx <- as.list(x[mapped_probes]) >> > Error in sqliteExecStatement(con, statement, bind.data) : > RS-DBI driver: (error in statement: String or BLOB exceeded size limit) > Calls: as.list ... dbGetQuery -> sqliteQuickSQL -> sqliteExecStatement -> .Call > Execution halted > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >
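If all keys are needed rather than only the first 150,000, one possible approach is to run the lookup in several smaller batches and join the results. The following is only a sketch of that idea, not the package's own fix: lookup_in_batches, toy_map and toy_lookup are hypothetical names, the batch size is an arbitrary assumption, and the mapped values are made-up placeholders so that the code runs without any annotation package installed.

```r
# Hypothetical helper: look up keys in fixed-size batches and combine the results.
# `lookup` is any function that takes a character vector of keys and
# returns a named list with one element per key.
lookup_in_batches <- function(keys, lookup, batch_size = 50000L) {
  if (length(keys) == 0L) return(list())
  # Assign each key to a batch number: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(keys) / batch_size)
  batches <- split(keys, batch_id)
  # Run the lookup once per batch, then concatenate the per-batch lists
  results <- lapply(batches, lookup)
  do.call(c, unname(results))
}

# Stand-in data (assumption): a named list playing the role of the
# probe-to-Entrez mapping. The values are placeholders, not real Entrez IDs.
toy_map <- list("10344615" = "entrez_A", "10344617" = "entrez_B",
                "10344619" = "entrez_C", "10344621" = "entrez_D",
                "10344623" = "entrez_E")
toy_lookup <- function(k) toy_map[k]

# A tiny batch size forces three separate lookups for five keys
xx <- lookup_in_batches(names(toy_map), toy_lookup, batch_size = 2L)
xx[1:2]
```

With the annotation package loaded, the same helper would presumably be called as lookup_in_batches(mappedkeys(x), function(k) as.list(x[k])), keeping each batch below the size that triggered the error.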

14.7 years ago Marc Carlson ★ 7.2k
