advantages of annotation packages
@rameswara-sashi-kiran-challa-5931
Hi All,

Could anyone please explain the advantages of having an annotation package for an organism, or point me to documentation that lays out the reasoning behind creating annotation packages?

Would a data frame in R (with genes as rows and the various annotation types such as GO, KEGG, and UniGene as columns) not suffice? What are the advantages of having AnnDbBimap objects and building a package? Are there any technical benefits, such as faster access to information?

Thanks for your time,

-Sashi
Tim Triche ★ 4.2k
@tim-triche-3561
United States
The memory overhead for those data.frames you speak of quickly becomes obscene when you start doing things like GO analyses.
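One plausible reading of that overhead is sketched below in base R. It assumes the "one row per gene" table has to be flattened so that every combination of annotations gets its own row. The gene names, GO IDs, pathway names, and UniGene IDs are made up for illustration.

```r
# Hypothetical toy annotations: each gene maps to several terms of each type.
go <- data.frame(gene = rep(c("g1", "g2"), times = c(4, 3)),
                 go = sprintf("GO:%07d", 1:7),
                 stringsAsFactors = FALSE)
kegg <- data.frame(gene = rep(c("g1", "g2"), times = c(3, 2)),
                   kegg = sprintf("path%02d", 1:5),
                   stringsAsFactors = FALSE)
unigene <- data.frame(gene = rep(c("g1", "g2"), times = c(5, 2)),
                      unigene = sprintf("Hs.%d", 1:7),
                      stringsAsFactors = FALSE)

# Flattening into a single data.frame pairs every GO term with every KEGG
# pathway and every UniGene ID of the same gene (a per-gene cross product).
flat <- merge(merge(go, kegg, by = "gene"), unigene, by = "gene")

rows_separate <- nrow(go) + nrow(kegg) + nrow(unigene)
rows_flat <- nrow(flat)

cat("rows in separate tables:", rows_separate, "\n")
cat("rows in flattened table:", rows_flat, "\n")
```

With only two genes the flattened table is already several times larger than the three separate tables combined, and the gap widens with every additional one-to-many annotation type.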
@steve-lianoglou-2771
United States
Hi Sashi,

Read carefully: http://lmgtfy.com/?q=spreadsheet+vs+database

HTH,
-steve
@martin-morgan-1513
United States
One aspect not mentioned is that one gets to exploit R's packaging system to provide easily distributed and documented versions of the data. Suppose you created the package eight months ago and have forgotten some of the details. Easy: check the package description and help page.

Say you're working with a couple of colleagues, and you've been relatively disciplined about incrementing the annotation package version when your data changes (or are using a public Bioc annotation package, with versions strictly tied to R / Bioc releases). You can easily spot when unusual results are due to differences in data version (hence the frequent request for the output of 'sessionInfo()' on this mailing list) and adopt / instill 'best practices' that make sure everyone on the team (including yourself, even if your team is only 1) is using the same version.

Martin
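The version checks described above might look something like the following base R sketch. The package name "stats" is only a stand-in, chosen because it ships with every R installation; in practice you would substitute the annotation package you actually use.

```r
# Stand-in package name (an assumption for illustration); replace it with
# your annotation package.
pkg <- "stats"

# Version of the installed package, as a plain string.
ver <- as.character(packageVersion(pkg))

# Selected fields from the package's DESCRIPTION file.
desc <- packageDescription(pkg, fields = c("Package", "Version"))

# R version, platform, and attached/loaded packages with their versions;
# this is what the mailing list asks posters to include.
info <- sessionInfo()

cat(desc$Package, "version", ver, "\n")
cat(info$R.version$version.string, "\n")
```

Saving this output next to your results gives collaborators something concrete to compare when their numbers differ from yours.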
Just adding to what Martin already said: it's mostly about making your research more easily reproducible by using a consistent and traceable source for your information. This matters for doing science, where other people will need to reproduce your results exactly. If all you had was your own personal data.frame, nobody else could really work with it unless you also made it available online. And even assuming you could serve it up somewhere in perpetuity, you would also have to explain exactly how you made it. In short, by the time you wrote the methods section for your findings, you would have ended up making and maintaining your own annotation resource, and thus reinventing the wheel.

There are other advantages too. For example, many different kinds of annotation data are packaged together, so you can know which version of GO was being used by a large group of people and also which Entrez Gene IDs were considered valid. Things are therefore more standardized overall for a given version of Bioconductor, which can aid collaborations (since people are basically all working off the same data set).

Marc
This is potentially a very important point. However, the lack of an easy way to install previous versions of Bioc packages works against it.

~ Malcolm Cook
I don't understand your comment. Bioc versions are released with specific R versions. Install the appropriate R version, and get the corresponding Bioc packages via biocLite(). Challenges occur when trying to install old R on new hardware (e.g., because the old R doesn't compile with new gcc or new libraries), but that's probably not what you mean?

Martin
Thank you Martin, Tim and Steve!!

For future readers: the introduction of this document (written by Marc Carlson) sheds some light on the purpose of the select interface and annotation packages:
http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/MakingNewAnnotationPackages.pdf

-Sashi
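As a rough illustration of the select interface mentioned above, the sketch below queries an annotation package through AnnotationDbi. The helper name lookup_annotations is hypothetical, and the choice of org.Hs.eg.db, the columns, and the example ID are assumptions made for illustration. The argument names follow recent AnnotationDbi releases and may differ in older ones. The function returns NULL when the packages are not installed.

```r
# Hypothetical helper: look up annotations for Entrez Gene IDs via select().
lookup_annotations <- function(ids, columns = c("SYMBOL", "GO")) {
  have_pkgs <- requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)
  if (!have_pkgs) {
    # Sketch only: nothing to query without the Bioconductor packages.
    return(NULL)
  }
  # select() returns a data.frame with one row per key/annotation combination.
  AnnotationDbi::select(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = ids,
    columns = columns,
    keytype = "ENTREZID"
  )
}

res <- lookup_annotations("7157")  # Entrez Gene ID for human TP53
if (is.null(res)) {
  cat("AnnotationDbi / org.Hs.eg.db not installed; nothing to show\n")
} else {
  print(head(res))
}
```

The annotation data stays in the package's database, and only the rows for the requested keys and columns are pulled into a data.frame.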
