Hg18 build of org.Hs.eg.db

Andrew Yee ▴ 350

@andrew-yee-2667

Last seen 9.6 years ago

Is there an accepted way to install the Hg18 build of org.Hs.eg.db instead of the latest Hg19 build?

I can imagine one way is to download an older version of org.Hs.eg.db that is dated after March 2006 (when Hg18 was released) and before February 2009 (when Hg19 was released), but I was wondering whether there is an accepted method of doing this.

Thanks,
Andrew

• 1.1k views

ADD COMMENT • link updated 13.6 years ago by Vincent J. Carey, Jr. 6.7k • written 13.6 years ago by Andrew Yee ▴ 350

Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 6 weeks ago

United States

I believe the best way would be to install the "release" version of R that coincides with the org.Hs.eg.db version that you are interested in, and install the package into that version of R with biocLite. In my lab we keep R 2.10, 2.11, and 2.12 (and now 2.13) available side by side to allow continuity of ongoing analyses. The user must pick the appropriate version.
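The approach above might look roughly like the following when run inside the older R installation. This is only a sketch: the installer script URL is the one Bioconductor used around 2010 (current Bioconductor installs packages with `BiocManager::install()` instead), and the checks at the end are simply one way to confirm which annotation release ended up in that R library.

```r
# Run this inside the R release that matches the annotation version you need.
# Assumption: the 2010-era installer script is still reachable at this URL.
source("http://bioconductor.org/biocLite.R")
biocLite("org.Hs.eg.db")

library(org.Hs.eg.db)

# Confirm which R and which annotation release this session is using.
R.version.string
packageDescription("org.Hs.eg.db")$Version

# Metadata table shipped with the package (data sources and their dates).
org.Hs.eg_dbInfo()
```

Recording this output alongside an analysis makes it easier to tell later which annotation release the results were based on.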

ADD COMMENT • link 13.6 years ago Vincent J. Carey, Jr. 6.7k

Hi,

On Tue, Sep 21, 2010 at 12:09 PM, Vincent Carey wrote:

> I believe the best way would be to install the "release" version of R that coincides with the org.Hs.eg.db version that you are interested in, and install it into that version of R with biocLite.

While this generally works, it is not optimal for this particular problem, especially because the package in question is an annotation package.

Consider the situation where the user wants to use Rsamtools (not available in R 2.10) but is performing analyses against hg18.

One could argue that the GenomicFeatures package would be better suited to provide the genome-version-specific info I'm guessing the OP wanted from org.Hs.eg.db, but I'm just giving an example.

-steve

ADD REPLY • link 13.6 years ago Steve Lianoglou ★ 13k

On Tue, Sep 21, 2010 at 12:16 PM, Steve Lianoglou wrote:

> Consider the situation where the user wants to use Rsamtools (not available in R 2.10) but is performing analyses against hg18.

Fair point.

> One could argue that perhaps the GenomicFeatures package would be better suited to replace the genome-version-specific info I'm guessing the OP wanted from org.Hs.eg.db, but I'm just giving an example.

Perhaps. The GenomicFeatures approach does allow the user to specify the build, and it makes the user responsible for maintaining that image of the feature metadata, since it is not part of a distributed package. And if I am not mistaken, the way one will most likely get richer metadata from the makeTranscriptDb... result is by decoding the transcript names or Entrez IDs used with an org.Hs* table.

If one really needs an hg18-based set of org.Hs* mappings, one might be able to use the sqlForge facilities of the AnnotationDbi package to make it and install it alongside the distributed org.Hs*, with a distinguishing package name. I say "might" here because the reliance of this facility on possibly genome-build-dependent metadata is not completely clear to me; Marc Carlson may want to comment.

But I do believe one wants to avoid the temptation of taking an "older" org.Hs* package and installing it in a newer, mismatched version of R. You might get away with it in some circumstances, but it is not a good idea. It is possible to create some simple interversion communications so that answers to queries against an R 2.10-based resource can be used in R 2.12, for example, and that would be much safer.

ADD REPLY • link 13.6 years ago Vincent J. Carey, Jr. 6.7k

On Tue, Sep 21, 2010 at 12:43 PM, Vincent Carey wrote:

> If one really needs an hg18-based set of org.Hs* mappings, one might be able to use sqlForge facilities of annotationDbi package to make it and install it alongside the distributed org.Hs*, with a distinguishing package name. I say "might" here because the reliance of this facility on possibly genome-build-dependent metadata is not completely clear to me; Marc Carlson may want to comment.

Just to add here that the org.Hs.eg.db packages are based on NCBI data, generally speaking. The GenomicFeatures stuff is generally based on UCSC data (or some other source chosen by the user). The annotations from these various sources will not be equivalent in all cases, particularly when it comes to chromosome positions.

To make things even more complicated, since there were numerous RefSeq releases from NCBI over the three years from 2006 to 2009, the org.Hs.eg.db packages during those years will generally NOT be the same from one release to the next, even though the reference genome was unchanged.

> But I do believe one wants to avoid the temptation of taking an "older" org.Hs* package and installing it in a newer, mismatched version of R. You might get away with it in some circumstances but it is not a good idea.

This point cannot be overstated. While in some cases this might work, designing workflows with these kinds of hacks in place is bound to lead to difficulties at some point (always 2 days before a grant deadline or a national talk).

ADD REPLY • link 13.6 years ago Sean Davis 21k

Hello everyone,

In general, you should be reaching for the GenomicFeatures package when you need things like chromosomal positions, so that you get the most control over which build you are using. The org packages are meant to be "gene-centric" instead of "genome-centric", which may seem like a fine distinction, but I think it is significant in this case.

The org packages do have some limited chromosomal position information, but it is there because people needed something a very long time ago, long before we had better solutions (i.e. GenomicFeatures). It is still sometimes useful to have chromosomal positions in these packages, and so they have remained there, but in most cases the org packages are really meant for gene-centric annotations that can be attached directly to something like an Entrez Gene ID: things like GO terms and UniGene IDs.

In part because the genomic ranges of transcripts and genes can vary with the build, these kinds of annotations required us to apply a modular solution. The basic notion is that you use the GenomicFeatures package to get the appropriate genomic range information for your genome of interest, and then use the org package to attach additional gene-centric information to the genes you found in those ranges.

Finally, for the CHRLOC and CHRLOCEND annotations in the org packages: these are actually based on the UCSC genome builds. The most recent versions of these should specify the build being used, and you can see where a particular annotation mapping comes from in its manual page. The CHR mapping, on the other hand, comes from NCBI. It's all documented in the manual pages, and has been this way for ages, but I wouldn't blame you if you said this was somewhat confusing. If we were to add more of this kind of information to the org packages, things could definitely get much more confusing. This sort of issue is another reason why we have stopped adding new genomic range annotations to the org packages and instead created the GenomicFeatures package.

I hope this helps to clarify things somewhat. Please let me know if I left any questions unanswered.

Marc
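The two-step, modular workflow described above could be sketched as follows. This is an illustration under stated assumptions, not code from the thread: it uses the 2010-era GenomicFeatures function `makeTranscriptDbFromUCSC()` (later releases renamed it `makeTxDbFromUCSC()`), it assumes the UCSC `knownGene` table as the transcript source, it needs an internet connection to UCSC, and the handful of gene IDs taken with `head()` is arbitrary.

```r
library(GenomicFeatures)
library(org.Hs.eg.db)

# Step 1 (genome-centric): build transcript ranges pinned to hg18.
# Assumption: the UCSC knownGene track is an acceptable transcript source.
txdb <- makeTranscriptDbFromUCSC(genome = "hg18", tablename = "knownGene")

# Group transcripts by gene; for knownGene the group names are Entrez Gene IDs.
tx_by_gene <- transcriptsBy(txdb, by = "gene")
entrez_ids <- head(names(tx_by_gene))

# Step 2 (gene-centric): attach annotation keyed by Entrez Gene ID
# from whichever org.Hs.eg.db release is installed.
symbols  <- mget(entrez_ids, org.Hs.egSYMBOL, ifnotfound = NA)
go_terms <- mget(entrez_ids, org.Hs.egGO, ifnotfound = NA)

# The manual page for a mapping states where its data come from.
?org.Hs.egCHRLOC
```

The genome build is fixed once, in step 1, so the ranges stay on hg18 regardless of which org.Hs.eg.db release supplies the symbols and GO terms in step 2.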

ADD REPLY • link 13.6 years ago Marc Carlson ★ 7.2k
