Dear all,
Is there a way in Bioconductor to convert Ensembl-based probe
names such as the following into standard gene names?
AFFX-M27830_5_at
AFFX-M27830_M_at
ENSG00000000003_at
ENSG00000000005_at
ENSG00000000419_at
- Gundala Viswanath
Jakarta - Indonesia
On Thu, Sep 18, 2008 at 4:37 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
Hi, Gundala. In general, you do not need to cross-post to both the
Bioconductor and R lists.
These are not standard Ensembl names. If you strip off the "_at",
some of them become Ensembl gene IDs (the ones that begin with ENSG;
the others look like Affymetrix control probes). You can then use the
biomaRt package to get information about them. See the biomaRt
vignette and help pages for assistance.
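Sean's suggestion to strip "_at" and keep only the ENSG-prefixed names takes a few lines of base R. This is a minimal sketch. It assumes every Ensembl-based probe ends in exactly "_at" and that only names starting with ENSG (human Ensembl gene IDs) are wanted. The variable names are illustrative.

```r
# Probe IDs as listed in the original question.
probes <- c("AFFX-M27830_5_at", "AFFX-M27830_M_at",
            "ENSG00000000003_at", "ENSG00000000005_at",
            "ENSG00000000419_at")

# Keep only probes whose names start with "ENSG"; the AFFX- entries look
# like Affymetrix control probes and are dropped here.
is_ens <- grepl("^ENSG", probes)

# Strip the trailing "_at" (assumed suffix) to recover the Ensembl gene ID.
ens <- sub("_at$", "", probes[is_ens])

# Keep a probe -> Ensembl lookup so annotation can be joined back later.
probe2ens <- setNames(ens, probes[is_ens])
print(probe2ens)
```

The resulting `ens` vector can then be passed to biomaRt or to the org.Hs.eg.db maps.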
Sean
Another alternative is to use the org.Hs.eg.db package:
> library(org.Hs.eg.db)
Loading required package: AnnotationDbi
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
Loading required package: DBI
Loading required package: RSQLite
> ens <- c("ENSG00000000003","ENSG00000000005","ENSG00000000419")
> egs <- mget(ens, revmap(org.Hs.egENSEMBL))
> egs
$ENSG00000000003
[1] "7105"
$ENSG00000000005
[1] "64102"
$ENSG00000000419
[1] "8813"
> gns <- mget(unlist(egs), org.Hs.egSYMBOL)
> gns
$`7105`
[1] "TSPAN6"
$`64102`
[1] "TNMD"
$`8813`
[1] "DPM1"
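`mget()` returns lists because one Ensembl ID can map to more than one Entrez Gene ID. `unlist(egs)` flattens those lists, and the symbols come back keyed by Entrez ID, so the link to the original Ensembl ID is lost. The sketch below keeps that link. It uses plain named lists holding only the values printed above as hypothetical stand-ins for `revmap(org.Hs.egENSEMBL)` and `org.Hs.egSYMBOL`. The helper `ens_to_symbol` is also hypothetical and not part of any package.

```r
# Toy stand-ins for revmap(org.Hs.egENSEMBL) and org.Hs.egSYMBOL, holding only
# the values printed in the session above (hypothetical objects; the real
# maps come from org.Hs.eg.db).
ens2eg <- list(ENSG00000000003 = "7105",
               ENSG00000000005 = "64102",
               ENSG00000000419 = "8813")
eg2sym <- list("7105" = "TSPAN6", "64102" = "TNMD", "8813" = "DPM1")

# Hypothetical helper: one row per Ensembl/Entrez pair, so an Ensembl ID that
# maps to several Entrez IDs yields several rows and every symbol keeps the
# Ensembl ID it came from. Unmapped IDs get NA instead of being dropped.
ens_to_symbol <- function(ens, ens2eg, eg2sym) {
  rows <- lapply(ens, function(id) {
    egs <- ens2eg[[id]]                      # NULL if the ID is not in the map
    if (is.null(egs)) egs <- NA_character_
    syms <- vapply(egs, function(eg) {
      s <- if (is.na(eg)) NULL else eg2sym[[eg]]
      if (is.null(s)) NA_character_ else s[1]
    }, character(1), USE.NAMES = FALSE)
    data.frame(ensembl = id, entrez = egs, symbol = syms,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- ens_to_symbol(names(ens2eg), ens2eg, eg2sym)
print(res)
```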
Since most BioC annotation packages are Entrez Gene-centric, you need
to map via the Entrez Gene ID here, whereas biomaRt lets you map
Ensembl IDs to gene symbols directly.
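The direct biomaRt route could look roughly like the sketch below. It assumes a current biomaRt installation and network access. The mart name "ensembl", the dataset "hsapiens_gene_ensembl" and the attribute/filter names "ensembl_gene_id" and "hgnc_symbol" reflect present-day Ensembl BioMart and may differ for archived releases. The helper names `fetch_symbols` and `annotate_probes` are hypothetical. The offline usage joins a hand-built table shaped like `getBM()` output (with the symbols shown earlier in this thread) back onto probe IDs.

```r
# Query Ensembl BioMart for HGNC symbols (needs biomaRt and network access;
# mart, dataset, attribute and filter names are assumptions, see above).
fetch_symbols <- function(ens) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 filters = "ensembl_gene_id", values = ens, mart = mart)
}

# Join a getBM()-style result back onto the original probe IDs; all.x = TRUE
# keeps probes with no match (e.g. AFFX controls) with an NA symbol.
annotate_probes <- function(probes, bm) {
  key <- data.frame(probe = probes,
                    ensembl_gene_id = sub("_at$", "", probes),
                    stringsAsFactors = FALSE)
  merge(key, bm, by = "ensembl_gene_id", all.x = TRUE, sort = FALSE)
}

# Offline stand-in for a live query, using the symbols shown in this thread.
bm <- data.frame(ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005",
                                     "ENSG00000000419"),
                 hgnc_symbol = c("TSPAN6", "TNMD", "DPM1"),
                 stringsAsFactors = FALSE)
ann <- annotate_probes(c("ENSG00000000003_at", "AFFX-M27830_5_at"), bm)
print(ann)
```

With a live connection, `annotate_probes(probes, fetch_symbols(ens))` would combine the two steps.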
Best,
Jim
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
Hi Jim (and others),
Since this topic is of interest to me as well, do you have any pointers
on how to construct an 'org.Hs.xx.db' package based on Ensembl IDs using
the direct mappings from biomaRt?
In other words, I know how to map Ensembl IDs to gene symbol, name,
GO class, etc. using biomaRt, but I would like to 'merge' these separate
files into a new-style annotation db package keyed by Ensembl IDs
(thus avoiding the intermediate Entrez IDs). Or is this by definition
an impossible task?
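The 'merge' step described above can be sketched in base R with `merge()`. This produces a plain data frame keyed by Ensembl ID, not an AnnotationDbi .db package. The two input tables are hypothetical per-attribute exports, filled with the Ensembl/Entrez/symbol values from the org.Hs.eg.db session earlier in this thread. The column names and the helper `merge_annotation` are assumptions for illustration.

```r
# Two hypothetical per-attribute tables, as might be exported separately
# from BioMart, both keyed by Ensembl gene ID (values from this thread).
ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
ens2entrez <- data.frame(ensembl_gene_id = ids,
                         entrez = c("7105", "64102", "8813"),
                         stringsAsFactors = FALSE)
ens2symbol <- data.frame(ensembl_gene_id = ids,
                         symbol = c("TSPAN6", "TNMD", "DPM1"),
                         stringsAsFactors = FALSE)

# Hypothetical helper: fold any number of such tables into one, joining on
# the Ensembl ID; all = TRUE keeps IDs that appear in only some tables.
merge_annotation <- function(tables, key = "ensembl_gene_id") {
  Reduce(function(a, b) merge(a, b, by = key, all = TRUE), tables)
}

combined <- merge_annotation(list(ens2entrez, ens2symbol))
print(combined)
```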
Thanks,
Guido
------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
internet: http://nutrigene.4t.com
email: guido.hooiveld at wur.nl
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
> James W. MacDonald
> Sent: 18 September 2008 14:24
> To: Gundala Viswanath
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Converting EnSeMBL Probe names into Gene Name
On Fri, Sep 19, 2008 at 10:38 AM, Hooiveld, Guido <guido.hooiveld at wur.nl> wrote:
See the SQLForge documentation in the AnnotationDbi package. You can use
the list of Ensembl IDs and their corresponding Entrez Gene IDs to
construct a new annotation db package. Alternatively, you could get
the Ensembl-to-Entrez Gene relationship using biomaRt. The final
products will be similar, but probably not identical. Also, keep in
mind that the actual data in the org.db packages are based on NCBI
annotation even though the key would be an Ensembl ID.
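Because the org.Hs.eg.db mappings and a BioMart export may not agree exactly, it can help to list where they differ before choosing one. The sketch below compares two Ensembl-to-Entrez tables. It assumes both have columns named `ensembl_gene_id` and `entrez`, so a real BioMart export would need its columns renamed. The helper `diff_mappings` is hypothetical. The second table in the usage is a made-up copy with one ID removed, only to show the shape of the output.

```r
# Hypothetical helper: compare two Ensembl -> Entrez Gene mapping tables
# and return the Ensembl IDs whose Entrez ID sets differ or are missing
# from one source.
diff_mappings <- function(a, b) {
  # One sorted, comma-separated string of Entrez IDs per Ensembl ID.
  collapse <- function(m) {
    groups <- split(m$entrez, m$ensembl_gene_id)
    vapply(groups, function(x) paste(sort(unique(x)), collapse = ","),
           character(1))
  }
  ca <- collapse(a)
  cb <- collapse(b)
  ids <- sort(union(names(ca), names(cb)))
  va <- unname(ca[ids])   # NA where source a has no entry
  vb <- unname(cb[ids])   # NA where source b has no entry
  same <- !is.na(va) & !is.na(vb) & va == vb
  out <- data.frame(ensembl_gene_id = ids, source_a = va, source_b = vb,
                    stringsAsFactors = FALSE)
  out[!same, , drop = FALSE]
}

org_map <- data.frame(ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005",
                                          "ENSG00000000419"),
                      entrez = c("7105", "64102", "8813"),
                      stringsAsFactors = FALSE)
# Hypothetical second source lacking one ID, only to illustrate the output.
mart_map <- org_map[-2, ]
print(diff_mappings(org_map, mart_map))
```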
That said, the simpler route is to convert your entire list to Entrez
Gene IDs using either the org.Hs.eg.db mappings or biomaRt, and then
proceed with the Entrez Gene ID as the key.
Sean