annotation in Ensembl using biomart
jason0701 ▴ 190
@jason0701-3921
Last seen 5.0 years ago
Hi all,

I wonder whether I could get help from this list. Sorry if this is a duplicate question.

I am confused by the following mapping (obtained from the BioMart website). All rows share the same ENSG. My goal is to match each ENSG to a gene symbol. Do you have any suggestion as to which one I should use?

Thanks,

Ensembl Gene ID   Ensembl Transcript ID   HGNC symbol   HGNC curated gene name
ENSG00000008128   ENST00000401097         CDK11B        CDC2L2
ENSG00000008128   ENST00000401097         CDK11A        CDC2L2
ENSG00000008128   ENST00000401097         CDK11B        CDC2L1
ENSG00000008128   ENST00000401097         CDK11A        CDC2L1
ENSG00000008128   ENST00000341832         CDK11B        CDC2L2
ENSG00000008128   ENST00000341832         CDK11A        CDC2L2
ENSG00000008128   ENST00000341832         CDK11B        CDC2L1
ENSG00000008128   ENST00000341832         CDK11A        CDC2L1
ENSG00000008128   ENST00000407249         CDK11B        CDC2L2
ENSG00000008128   ENST00000407249         CDK11A        CDC2L2

Jason
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…
Dear Jason,

A quick look at the HGNC website (http://www.genenames.org) will tell you that CDC2L1 is the previous name for CDK11B (the currently approved gene symbol), and similarly CDC2L2 for CDK11A. Furthermore, Ensembl as well as the UCSC genome browser have in the meantime mapped them to the same place in the reference genome and consider them isoforms of the same gene:

http://www.genenames.org/data/hgnc_data.php?hgnc_id=1729
http://www.genenames.org/data/hgnc_data.php?hgnc_id=1730

OTOH, Entrez and UniProt consider them separate genes ("Duplicated gene. CDK11A and CDK11B encode almost identical protein kinases of 110 kDa that ..."): http://www.uniprot.org/uniprot/Q9UQ88

Biology, and the history of biological discovery, can be messy... Other people might have more insight, but I bet it is a long story :)

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact
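For the practical goal in the question (one entry per ENSG), here is a minimal sketch of how the exported table could be collapsed. It assumes the BioMart result has already been loaded as tuples; Python is used only for illustration, and the names `rows` and `symbols_by_gene` are hypothetical, not part of BioMart or biomaRt.

```python
from collections import defaultdict

# Rows copied from the question:
# (Ensembl gene ID, Ensembl transcript ID, HGNC symbol, previous name)
rows = [
    ("ENSG00000008128", "ENST00000401097", "CDK11B", "CDC2L2"),
    ("ENSG00000008128", "ENST00000401097", "CDK11A", "CDC2L2"),
    ("ENSG00000008128", "ENST00000401097", "CDK11B", "CDC2L1"),
    ("ENSG00000008128", "ENST00000401097", "CDK11A", "CDC2L1"),
    ("ENSG00000008128", "ENST00000341832", "CDK11B", "CDC2L2"),
    ("ENSG00000008128", "ENST00000341832", "CDK11A", "CDC2L2"),
    ("ENSG00000008128", "ENST00000341832", "CDK11B", "CDC2L1"),
    ("ENSG00000008128", "ENST00000341832", "CDK11A", "CDC2L1"),
    ("ENSG00000008128", "ENST00000407249", "CDK11B", "CDC2L2"),
    ("ENSG00000008128", "ENST00000407249", "CDK11A", "CDC2L2"),
]


def symbols_by_gene(rows):
    """Collect the distinct HGNC symbols attached to each Ensembl gene ID."""
    collected = defaultdict(set)
    for gene_id, _transcript_id, symbol, _previous_name in rows:
        collected[gene_id].add(symbol)
    # Sort so the result is deterministic.
    return {gene_id: sorted(symbols) for gene_id, symbols in collected.items()}


mapping = symbols_by_gene(rows)
print(mapping)  # {'ENSG00000008128': ['CDK11A', 'CDK11B']}
```

Dropping the transcript and previous-name columns leaves two approved symbols for this one gene ID. Which of the two to report is a judgment call that the table alone cannot settle.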
Dear Dr Huber,

Thanks for the reply.

You are right that CDC2L1 is the previous name for CDK11B, and CDC2L2 for CDK11A. I guess I was confused by the output given by BioMart, in which the pairing between old names and new names appears to be completely random (see the previous post). Could this be an error in BioMart (for example, in a table join)?

Thanks again,
Jason
Dear Jason

> I guess I was confused by the output given by BioMart, in which the match between old names and new names totally are random (see the previous post). Could be errors in BioMart (table-join)?

No, this is not an error in BioMart (nor in biomaRt); this is what the database says. Please read my previous message.

Best wishes
Wolfgang
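One possible reading of why the pairing looks random, offered as an assumption rather than something stated in this thread: when a query requests two attributes that each have several values for the same gene, the result contains one row per combination of values. The sketch below checks whether the rows from the question are consistent with that reading. The helper name `is_full_cross_product` is hypothetical, and Python is used only for illustration.

```python
from itertools import product

# Rows copied from the question:
# (Ensembl gene ID, Ensembl transcript ID, HGNC symbol, previous name)
rows = [
    ("ENSG00000008128", "ENST00000401097", "CDK11B", "CDC2L2"),
    ("ENSG00000008128", "ENST00000401097", "CDK11A", "CDC2L2"),
    ("ENSG00000008128", "ENST00000401097", "CDK11B", "CDC2L1"),
    ("ENSG00000008128", "ENST00000401097", "CDK11A", "CDC2L1"),
    ("ENSG00000008128", "ENST00000341832", "CDK11B", "CDC2L2"),
    ("ENSG00000008128", "ENST00000341832", "CDK11A", "CDC2L2"),
    ("ENSG00000008128", "ENST00000341832", "CDK11B", "CDC2L1"),
    ("ENSG00000008128", "ENST00000341832", "CDK11A", "CDC2L1"),
    ("ENSG00000008128", "ENST00000407249", "CDK11B", "CDC2L2"),
    ("ENSG00000008128", "ENST00000407249", "CDK11A", "CDC2L2"),
]


def is_full_cross_product(rows):
    """For each transcript, report whether the observed (symbol, previous name)
    pairs are exactly every combination of the symbols and previous names seen."""
    pairs_by_transcript = {}
    for _gene_id, transcript_id, symbol, previous_name in rows:
        pairs_by_transcript.setdefault(transcript_id, set()).add((symbol, previous_name))

    result = {}
    for transcript_id, pairs in pairs_by_transcript.items():
        symbols = {symbol for symbol, _ in pairs}
        previous_names = {previous for _, previous in pairs}
        result[transcript_id] = pairs == set(product(symbols, previous_names))
    return result


check = is_full_cross_product(rows)
print(check)
```

If every transcript comes back `True`, the rows are simply all combinations of the values present. The table then carries no information about which previous name belongs to which symbol, so that pairing has to come from HGNC itself.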