Hello,
I have a list of gene names that are not official gene symbols. Normally
in this case I would search for each gene in Entrez to see if it is an
alias and then take the official symbol. Is there a way to
(semi-)automate this within Bioconductor?
If this fails I normally google the name to see if it is likely to be a
misspelling ("S" instead of "5", etc.). Any suggestions for that?
Many thanks
Dan
--
**************************************************************
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************
On Wed, Apr 8, 2009 at 9:52 AM, Daniel Brewer
<daniel.brewer@icr.ac.uk> wrote:
> Hello,
>
> I have a list of genes which are not official gene symbols. Normally in
> this case I would search gene in entrez to see if it is an alias and
> then take the official symbol. Is there a way to (semi) automate this
> within bioconductor?
>
> If this fails I normally google it to see if it is likely to be a
> misspelling S instead of 5 etc. ANy suggestions for that?
>
It is often a good idea to check the annotation packages for this type
of thing. The org.XX.eg.db packages (where XX stands for the organism of
interest, e.g. Hs for human) each provide an org.XX.egALIAS2EG map from
aliases to Entrez Gene IDs. The official symbol for each ID can then be
looked up in org.XX.egSYMBOL.
Sean
Hi Dan,
The org.XX.egALIAS2EG map combined with some fuzzy matching
function can help you do this:
> library(org.Hs.eg.db)
> get("S-HT3c2", org.Hs.egALIAS2EG)
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
  value for "S-HT3c2" not found
> agrep("S-HT3c2", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=1)
[1] "5-HT3c2"
The 'max.distance' argument lets you control the maximum number of
misspelled letters (including inserted/deleted letters):
> get("WUGSC:H-DJO747G182", org.Hs.egALIAS2EG)
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
  value for "WUGSC:H-DJO747G182" not found
> agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=2)
character(0)
> agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=3)
[1] "WUGSC:H_DJ0747G18.2"
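
To run this over a whole list of names, the exact lookup and the agrep()
fallback above could be wrapped in a small helper. The following is only a
rough sketch in base R: resolve_alias() is a hypothetical helper, and
alias_map is a toy table with placeholder IDs standing in for the real
org.Hs.egALIAS2EG map (with the package loaded, the alias names would come
from keys(org.Hs.egALIAS2EG) instead).

```r
# Toy alias table: names are aliases, values are PLACEHOLDER IDs
# (not real annotation data; a stand-in for org.Hs.egALIAS2EG).
alias_map <- c("5-HT3c2"             = "ID_A",
               "WUGSC:H_DJ0747G18.2" = "ID_B",
               "TP53"                = "ID_C")

# Hypothetical helper: try an exact alias match first, then fall back
# to approximate matching with agrep().
resolve_alias <- function(query, alias_map, max.distance = 1) {
  if (query %in% names(alias_map)) {
    return(data.frame(query = query, matched = query,
                      id = unname(alias_map[query]), how = "exact",
                      stringsAsFactors = FALSE))
  }
  # Fuzzy fallback: may return zero, one, or several candidate aliases.
  hits <- agrep(query, names(alias_map), value = TRUE,
                max.distance = max.distance)
  if (length(hits) == 0) {
    return(data.frame(query = query, matched = NA_character_,
                      id = NA_character_, how = "none",
                      stringsAsFactors = FALSE))
  }
  data.frame(query = query, matched = hits,
             id = unname(alias_map[hits]), how = "fuzzy",
             stringsAsFactors = FALSE)
}

queries <- c("TP53", "S-HT3c2", "NOT_A_GENE")
result <- do.call(rbind, lapply(queries, resolve_alias,
                                alias_map = alias_map))
print(result)
```

Rows marked "fuzzy" are only candidates and are probably worth checking by
eye, since a larger max.distance can return several unrelated aliases for
one query.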
Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319