How to get gene information

Kay Jaja ▴ 90

@kay-jaja-3481

Last seen 11.4 years ago

I have a list of 80 genes in a text file, and I would like to use a database such as NCBI to get information on each of these genes. Specifically, I need the start and end base-pair positions for each gene listed in my file. Any ideas on how to get started or what to use? Your help is greatly appreciated.

• 1.6k views

ADD COMMENT • link updated 16.7 years ago by Saroj K Mohapatra ▴ 400 • written 16.7 years ago by Kay Jaja ▴ 90

Saroj K Mohapatra ▴ 400

@saroj-k-mohapatra-3419

Last seen 11.4 years ago

You can do some of the work within Bioconductor with the org.* annotation packages. Suppose you have a list of 3 human gene symbols.

> glist
[1] "A1BG" "A2M"  "A2MP"

Using the corresponding "org." package:

> library("org.Hs.eg.db")

you can map the gene symbols to Entrez gene ids:

> mget(glist, revmap(org.Hs.egSYMBOL))
$A1BG
[1] "1"

$A2M
[1] "2"

$A2MP
[1] "3"

There are many other mappings available. Look at:

> ls("package:org.Hs.eg.db")

If the organism is something else, use the appropriate org. package, e.g., org.Mm.eg.db. The second term (Mm) is a short form combining the first letter of the genus name and the first letter of the species name.

The full list of annotation packages is available at http://www.bioconductor.org/packages/release/data/annotation/

Saroj
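Since the question starts from a text file, here is a minimal sketch of getting the symbols into R before mapping them. It assumes one gene symbol per line; `read_gene_list` and the temporary demo file are hypothetical names introduced for illustration, not part of any package. The mapping step runs only if org.Hs.eg.db is installed.

```r
# Hypothetical helper: read one gene symbol per line from a text file,
# trimming whitespace and dropping blank lines and duplicates.
read_gene_list <- function(path) {
  symbols <- trimws(readLines(path, warn = FALSE))
  unique(symbols[nzchar(symbols)])
}

# Demo with a temporary file standing in for the real gene list.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("A1BG", "A2M", "", " A2MP ", "A2M"), demo_file)
glist <- read_gene_list(demo_file)
print(glist)

# Map symbols to Entrez gene ids only if the annotation package is available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library("org.Hs.eg.db")
  # ifnotfound = NA returns NA for unknown symbols instead of stopping
  eg <- mget(glist, revmap(org.Hs.egSYMBOL), ifnotfound = NA)
  print(eg)
}
```

With a real list, replace `demo_file` with the path to your own file. Symbols that come back as NA are worth checking by hand, since they may be aliases or outdated names.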

ADD COMMENT • link 16.7 years ago Saroj K Mohapatra ▴ 400

Saroj K Mohapatra ▴ 400

@saroj-k-mohapatra-3419

Last seen 11.4 years ago

Hi,

I might have misunderstood your question the first time. Is it that you have a list of gene ids and you need to find their start and end locations on the chromosome? If so, I show an example below.

I have a list with three genes:

> glist
[1] "CRIPAK" "CAND2"  "STK25"

I get the Entrez gene ids:

> eglist=as.character(unlist(mget(glist, revmap(org.Hs.egSYMBOL))))
> eglist
[1] "285464" "23066"  "10494"

I find out which chromosomes these belong to:

> mget(eglist, org.Hs.egCHR)
$`285464`
[1] "4"

$`23066`
[1] "3"

$`10494`
[1] "2"

Find the start positions:

> mget(eglist, org.Hs.egCHRLOC)
$`285464`
      4
1375339

$`23066`
       3
12813170

$`10494`
         2
-242083104

And the end positions:

> mget(eglist, org.Hs.egCHRLOCEND)
$`285464`
      4
1379782

$`23066`
       3
12851301

$`10494`
         2
-242096707

Is this what you are looking for?

Best,
Saroj
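For 80 genes it may be easier to collect the results into one table. The sketch below is only an illustration: `locations_table` is a hypothetical helper, and the demo input is typed in by hand from the output above rather than queried from the package. It assumes each list element is a named integer vector (name = chromosome) and keeps only the first location reported per gene. In these mappings a leading minus sign marks a gene on the antisense strand, so the helper splits the sign off into a strand column.

```r
# Hypothetical helper: combine the lists returned by mget() for
# org.Hs.egCHRLOC (starts) and org.Hs.egCHRLOCEND (ends) into a data frame.
# Only the first location of each gene is used.
locations_table <- function(starts, ends) {
  ids <- names(starts)
  first_start <- unname(vapply(starts, function(x) as.numeric(x[1]), numeric(1)))
  first_end <- unname(vapply(ends[ids], function(x) as.numeric(x[1]), numeric(1)))
  chrom <- unname(vapply(starts, function(x) {
    nm <- names(x)[1]
    if (is.null(nm)) NA_character_ else nm
  }, character(1)))
  data.frame(
    entrez_id = ids,
    chromosome = chrom,
    # a negative position means the gene lies on the antisense strand
    strand = ifelse(first_start < 0, "-", "+"),
    start = abs(first_start),
    end = abs(first_end),
    stringsAsFactors = FALSE
  )
}

# Demo input copied from the session output above.
starts <- list("285464" = c("4" = 1375339L),
               "23066"  = c("3" = 12813170L),
               "10494"  = c("2" = -242083104L))
ends <- list("285464" = c("4" = 1379782L),
             "23066"  = c("3" = 12851301L),
             "10494"  = c("2" = -242096707L))
loc <- locations_table(starts, ends)
print(loc)
```

In a live session the two arguments would be `mget(eglist, org.Hs.egCHRLOC)` and `mget(eglist, org.Hs.egCHRLOCEND)`. The resulting data frame can be saved with `write.table()` or `write.csv()`.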

ADD COMMENT • link 16.7 years ago Saroj K Mohapatra ▴ 400