How to get Entrez Gene IDs from genomic coordinates
Bruna Marini
Hi,

I have a batch of genomic coordinates, and I need to find all the genes (Entrez Gene IDs) covered by those coordinates. I am new to R, but I am able to use biomaRt to get various annotations for genes from lists of gene IDs. Would it be possible to retrieve my IDs from the coordinates?

The coordinates are in chr:start:end format, and I need the genes on both strands. For example:

1:11401198:11694590
1:14877629:15246452
1:38065507:38258622

Thank you for your attention.
Best regards,
Bruna

Bruna Marini
Molecular Medicine Lab
ICGEB
Padriciano 99, 34149 Trieste, Italy
marini at icgeb.org
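As a minimal base-R sketch, the chr:start:end strings above can be split into a table of chromosome, start and end. It assumes (this is not stated in the question) that the annotation to be matched uses UCSC-style chromosome names such as "chr1", and that start and end are whole-number positions. The helper parse_coords() and its add_chr_prefix argument are hypothetical, not part of any package.

```r
# parse_coords() is a hypothetical helper, not part of base R or Bioconductor.
# It splits "chr:start:end" strings and converts start/end to integers.
parse_coords <- function(x, add_chr_prefix = TRUE) {
  fields <- strsplit(x, ":", fixed = TRUE)
  # Every entry must have exactly three fields: chromosome, start, end.
  stopifnot(all(lengths(fields) == 3))
  parts <- do.call(rbind, fields)
  chrom <- parts[, 1]
  # Assumption: the annotation uses UCSC-style names such as "chr1".
  if (add_chr_prefix) chrom <- paste0("chr", chrom)
  data.frame(
    chrom = chrom,
    # strsplit() yields character data, so convert positions explicitly.
    start = as.integer(parts[, 2]),
    end   = as.integer(parts[, 3]),
    stringsAsFactors = FALSE
  )
}

coords <- c("1:11401198:11694590", "1:14877629:15246452", "1:38065507:38258622")
regions <- parse_coords(coords)
print(regions)
```

The resulting columns can be passed on as GRanges(regions$chrom, IRanges(regions$start, regions$end)), which is the kind of object the answer below builds.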
James W. MacDonald
Hi Bruna,

On Tuesday, November 19, 2013 7:07:28 AM, Bruna Marini wrote:
> Would it be possible to retrieve my IDs from the coordinates?

You could probably use biomaRt, but Steffen Durinck would know the best way to do that. He might be along later with an example, and it may well be a one-liner.

You could also use an organism-level package. Getting used to the infrastructure available in Bioconductor will help with many other queries you might have, so it is worth learning.

Here I assume these are coordinates from the GRCh37/hg19 Homo sapiens genome. If not, you will have to make the necessary changes.

# Homo.sapiens also loads TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db
library(Homo.sapiens)
# One range per gene; gene IDs in the UCSC knownGene TxDb are Entrez Gene IDs
kg <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
# The three query regions, entered by hand (strand defaults to "*")
mg <- GRanges(seqnames = rep("chr1", 3),
              IRanges(start = c(11401198, 14877629, 38065507),
                      end   = c(11694590, 15246452, 38258622)))

Normally you would have read these data in.
You could do something like

dat <- scan("filename.txt", what = character())
dat <- do.call("rbind", strsplit(dat, ":"))
# strsplit() returns character data; IRanges() needs numeric start and end
mg <- GRanges(paste0("chr", dat[, 1]),
              IRanges(as.integer(dat[, 2]), as.integer(dat[, 3])))

which will replicate what I did by hand above.

> olaps <- findOverlaps(kg, mg)
> olaps
Hits of length 6
queryLength: 23056
subjectLength: 3
  queryHits subjectHits
   <integer>   <integer>
 1      7105           2
 2      9113           3
 3      9114           3
 4     14481           3
 5     14649           3
 6     16011           1

> # elementMetadata() is spelled mcols() in newer Bioconductor releases
> gn <- elementMetadata(kg)[queryHits(olaps), "gene_id"]
> gn
CharacterList of length 6
[[1]] 23254
[[2]] 284654
[[3]] 284656
[[4]] 54955
[[5]] 55143
[[6]] 57540

> select(Homo.sapiens, unlist(gn), "SYMBOL", "ENTREZID")
  ENTREZID   SYMBOL
1    23254     KAZN
2   284654    RSPO1
3   284656   EPHA10
4    54955 C1orf109
5    55143    CDCA8
6    57540   PTCHD2

Or, if you still want the ranges, you can just add the symbols:

> smkg <- kg[queryHits(olaps), ]
> # relies on each Entrez ID mapping to exactly one symbol, so rows line up with smkg
> elementMetadata(smkg)$Symbol <- select(Homo.sapiens, unlist(gn), "SYMBOL", "ENTREZID")[, 2]
> smkg
GRanges with 6 ranges and 2 metadata columns:
         seqnames               ranges strand |         gene_id      Symbol
            <Rle>            <IRanges>  <Rle> | <CharacterList> <character>
   23254     chr1 [14925213, 15444544]      + |           23254        KAZN
  284654     chr1 [38076951, 38100595]      - |          284654       RSPO1
  284656     chr1 [38179553, 38230824]      - |          284656      EPHA10
   54955     chr1 [38147242, 38157888]      - |           54955    C1orf109
   55143     chr1 [38158073, 38175391]      + |           55143       CDCA8
   57540     chr1 [11539295, 11597640]      + |           57540      PTCHD2

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
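findOverlaps() does the real work above. As a rough base-R illustration of what it reports (not how it computes it; the IRanges/GenomicRanges implementation is far more efficient than checking every pair), here is a naive all-pairs version: two closed intervals [s1, e1] and [s2, e2] on the same chromosome overlap when s1 <= e2 and s2 <= e1. Strand is ignored here, which matches the original call: mg was built without a strand, so its strand is "*", and findOverlaps() treats "*" as compatible with both "+" and "-", which is why genes on both strands come back. The gene table is hand-copied from the six ranges in the smkg output rather than loaded from TxDb.Hsapiens.UCSC.hg19.knownGene, and naive_overlaps() is a hypothetical helper.

```r
# Gene ranges hand-copied from the smkg output (hg19, UCSC knownGene).
gene_tab <- data.frame(
  gene_id = c("23254", "284654", "284656", "54955", "55143", "57540"),
  symbol  = c("KAZN", "RSPO1", "EPHA10", "C1orf109", "CDCA8", "PTCHD2"),
  chrom   = "chr1",
  start   = c(14925213L, 38076951L, 38179553L, 38147242L, 38158073L, 11539295L),
  end     = c(15444544L, 38100595L, 38230824L, 38157888L, 38175391L, 11597640L),
  stringsAsFactors = FALSE
)

# The three query regions from the question.
regions <- data.frame(
  chrom = "chr1",
  start = c(11401198L, 14877629L, 38065507L),
  end   = c(11694590L, 15246452L, 38258622L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: naive all-pairs overlap test on closed intervals.
# [s1, e1] and [s2, e2] on the same chromosome overlap iff s1 <= e2 and s2 <= e1.
# Strand is not checked, like a query range whose strand is "*".
naive_overlaps <- function(query, subject) {
  pairs <- expand.grid(q = seq_len(nrow(query)), s = seq_len(nrow(subject)))
  keep <- query$chrom[pairs$q] == subject$chrom[pairs$s] &
    query$start[pairs$q] <= subject$end[pairs$s] &
    subject$start[pairs$s] <= query$end[pairs$q]
  hits <- pairs[keep, ]
  hits <- hits[order(hits$q, hits$s), ]
  data.frame(queryHits = hits$q, subjectHits = hits$s)
}

# Genes as query and regions as subject, mirroring findOverlaps(kg, mg).
olaps <- naive_overlaps(gene_tab, regions)
olaps$gene_id <- gene_tab$gene_id[olaps$queryHits]
olaps$symbol  <- gene_tab$symbol[olaps$queryHits]
print(olaps)
```

The subjectHits column comes out in the same 2, 3, 3, 3, 3, 1 pattern as the findOverlaps() result above. The all-pairs approach grows as genes × regions (23,056 × 3 for the full knownGene table), which is one reason to prefer findOverlaps() on GRanges objects for real data.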