rtracklayer and gene symbols
@christian-ruckert-3294
Germany
Is there an elegant way to find the chromosome, start, and end position for a given gene symbol via rtracklayer?

In the Table Browser on the UCSC website I can retrieve this information by pasting a list of identifiers, so the requested information must be somewhere in the tables.

My current solution is rather indirect: I first get a table of all UCSC names together with gene symbols, find the UCSC names corresponding to my symbols, and then look those UCSC names up in a table of all UCSC names with locations.

Thank you in advance,
Christian
@james-w-macdonald-5106
United States
Hi Christian,

Christian Ruckert wrote:
> Is there an elegant way to find the chromosome, start and end position
> to a given gene symbol via rtracklayer.

I don't know about using rtracklayer, but there are any number of ways to get these data. If you want them directly from UCSC, you can query their MySQL server:

> library(RMySQL)
Loading required package: DBI
> con <- dbConnect(MySQL(), user = "genome", host = "genome-mysql.cse.ucsc.edu", dbname = "hg18")
> gns <- c("BRIP1","VEGFA","FANCB","TP53")
> sql <- paste("select name2, txStart, txEnd from refGene where name2 in ('",
+              paste(gns, collapse = "','"), "');", sep = "")
> dbGetQuery(con, sql)
   name2  txStart    txEnd
1  BRIP1 57114766 57295537
2  FANCB 14771449 14801105
3  FANCB 14771449 14801105
4   TP53  7512444  7531588
5   TP53  7512444  7531588
6   TP53  7512444  7519536
7   TP53  7512444  7519536
8   TP53  7512444  7519536
9   TP53  7512444  7531588
10  TP53  7512444  7531588
11 VEGFA 43845930 43862201
12 VEGFA 43845930 43862201
13 VEGFA 43845930 43862201
14 VEGFA 43845930 43862201
15 VEGFA 43845930 43862201
16 VEGFA 43845930 43862201
17 VEGFA 43845930 43862201

Or you could use the org.Hs.eg.db package supplied by BioC:

> library(org.Hs.eg.db)
> egs <- unlist(mget(gns, revmap(org.Hs.egSYMBOL)))
> egs
  BRIP1   VEGFA   FANCB    TP53
"83990"  "7422"  "2187"  "7157"
> starts <- unlist(mget(egs, org.Hs.egCHRLOC))
> ends <- unlist(mget(egs, org.Hs.egCHRLOCEND))
## two end locations for TP53, so double up the symbol
> data.frame(gns=gns[c(1:4,4)], starts, ends)
    gns    starts      ends
1 BRIP1 -57114766 -57295537
2 VEGFA  43845930  43862201
3 FANCB -14771449 -14801105
4  TP53  -7512444  -7531588
5  TP53  -7512444  -7519536

Or you could use biomaRt:

> library(biomaRt)
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
Checking attributes ... ok
Checking filters ... ok
> getBM(c("hgnc_symbol","start_position","end_position"), "hgnc_symbol", gns, mart)
  hgnc_symbol start_position end_position
1       FANCB       14861529     14891184
2        TP53        7565257      7590863
3       VEGFA       43737948     43754224
4       BRIP1       59759985     59940755

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
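A possible follow-up to the output above, offered as a rough sketch rather than a definitive recipe. The refGene query returns one row per transcript, so each symbol appears several times, and the query as written does not return the chromosome that the question asked for. The base R code below makes three assumptions: that collapsing the transcripts to the widest span per gene is what is wanted, that the `chrom` and `strand` columns of refGene are selected alongside `name2`, and that a negative CHRLOC value from org.Hs.eg.db indicates the minus strand. The names `build_query`, `tx`, and `gene_ranges` are hypothetical helpers made up for this illustration. The rows in `tx` are copied from the query output above, so nothing here needs a database connection.

```r
# Hypothetical helper: build a refGene query that also returns chromosome
# and strand. Symbols are pasted in verbatim, so they are assumed trusted.
build_query <- function(symbols) {
  paste0(
    "select distinct name2, chrom, strand, txStart, txEnd ",
    "from refGene where name2 in ('",
    paste(symbols, collapse = "','"),
    "');"
  )
}

# Transcript-level rows copied from the dbGetQuery() output in the answer.
tx <- data.frame(
  name2   = rep(c("BRIP1", "FANCB", "TP53", "VEGFA"), times = c(1, 2, 7, 7)),
  txStart = rep(c(57114766, 14771449, 7512444, 43845930), times = c(1, 2, 7, 7)),
  txEnd   = c(57295537,
              14801105, 14801105,
              7531588, 7531588, 7519536, 7519536, 7519536, 7531588, 7531588,
              rep(43862201, 7)),
  stringsAsFactors = FALSE
)

# Collapse to one row per gene: smallest start and largest end over all
# transcripts of that gene.
starts <- tapply(tx$txStart, tx$name2, min)
ends   <- tapply(tx$txEnd, tx$name2, max)
gene_ranges <- data.frame(
  symbol = names(starts),
  start  = as.vector(starts),
  end    = as.vector(ends),
  stringsAsFactors = FALSE
)

# Start values copied from the org.Hs.egCHRLOC output in the answer.
# Assumption: a negative value marks the minus strand, and the absolute
# value is the position.
chrloc   <- c(BRIP1 = -57114766, VEGFA = 43845930,
              FANCB = -14771449, TP53 = -7512444)
strand   <- ifelse(chrloc < 0, "-", "+")
position <- abs(chrloc)
```

With a live connection, `dbGetQuery(con, build_query(gns))` would be the call to try. Note also that the biomaRt coordinates in the answer differ from the hg18 ones, presumably because Ensembl was serving a newer genome assembly at the time, so it is worth checking which assembly each source uses before mixing results.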