Question

Fwd: all human gene coordinates


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hah -- forgot to CC bioc-list, even though I suggested you not forget that you should do the same ;-)

---------- Forwarded message ----------
From: Steve Lianoglou <mailinglist.honeypot@gmail.com>
Date: Wed, Dec 5, 2012 at 12:20 PM
Subject: Re: [BioC] all human gene coordinates
To: Wim Kreinen <wkreinen at gmail.com>

Hi Wim,

Please keep emails on the bioc list by hitting "reply all" -- this way you can get more (and better help) by having more eyes on your question, and also others can benefit as well.

So:

On Wed, Dec 5, 2012 at 11:29 AM, Wim Kreinen <wkreinen at gmail.com> wrote:
> This sounds promising.
> And principally I understand how it works but ... How do I define keys if I
> want all transcripts?
> I defined via isActiveSeq the chr1...chr22, chrX, chrY as active
> chromosomes.
>
> I tried
> library ("TxDb.Hsapiens. UCSC.hg19.knownGenes")
> txdb->TxDb.Hsapiens. UCSC.hg19.knownGenes
> cols->c("TXCHROM", "TXSTRAND", "TXSTART", "TXEND")
> keys -> ? #How do I define keys if I want all transcripts?
> alltranscripts->select (txdb, keys=keys, cols=cols, keytype="TXID")

First: what's up w/ the spaces in your "TxDb.Hsapiens.[SPACE]UCSC..."

It's also ...knownGene -- not ...knownGeneS

Also, a suggestion: use `<-` for assignment, and not `->` ... although the latter works, if anybody else is meant to read your code, they're likely going to be confused for a bit until they get used to your "odd" (but correct) choice of assignment direction.

Anyhow -- how about:

R> library(BiocInstaller)
R> biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
R> library("TxDb.Hsapiens.UCSC.hg19.knownGene")
R> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
R> txs <- transcripts(txdb)
R> head(txs)
GRanges with 6 ranges and 2 metadata columns:
      seqnames         ranges strand |     tx_id     tx_name
         <Rle>      <IRanges>  <Rle> | <integer> <character>
  [1]     chr1 [11874, 14409]      + |         1  uc001aaa.3
  [2]     chr1 [11874, 14409]      + |         2  uc010nxq.1
  ...

the ucsc id's are in the tx_name column.
HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
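For readers who still want the select() route that Wim asked about, here is a minimal sketch in R. It assumes the Bioconductor package TxDb.Hsapiens.UCSC.hg19.knownGene (and, through it, GenomicFeatures and AnnotationDbi) is already installed; on current Bioconductor releases that is done with BiocManager::install() rather than the BiocInstaller/biocLite calls shown above. The variable names all_ids, tx_table and tx_df are illustrative only. Note that recent AnnotationDbi releases call the argument `columns`; the quoted message uses the older name `cols`.

```r
# Assumption: TxDb.Hsapiens.UCSC.hg19.knownGene is installed.
# Loading it also attaches GenomicFeatures and AnnotationDbi.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# "All transcripts" as keys: ask the database for every TXID it holds.
all_ids <- keys(txdb, keytype = "TXID")

# One row per transcript with chromosome, strand, start and end.
# (`columns` was named `cols` in the 2012-era release quoted above.)
tx_table <- select(txdb,
                   keys    = all_ids,
                   columns = c("TXCHROM", "TXSTRAND", "TXSTART", "TXEND"),
                   keytype = "TXID")
head(tx_table)

# The GRanges route from the reply above, flattened to a data frame.
tx_df <- as.data.frame(transcripts(txdb))
head(tx_df)
```

Either route should give the same coordinates; transcripts() is usually the more convenient one because the result is already a GRanges object that works with the rest of the ranges infrastructure.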

Cancer • 921 views

updated 11.4 years ago by Natasha ▴ 440 • written 11.4 years ago by Steve Lianoglou ★ 13k

score 0 · Answer 1 · 2012-12-07

Dear All,

I am trying to annotate my DE gene list using biomart, but keep getting an error and an empty output. I can't seem to figure out where I have gone wrong in my code (I suspect it might be something really silly). Help much appreciated. Code below.

##########
library("biomaRt")
listMarts()
listMarts(host='jul2012.archive.ensembl.org')
ensembl68 = useMart(host='jul2012.archive.ensembl.org', biomart='ENSEMBL_MART_ENSEMBL')
listDatasets(ensembl68)
ensembl68 = useDataset("hsapiens_gene_ensembl", mart=ensembl68)
listFilters(ensembl68)
listAttributes(ensembl68)
annot.tot = getBM(attributes=c('ensembl_gene_id','external_gene_id','hgnc_symbol','description','entrezgene','chromosome_name','start_position','end_position','strand'),
                  filters='ensembl_gene_id', values=rownames(p12.ip$table), mart=ensembl68)
#####
Warning message:
In getBM(attributes = c("ensembl_gene_id", "external_gene_id", "hgnc_symbol", :
  Unable to match column names of BioMart output
##########
> head(rownames(p12.ip$table))
[1] "ENSG00000111335" "ENSG00000165949" "ENSG00000187608" "ENSG00000157601"
[5] "ENSG00000119922" "ENSG00000126709"
#####

Many Thanks,
Natasha

SessionInfo:
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] biomaRt_2.14.0 gdata_2.12.0 WriteXLS_2.2.0 edgeR_2.6.10 limma_3.14.1

loaded via a namespace (and not attached):
[1] gtools_2.7.0 RCurl_1.95-3 XML_3.95-0.1
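One thing that may be worth ruling out with a warning like this is whether every requested attribute name is actually offered by the mart in use. The sketch below is only a diagnostic, not a confirmed fix for the warning. The helper missing_attributes is hypothetical (it is not part of biomaRt) and uses only base R; the guarded part at the end runs only if an `ensembl68` mart object like the one created above already exists in the session, and relies on listAttributes() returning a data frame with a `name` column.

```r
# Hypothetical helper, not part of biomaRt: return the requested
# attribute names that do not appear among the available ones.
missing_attributes <- function(requested, available) {
  setdiff(requested, available)
}

# The attribute names requested in the getBM() call above.
wanted <- c("ensembl_gene_id", "external_gene_id", "hgnc_symbol",
            "description", "entrezgene", "chromosome_name",
            "start_position", "end_position", "strand")

# Only runs in a session where the mart object from the post exists.
if (exists("ensembl68")) {
  available <- biomaRt::listAttributes(ensembl68)$name
  print(missing_attributes(wanted, available))
}
```

An empty result would mean all requested names are known to that mart, in which case the cause lies elsewhere.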