biomaRt-chromosomal positions
@hoen-pac-t-hkg-1434
Dear BioC,

I would like to use biomaRt to get Entrez Gene (or other) identifiers for small tag sequences. I use the getFeature function for this. It seems that it retrieves the identifiers only when the indicated chromosomal region spans at least the complete length of the transcript, but not when the region contains only part of the transcript sequence. Is there a way around this?

Code and sessionInfo:

library(RCurl)
library(biomaRt)
ensembl = useMart("ensembl")
ensembl = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
testgene <- getGene("66501", type = "entrezgene", mart = ensembl)
testgene
# entrezgene markersymbol
#description chromosome_name band strand start_position end_position
#1 66501 1700029H14Rik RIKEN cDNA 1700029H14 gene
#[Source:MarkerSymbol;Acc:MGI:1913751] 8 A2 -1
#13550710 13562382
# ensembl_gene_id ensembl_transcript_id
#1 ENSMUSG00000031452 ENSMUST00000033830

#this works fine:
testfeatures = getFeature( type = "entrezgene", chromosome = "8", start = "13550710", end = "13562382",mart=ensembl)
testfeatures
# chromosome_name start_position end_position entrezgene
#1 8 13550710 13562382 66501

#this does not work anymore
testfeatures = getFeature( type = "entrezgene", chromosome = "8", start = "13550711", end = "13562381",mart=ensembl)
testfeatures
#NULL

#I would like to have a result from a small tag in a query like this:
testfeatures = getFeature( type = "entrezgene", chromosome = "8", start = "13550741", end = "13550761",mart=ensembl)

sessionInfo()
----------------------
R version 2.5.0 (2007-04-23)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "stats" "graphics" "grDevices" "datasets" "utils" "tcltk" "methods" "base"

other attached packages:
 biomaRt    RCurl      XML     svIO   R2HTML   svMisc svSocket    svIDE
"1.11.4"  "0.8-0"  "1.7-3"  "0.9-5"   "1.58"  "0.9-5"  "0.9-5"  "0.9-5"

Cheers,
Peter-Bram
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
(Forwarded from Steffen Durinck at his request:)
-------------------------------------------------
Dear Peter-Bram,

Ensembl currently uses the transcript as the basic unit of annotation, and any feature at a smaller level cannot be retrieved without the transcript. What do you mean by small tag features?

An alternative is to use the getBM function to retrieve, for example, all entrezgene IDs on chromosome 1 together with their start and end locations, and then check whether your start and end positions fall between them. Would that help?

Try:

ensembl=useMart("ensembl", dataset="mmusculus_gene_ensembl")
positions = getBM(c("entrezgene","start_position","end_position"), filters=c("chromosome_name","with_entrezgene"), values=list(1,TRUE), mart=ensembl)

You'll get:

> positions[1:5,]
  entrezgene start_position end_position
1     497097        3206103      3661429
2      19888        4334224      4350473
3      20671        4481009      4486494
4      18777        4797943      4836817
5     670320        4870130      4870732

Then you just use a vectorized comparison on this to see where your positions fit in.

Best regards,
Steffen

> -------- Original Message --------
> Subject: [BioC] biomaRt-chromosomal positions
> Date: Thu, 16 Aug 2007 12:14:12 +0200
> From: <p.a.c._t_hoen at lumc.nl>
> To: <bioconductor at stat.math.ethz.ch>
>
> Dear BioC
>
> I would like to use biomaRt to get entrez gene (or other) identifiers
> for small tag sequences. I use the getFeature function for this. It
> seems that it will retrieve the identifiers only when the chromosomal
> region indicated spans at least the complete length of the transcript,
> but not if the indicated chromosomal region contains only part of the
> transcript sequence. Is there a way around here?
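The vectorized comparison suggested in the reply could look roughly like the sketch below. It runs offline on the five rows of getBM output quoted in the reply (mouse chromosome 1). The helper name genes_containing_tag and the tag coordinates are made up for illustration and are not part of biomaRt. The check assumes the tag must lie entirely inside a gene's start/end span; a tag straddling a gene boundary would need a looser overlap test.

```r
# First five rows of the getBM() result quoted in the reply (mouse chromosome 1).
positions <- data.frame(
  entrezgene     = c(497097, 19888, 20671, 18777, 670320),
  start_position = c(3206103, 4334224, 4481009, 4797943, 4870130),
  end_position   = c(3661429, 4350473, 4486494, 4836817, 4870732)
)

# Hypothetical helper (not a biomaRt function): return the rows whose
# start/end span fully contains the tag interval.
genes_containing_tag <- function(positions, tag_start, tag_end) {
  inside <- positions$start_position <= tag_start &
            positions$end_position   >= tag_end
  positions[inside, , drop = FALSE]
}

# Hypothetical 21-bp tag; the coordinates are invented for this example.
hit <- genes_containing_tag(positions, tag_start = 4340000, tag_end = 4340020)
hit$entrezgene
```

With a real result from getBM, the same function can be applied to each tag in turn, for example with lapply over a data frame of tag coordinates, one chromosome at a time.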