issue of genome build versions when using biomaRt
Al Tango ▴ 50
@al-tango-3109
Last seen 9.6 years ago
Hi all,

Although this seems like a frequently asked question, I couldn't find it in the archives.

When specifying chromosomal coordinates for a region in biomaRt or other BioC packages, how can I know which version of the genome assembly is being queried, and is it possible to select a particular version?

For example, I am searching for the 5' UTR sequences of genes within a region this way:

    ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    getSequence(chromosome=3, start=185514033, end=185535839, type="entrezgene", seqType="5utr", mart=ensembl)

My questions: does it interpret the start/end coordinates according to the latest build, 36 (2006)? Can I opt for build 35, i.e. hg17 (2004)?

Thanks in advance for your help.
biomaRt biomaRt • 2.4k views
@james-w-macdonald-5106
Last seen 1 hour ago
United States
Hi Al,

The list archives are your friend. A search for "biomart ensembl version" gave me this as the second hit:

http://article.gmane.org/gmane.science.biology.informatics.conductor/15311/match=biomart+ensembl+version

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
@herve-pages-1542
Last seen 17 hours ago
Seattle, WA, United States
Hi Al,

You also have the option to retrieve those sequences from the appropriate BSgenome data package:

    library(BSgenome.Hsapiens.UCSC.hg18)
    getSeq(Hsapiens, "chr3", start=185514033, end=185535839)

The hg18 genome is NCBI Build 36.1 (http://genome.ucsc.edu/FAQ/FAQreleases).

Cheers,
H.
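As a rough sketch of the BSgenome route for the older build asked about in the question: the snippet below assumes the BSgenome.Hsapiens.UCSC.hg17 data package is installed, and skips the retrieval if it is not. The helper `bsgenome_package()` and the lookup vector are hypothetical names made up for illustration. Note that `getSeq()` returns the genomic sequence of the whole interval, not only the 5' UTRs inside it.

```r
# Assembly names mentioned in this thread (UCSC name -> NCBI build).
ucsc_to_ncbi <- c(hg17 = "NCBI Build 35", hg18 = "NCBI Build 36.1")

# Hypothetical helper: name of the BSgenome data package for a UCSC assembly.
bsgenome_package <- function(ucsc_name) {
  if (!ucsc_name %in% names(ucsc_to_ncbi)) {
    stop("assembly not covered by this sketch: ", ucsc_name)
  }
  paste0("BSgenome.Hsapiens.UCSC.", ucsc_name)
}

# Region from the original question; coordinates are 1-based and inclusive.
region_start <- 185514033
region_end   <- 185535839
region_width <- region_end - region_start + 1

pkg <- bsgenome_package("hg17")
if (requireNamespace(pkg, quietly = TRUE)) {
  # Assumes the data package exports the genome object as `Hsapiens`.
  genome <- getExportedValue(pkg, "Hsapiens")
  region <- BSgenome::getSeq(genome, "chr3", start = region_start, end = region_end)
  print(region)
} else {
  message(pkg, " is not installed; skipping sequence retrieval")
}
```

Keep in mind that the coordinates in the question only mean the same thing across builds if the region did not shift between NCBI Build 35 and 36, so it is worth checking which build the coordinates came from before comparing results.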
@joern-toedling-1244
Last seen 9.6 years ago
Hello,

If you use the Ensembl biomart, you get access to the genome builds that are (or were) current in that release of Ensembl. For example, in the current Ensembl release (50), the build is NCBI36 for H. sapiens. You can see which genome builds are associated with that release with:

    ensembl <- useMart("ensembl")
    listDatasets(ensembl)

In some cases, you can get older genome builds by using the marts of archived, previous Ensembl releases. To see which archive marts are available, use:

    listMarts(archive=TRUE)

For example,

    ensembl43 <- useMart("ensembl_mart_43", archive=TRUE)
    listDatasets(ensembl43)

shows you the genome builds in Ensembl release 43. However, there does not seem to be an archive mart old enough to give you access to NCBI35 for H. sapiens. Someone please correct me if they know better. So I am afraid that you will have to resort to other sources for the UTR sequences in NCBI35.

Best regards,
Joern

--
Joern Toedling
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
Phone +44(0)1223 492566
Email toedling at ebi.ac.uk
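To make the "check which build you are getting" step concrete, here is a minimal sketch that pulls the assembly label for one dataset out of the table returned by `listDatasets()`. The helper name `assembly_for_dataset()` is hypothetical, and the column names `dataset` and `version` are an assumption about the layout of that table; the example table is hand-written with the values quoted in this thread rather than fetched from Ensembl.

```r
# Hypothetical helper: look up the assembly label for one dataset in the
# table returned by biomaRt::listDatasets(). The column names "dataset" and
# "version" are an assumption about that table's layout.
assembly_for_dataset <- function(datasets, dataset_name) {
  hit <- datasets[datasets$dataset == dataset_name, , drop = FALSE]
  if (nrow(hit) == 0L) {
    stop("dataset not found: ", dataset_name)
  }
  as.character(hit$version[1])
}

# Offline illustration with a hand-written table (values taken from this thread).
example_datasets <- data.frame(
  dataset = "hsapiens_gene_ensembl",
  version = "NCBI36",
  stringsAsFactors = FALSE
)
assembly_for_dataset(example_datasets, "hsapiens_gene_ensembl")

# Live query: only runs interactively, with biomaRt installed and network access.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  ensembl <- biomaRt::useMart("ensembl")
  print(assembly_for_dataset(biomaRt::listDatasets(ensembl), "hsapiens_gene_ensembl"))
}
```

Running the same lookup against an archived mart, where one is available, would show which build that older release used, so the check can be done before any coordinates are passed to `getSequence()`.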