biomaRt: incorrect number of transcripts
@ivanek-robert-3765
Dear mailing list,

I have recently observed a discrepancy in the genome annotation obtained via the R package biomaRt. I wanted to download all Ensembl transcripts for the entire mouse genome (chromosomes 1 to 19, X, Y and MT only).

When I set the filter on chromosome names, I retrieved ~36000 transcripts (see the code below). However, using the web service www.biomart.org I received ~48000 transcripts for the same genome version and chromosomes. Comparing the two data frames shows that the transcript counts differ only for some chromosomes (3 to 9 and X).

If I specify only a few chromosome names (2, 3 and MT in the example below), then the number of downloaded transcripts is correct for each of them. If I do not set any filter in getBM and do the filtering manually in R, the number of transcripts is also correct.

Session info is attached.

Best Regards
Robert

--
Robert Ivanek
Postdoctoral Fellow
Schuebeler Group
Friedrich Miescher Institute
Maulbeerstrasse 66
4058 Basel / Switzerland
Office phone: +41 61 697 6100

R> library("biomaRt")
R> ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
R> chroms <- c(1:19, "X", "Y", "MT")

Query with the chromosome filter (counts for 3 to 9 and X are too low):

R> table(getBM(attributes = c("ensembl_transcript_id", "chromosome_name",
                              "strand", "transcript_start", "transcript_end"),
               filters = "chromosome_name", values = chroms,
               mart = ensembl)$chromosome_name)

   1   10   11   12   13   14   15   16   17   18   19
2507 1869 4364 1501 1630 1624 1404 1522 1865  985 1245
   2    3    4    5    6    7    8    9   MT    X    Y
5232 1080 1454  845 1209 1487 1129 1031   41 2072   17

Export from the web service, filtered in R:

R> ens.web <- read.delim("../../../mart_export.txt", stringsAsFactors = FALSE)
R> ens.web <- ens.web[ens.web$Chromosome.Name %in% chroms, ]
R> table(ens.web$Chromosome.Name)

   1   10   11   12   13   14   15   16   17   18   19
2507 1869 4364 1501 1630 1624 1404 1522 1865  985 1245
   2    3    4    5    6    7    8    9   MT    X    Y
5232 2179 3997 2822 2524 3919 2021 2163   41 3297   17

Query with the chromosome filter restricted to three chromosomes (counts are correct):

R> table(getBM(attributes = c("ensembl_transcript_id", "chromosome_name",
                              "strand", "transcript_start", "transcript_end"),
               filters = "chromosome_name", values = c("2", "3", "MT"),
               mart = ensembl)$chromosome_name)

   2    3   MT
5232 2179   41

Query without any filter, filtered in R afterwards (counts are correct):

R> ens.r <- getBM(attributes = c("ensembl_transcript_id", "chromosome_name",
                                "strand", "transcript_start", "transcript_end"),
                 mart = ensembl)
R> ens.r <- ens.r[ens.r$chromosome_name %in% chroms, ]
R> table(ens.r$chromosome_name)

   1   10   11   12   13   14   15   16   17   18   19
2507 1869 4364 1501 1630 1624 1404 1522 1865  985 1245
   2    3    4    5    6    7    8    9   MT    X    Y
5232 2179 3997 2822 2524 3919 2021 2163   41 3297   17

R> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.2.0

loaded via a namespace (and not attached):
[1] RCurl_1.3-0  tools_2.10.0 XML_2.6-0
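For readers who want to see the size of the gap per chromosome, here is a minimal sketch in base R that compares the two count tables from the post. The numbers are copied from the output above; the variable names (`filtered`, `reference`, `counts`, `affected`) are only illustrative and do not come from the original code. It needs no network access and does not call biomaRt.

```r
# Per-chromosome transcript counts copied from the tables in the post,
# in the order printed by table().
chrom <- c("1", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
           "2", "3", "4", "5", "6", "7", "8", "9", "MT", "X", "Y")

# getBM() called with filters = "chromosome_name" and all 22 values.
filtered <- c(2507, 1869, 4364, 1501, 1630, 1624, 1404, 1522, 1865, 985, 1245,
              5232, 1080, 1454, 845, 1209, 1487, 1129, 1031, 41, 2072, 17)

# Web export; the unfiltered getBM() call subset in R gives the same counts.
reference <- c(2507, 1869, 4364, 1501, 1630, 1624, 1404, 1522, 1865, 985, 1245,
               5232, 2179, 3997, 2822, 2524, 3919, 2021, 2163, 41, 3297, 17)

counts <- data.frame(chromosome = chrom,
                     filtered = filtered,
                     reference = reference,
                     stringsAsFactors = FALSE)

# Number of transcripts that the filtered query did not return.
counts$missing <- counts$reference - counts$filtered

# Keep only the chromosomes where the two sources disagree.
affected <- counts[counts$missing != 0, ]
print(affected, row.names = FALSE)

total_missing <- sum(counts$missing)
print(total_missing)
```

The rows left in `affected` are chromosomes 3 to 9 and X, which matches the description in the post. With real data, the same comparison could be made by passing the two `table()` results through `as.data.frame()` and merging them on the chromosome name.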