Ensembl mouse proteins
@stefanie-carola-gerstberger-4500
Last seen 9.6 years ago
Hi, I have tried to download the mouse protein sequences from Ensembl BioMart. I only received 2203 protein sequences for mouse, including isoforms. I get the same result when downloading the Ensembl protein sequences through the UCSC Genome Browser. I also encounter the problem for Xenopus tropicalis: only 4700 protein sequences. As a reference point, S. cerevisiae has 6700 sequences in Ensembl BioMart, human 87,000, Drosophila 22,000. Does anyone know why this is, and how I can circumvent this problem to get a complete list of protein sequences for mouse and Xenopus? Thanks, Stefanie
biomaRt biomaRt • 1.0k views
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
What is the relationship of your question to Bioconductor? Are you using R to perform the download? Which functions, in which packages, and which versions? Please read the posting guide and provide the output of sessionInfo().

On Sun, May 22, 2011 at 6:12 PM, Stefanie Carola Gerstberger <scg74 at cornell.edu> wrote:
> Hi,
> I have tried to download the mouse protein sequences from Ensembl BioMart.
> I only received 2203 protein sequences for mouse, including isoforms. I get
> the same result when downloading the Ensembl protein sequences through the
> UCSC Genome Browser. I also encounter the problem for Xenopus tropicalis:
> only 4700 protein sequences. As a reference point, S. cerevisiae has 6700
> sequences in Ensembl BioMart, human 87,000, Drosophila 22,000. Does anyone
> know why this is, and how I can circumvent this problem to get a complete
> list of protein sequences for mouse and Xenopus?
> Thanks,
> Stefanie
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
Please keep dialogue on the list so others may learn. See below.

On Sun, May 22, 2011 at 8:58 PM, Stefanie Gerstberger <stefanie.gerstberger at ymail.com> wrote:
> Hi Vincent,
> thanks for your reply. I had problems with biomaRt:
>
> > sessionInfo()
> R version 2.12.1 (2010-12-16)

This is out of date. External services can't be used reliably with old versions of R. More below.

> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] biomaRt_2.6.0     Biostrings_2.18.2 IRanges_1.8.8
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 RCurl_1.4-3    tools_2.12.1   XML_3.2-0
>
> > library(biomaRt)
> > ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> > protein = getSequence(id = "ENSG00000089280", type = "ensembl_gene_id",
> >     seqType = "peptide", mart = ensembl)
> Error in getBM(c(seqType, type), filters = type, values = id, mart = mart, :
>   Query ERROR: caught BioMart::Exception::Database: Could not connect to
> mysql database ensembl_mart_62: DBI
> connect('database=ensembl_mart_62;host=dcc-qa-db.oicr.on.ca;port=3306','bm_web',...)
> failed: Can't connect to MySQL server on 'dcc-qa-db.oicr.on.ca' (113) at
> /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm
> line 98

I was unable to reproduce this error with a properly updated version of R/biomaRt. See further below.

> > protein = getSequence(id = c(100, 5728), type = "entrezgene",
> >     seqType = "peptide", mart = ensembl)
> Error in getBM(c(seqType, type), filters = type, values = id, mart = mart, :
>   Query ERROR: caught BioMart::Exception::Database: Could not connect to
> mysql database ensembl_mart_62: DBI
> connect('database=ensembl_mart_62;host=dcc-qa-db.oicr.on.ca;port=3306','bm_web',...)
> failed: Can't connect to MySQL server on 'dcc-qa-db.oicr.on.ca' (113) at
> /srv/biomart_server/biomart.org/biomart-perl/lib/BioMart/Configuration/DBLocation.pm
> line 98
>
> That is, I guess, an internal Ensembl problem.
> However, I tried to circumvent this problem by just manually downloading the
> mouse sequences at the Ensembl BioMart server - I found that the files only
> contained 4600 cDNA sequences, or if downloading the peptide sequences I only

I don't know what to say about this. However:

  > mens = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  > p2 = getSequence(id = c(100, 5728), type = "entrezgene", seqType = "peptide", mart = mens)
  > dim(p2)
  [1] 0 2
  > protein = getSequence(id = "ENSMUSG00000057573", type = "ensembl_gene_id", seqType = "peptide", mart = mens)
  > dim(protein)
  [1] 1 2
  > protein = getSequence(id = "ENSMUSG00000066372", type = "ensembl_gene_id", seqType = "peptide", mart = mens)
  > dim(protein)
  [1] 1 2
  > sessionInfo()
  R version 2.13.0 Patched (2011-04-14 r55443)
  Platform: x86_64-apple-darwin10.6.0/x86_64 (64-bit)

  locale:
  [1] C

  attached base packages:
  [1] stats     graphics  grDevices datasets  tools     utils     methods
  [8] base

  other attached packages:
  [1] biomaRt_2.8.0   weaver_1.17.0   codetools_0.2-8 digest_0.4.2

  loaded via a namespace (and not attached):
  [1] RCurl_1.5-0 XML_3.2-0

> received 2300 sequences. I translated the 4600 sequences using Biostrings,
> but quite a few of the sequences contain undefined nucleotides, have no ATG
> start codon, or end in a frameshift. But I'm very confused about receiving
> only 4600 cDNA sequences.
> I know this part is not really for the Bioconductor list, but I was hoping
> that someone with experience with the Ensembl mouse genome knows why I'm
> encountering this - and whether there is a way in Bioconductor to download
> the sequences without using BioMart. I have now found a way around it - by
> simply ignoring Ensembl and using RefSeq proteins downloaded from UCSC.
> Using biomaRt in R seemed to me the simplest solution to obtain the
> sequences - I don't currently know any other option.
> Thanks,
> Stefanie
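For anyone landing on this thread with the same question, here is a minimal sketch of how the full set of mouse peptide sequences might be pulled through biomaRt in batches rather than one gene at a time. It is not taken from the posts above: the helper names `split_into_batches` and `fetch_mouse_peptides` and the batch size of 500 are assumptions made for illustration, and it presumes a current R/biomaRt installation and a reachable Ensembl mart. Only `useMart()`, `getBM()` and `getSequence()` are actual biomaRt functions.

```r
# Sketch only. Helper names and the default batch size are hypothetical.

# Split a vector of IDs into consecutive batches of at most `size` elements.
split_into_batches <- function(ids, size = 500) {
  stopifnot(size >= 1)
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Fetch peptide sequences for every translation ID the mouse mart reports.
# Requires the biomaRt package and network access to Ensembl.
fetch_mouse_peptides <- function(batch_size = 500) {
  mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")

  # All Ensembl peptide IDs; entries without a translation come back empty.
  ids <- biomaRt::getBM(attributes = "ensembl_peptide_id", mart = mart)[[1]]
  ids <- unique(ids[!is.na(ids) & nzchar(ids)])

  # Query in batches so that no single request carries every ID at once.
  batches <- split_into_batches(ids, batch_size)
  seqs <- lapply(batches, function(b) {
    biomaRt::getSequence(id = b, type = "ensembl_peptide_id",
                         seqType = "peptide", mart = mart)
  })
  do.call(rbind, seqs)
}

# Only run the (slow, network-bound) download in an interactive session.
if (interactive()) {
  peptides <- fetch_mouse_peptides()
  nrow(peptides)  # compare against the count seen in the web interface
}
```

Comparing `nrow(peptides)` with the number of IDs returned by `getBM()` gives a quick check on whether sequences are being silently dropped. As for a route that avoids BioMart altogether, Ensembl also publishes per-species peptide FASTA files on its FTP site, and `Biostrings::readAAStringSet()` can read such a file once it has been downloaded.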