biomaRt ensembl mmusculus does not contain all ensembl IDs (lincRNA, miRNA etc)?
Duke ▴ 210
@duke-4050
Last seen 9.6 years ago
Hi folks,

Following the biomaRt usage instructions, I am trying to get information for our mmu data. The code I used is below:

----------
library(biomaRt)
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
ensTransIDs <- c("ENSMUST00000000001", "ENSMUST00000083463", "ENSMUST00000042585")
getBM(filters = "ensembl_transcript_id",
      attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                     "external_transcript_id", "external_gene_id",
                     "refseq_dna", "entrezgene"),
      values = ensTransIDs, mart = mart)
----------

This code runs fine for some transcript IDs, but for others (for example, lincRNAs or miRNAs) it returns empty results. For example, the code above, run on one gene, one lincRNA and one miRNA, produced this result:

  ensembl_transcript_id    ensembl_gene_id external_transcript_id
1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001
  external_gene_id refseq_dna entrezgene
1            Gnai3  NM_010306      14679

=> only the gene Gnai3 is returned; the other two are not.

Does anybody know what I am doing wrong here, or is it just that the Ensembl database does not contain data for all the available transcript IDs?

For the record, here is my sessionInfo():

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.6.0

loaded via a namespace (and not attached):
[1] RCurl_1.4-3  XML_3.2-0    tools_2.12.2

Thanks,

D.
miRNA biomaRt • 2.1k views
@steffen-durinck-4465
Last seen 9.6 years ago
Hi Duke,

It looks like this is a BioMart server issue where the wrong type of table join is made with the entrezgene table. If you remove the entrezgene attribute you'll get everything back:

> getBM(filters="ensembl_transcript_id",
+       attributes=c("ensembl_transcript_id","ensembl_gene_id","external_transcript_id","refseq_dna"),
+       values=ensTransIDs, mart=mart)
  ensembl_transcript_id    ensembl_gene_id external_transcript_id refseq_dna
1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001  NM_010306
2    ENSMUST00000042585 ENSMUSG00000037982             Gm9725-201
3    ENSMUST00000083463 ENSMUSG00000065397             Mir155-201  NR_029565

We notified the BioMart team of this behavior a while ago, and they said they would make a change in the next release.

Cheers,
Steffen

On Mon, Apr 18, 2011 at 1:33 PM, Duke <duke.lists at gmx.com> wrote:
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
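If the Entrez IDs are still needed alongside the full set of transcripts, one possible workaround is to run two queries and join them locally, so that transcripts without an Entrez mapping are kept with NA. This is only a sketch: `get_bm_left_join` and its `fetch` argument are hypothetical names introduced here, not part of biomaRt, and the attribute names are the ones used in this thread, which may differ in other Ensembl releases.

```r
# Hypothetical helper (not part of biomaRt): query the core attributes and the
# row-dropping attribute separately, then left-join the two results locally.
# Assumes `filter` is also listed in `core_attrs`, so it can serve as the key.
# `fetch` defaults to biomaRt::getBM; it is a parameter only so that the
# helper can be tried offline with a stand-in function.
get_bm_left_join <- function(mart, ids, core_attrs, extra_attr,
                             filter = "ensembl_transcript_id",
                             fetch = biomaRt::getBM) {
  core <- fetch(attributes = core_attrs, filters = filter,
                values = ids, mart = mart)
  extra <- fetch(attributes = c(filter, extra_attr), filters = filter,
                 values = ids, mart = mart)
  # all.x = TRUE keeps every row of `core`; IDs missing from `extra` get NA
  merge(core, extra, by = filter, all.x = TRUE)
}

# Usage against the live server (needs network access and the biomaRt package):
# library(biomaRt)
# mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
# get_bm_left_join(mart, ensTransIDs,
#                  c("ensembl_transcript_id", "ensembl_gene_id"),
#                  "entrezgene")
```

Note that a transcript with several Entrez IDs would appear on several rows after the merge.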
Hi Steffen,

Thanks so much for the quick response. Yes, removing entrezgene does help!

Best,
D.

On 4/18/11 4:41 PM, Steffen Durinck wrote:
> [...]
Hi Steffen and Duke,

The issue here is that the entrezgene external references are currently stored on translations. As only one of the transcripts you submitted has a translation, it is the only one you get back when you add the entrezgene attribute. In effect, the entrezgene attribute is acting like a filter, which is not ideal.

Unfortunately we cannot do anything about this problem at the moment, as the BioMart tool we use to build the mart does not allow the addition of the necessary left join. We have informed the BioMart developers at the OICR about this issue and hopefully it will be fixed in the new code. On the plus side, the entrezgene IDs will be stored on genes for release 63 (due approximately the end of June), so you should be able to use this attribute in the expected way after the next release.

I apologize for any inconvenience that this has caused. If I can be of further assistance, please let me know.

Regards,
Rhoda

On 18 Apr 2011, at 21:41, Steffen Durinck wrote:
> [...]

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK.
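To see why an attribute can behave like a filter, here is a small local analogy using base R's `merge` on toy tables built from the IDs in this thread. It is an illustration of inner versus left joins only, not a description of how the BioMart server is implemented; the table names `transcripts` and `entrez` are made up for the example.

```r
# Toy tables built from the values quoted in the thread
transcripts <- data.frame(
  ensembl_transcript_id = c("ENSMUST00000000001", "ENSMUST00000042585",
                            "ENSMUST00000083463"),
  external_transcript_id = c("Gnai3-001", "Gm9725-201", "Mir155-201"),
  stringsAsFactors = FALSE
)
# Only the transcript with a translation has an Entrez ID attached
entrez <- data.frame(
  ensembl_transcript_id = "ENSMUST00000000001",
  entrezgene = 14679L,
  stringsAsFactors = FALSE
)

# Inner join (merge's default): transcripts without an Entrez ID are dropped
inner <- merge(transcripts, entrez, by = "ensembl_transcript_id")

# Left join: every transcript is kept; missing Entrez IDs become NA
left <- merge(transcripts, entrez, by = "ensembl_transcript_id", all.x = TRUE)

print(inner)
print(left)
```

The inner join keeps only the Gnai3 transcript, matching the result in the original question, while the left join keeps all three transcripts.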