biomaRt returning multiple columns out of order
@richard-hayes-4887
Hi,

Our group maintains the BioMart instance at the Phytozome plant genomics portal. Some users have reported problems with the result sets returned through the biomaRt interface, and it is unclear whether this is a biomaRt problem or a problem in our mart configuration. We are still running BioMart version 0.6, but hope to upgrade to 0.7 in the very near future.

I had been testing with R 2.12.2 and biomaRt 2.6.0, then upgraded to R 2.13.1 and biomaRt 2.8.1. The problems persist with these latest releases.

I can connect to our mart and the main genome transcript dataset and retrieve a single column of transcript names for Arabidopsis thaliana, using our internal "orgid" filter for organism ID 167:

> library('biomaRt')
> phyto=useMart('phytozome_mart', dataset='phytozome')
> transcripts = getBM(attributes = c("transcript_name"), filters= "orgid", values="167", mart=phyto)
> transcripts[1:5,]
[1] "AT2G38230.1" "AT2G39920.2" "AT2G26530.1" "AT2G28630.1" "AT2G19280.1"

However, when I construct a multi-column query, the columns are not returned in the expected order. The values under organism_name are transcript names, the values under transcript_name and exon_chrom_start are coordinates, and the organism name ends up in the last column:

> multiTest = getBM(attributes= c("organism_name", "transcript_name", "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167", mart=phyto)
> multiTest[1:5,]
  organism_name transcript_name exon_chrom_start exon_chrom_end
1   AT5G47220.1        19171862         19172823      Athaliana
2   AT1G71920.3        27067059         27067098      Athaliana
3   AT1G71920.3        27067189         27067401      Athaliana
4   AT1G71920.3        27067506         27067589      Athaliana
5   AT1G71920.3        27067706         27067860      Athaliana

Any help diagnosing the source of this problem is much appreciated.

Best regards,

--
Richard D. Hayes, Ph.D.
Joint Genome Institute / Lawrence Berkeley National Lab
http://www.phytozome.net
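Until the cause is found, one way to catch this kind of mislabeling before using the data is to check whether each column's contents match what its name promises. The sketch below uses a hypothetical helper, check_columns(), which is not part of biomaRt. The regular expression for Arabidopsis AGI transcript identifiers and the "digits only" test for coordinates are assumptions about this particular dataset. The data frame is built by hand from the first rows printed in the question and stands in for the getBM() result.

```r
# Hypothetical helper (not part of biomaRt): returns TRUE for each column whose
# contents look like what its name promises. The patterns are assumptions for
# Arabidopsis thaliana data, not rules enforced by the mart.
check_columns <- function(df) {
  # AGI transcript IDs such as "AT5G47220.1" (chromosomes 1-5, C, M).
  looks_like_agi   <- function(x) all(grepl("^AT[1-5CM]G[0-9]{5}\\.[0-9]+$", x))
  # Genomic coordinates: non-negative integers only.
  looks_like_coord <- function(x) all(grepl("^[0-9]+$", as.character(x)))
  c(
    organism_name    = !looks_like_agi(df$organism_name) &&
                       !looks_like_coord(df$organism_name),
    transcript_name  = looks_like_agi(df$transcript_name),
    exon_chrom_start = looks_like_coord(df$exon_chrom_start),
    exon_chrom_end   = looks_like_coord(df$exon_chrom_end)
  )
}

# Hand-built stand-in for multiTest, copied from the rows shown in the post.
multiTest <- data.frame(
  organism_name    = c("AT5G47220.1", "AT1G71920.3", "AT1G71920.3"),
  transcript_name  = c("19171862", "27067059", "27067189"),
  exon_chrom_start = c("19172823", "27067098", "27067401"),
  exon_chrom_end   = c("Athaliana", "Athaliana", "Athaliana"),
  stringsAsFactors = FALSE
)

print(check_columns(multiTest))  # FALSE entries mark suspicious columns
```

Note the limits of such a check: it only catches columns holding the wrong kind of value. Here exon_chrom_start passes even though it actually holds exon end coordinates, so two swapped numeric columns would go unnoticed.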
@arek-kasprzyk-4891
Hi Richard,

The best person to help you is Steffen Durinck, the original biomaRt coder (cc'ed on this email).

a
Hi Richard, Arek,

If you set verbose=TRUE in your getBM query, you'll see the XML query that is sent to the BioMart server (see below for your example). The BioMart server usually returns the result columns in the same order as the attributes in the XML query. For your example this is not the case, and there is no way for biomaRt to detect it (Arek, correct me if I'm wrong). biomaRt adds the column names to the returned result by position, following the attribute order of the query, so whenever the server does not preserve that order the column names will be wrong.

> multiTest = getBM(attributes= c("organism_name", "transcript_name", "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167", mart=phyto, verbose=TRUE)
<query virtualschemaname="default" uniquerows="1" count="0" datasetconfigversion="0.6" requestid="biomaRt">
  <dataset name="phytozome">
    <attribute name="organism_name"/>
    <attribute name="transcript_name"/>
    <attribute name="exon_chrom_start"/>
    <attribute name="exon_chrom_end"/>
    <filter name="orgid" value="167"/>
  </dataset>
</query>

Cheers,
Steffen
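To make the explanation above concrete, here is a toy reconstruction in base R of labeling columns by position. It is not biomaRt's own code: the reply string is a hand-made stand-in for the server's tab-separated output, built from rows in the original post, and server_order is a hypothetical vector inferred by eye from that output. Relabeling like this is only a stopgap that assumes the server's column order is stable across queries; it does not address why the server reorders the columns.

```r
# Toy reconstruction (not biomaRt's source code) of naming result columns by
# position from the requested attributes.

# Attributes in the order they were requested and written into the XML query.
requested <- c("organism_name", "transcript_name",
               "exon_chrom_start", "exon_chrom_end")

# Hand-made stand-in for the server's tab-separated reply, using rows from the
# original post. The server placed organism_name last instead of first.
reply <- paste(
  "AT5G47220.1\t19171862\t19172823\tAthaliana",
  "AT1G71920.3\t27067059\t27067098\tAthaliana",
  sep = "\n"
)

# Parse the reply without a header, then name the columns by position from the
# request: nothing in the reply itself says which column is which.
result <- read.table(text = reply, sep = "\t", header = FALSE,
                     stringsAsFactors = FALSE)
names(result) <- requested
print(result)  # organism_name now holds transcript IDs, and so on

# Stopgap sketch: if the order the server actually used is known, relabel the
# columns with it, then reorder them to match the request.
server_order <- c("transcript_name", "exon_chrom_start",
                  "exon_chrom_end", "organism_name")  # hypothetical, inferred
names(result) <- server_order
fixed <- result[, requested]
print(fixed)
```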
