Fwd: biomaRt column order
@steffenstatberkeleyedu-2907
Hi Mark,

The main problem here is that the query retrieves attributes from different attribute pages. The web service does not support this, even though such queries are possible and useful, especially for what we do in Bioconductor. To get an idea of what attribute pages are, you could check out the BioMart web interfaces, e.g. at http://www.ensembl.org. Pages exist to group attributes of a similar type together and display them on one web page; this makes less sense for command-line use such as biomaRt.

The column names are returned by the web service, so this problem will have to be solved there. However, if you use the attributes for chromosome_name and ensembl_gene_id from the sequence attribute page, the query should return the column names correctly.

To see all attributes that belong to one page with biomaRt, you could do:

    listAttributes(mart, category="Sequences")

If you change your query as follows, the column names should be in the correct order:

    b <- getBM(c("sequence_gene_stable_id", "sequence_str_chrom_name",
                 "sequence_biotype", "sequence_exon_chrom_start",
                 "sequence_exon_chrom_end"),
               filters="ensembl_gene_id", values="ENSG00000197530", mart=mart)

You'll get:

       gene_stable_id str_chrom_name struct_biotype exon_chrom_start exon_chrom_end
    1 ENSG00000197530              1 protein_coding          1540747        1540876
    2 ENSG00000197530              1 protein_coding          1541751        1541857
    3 ENSG00000197530              1 protein_coding          1548632        1548942
    4 ENSG00000197530              1 protein_coding          1549017        1549188

Cheers,
Steffen

> Begin forwarded message:
>
>> From: Mark Robinson <mrobinson at wehi.edu.au>
>> Date: 5 July 2008 9:13:48 AM
>> To: bioconductor at stat.math.ethz.ch
>> Subject: [BioC] biomaRt column order
>>
>> Dear list.
>>
>> I'm using biomaRt to do a fairly simple query against the Ensembl
>> human database. I get returned a table with column names that don't
>> match the data in the columns. See below.
>>
>> I can reshuffle them afterwards to make them match, but that's not ideal.
>>
>> Am I doing something wrong?
>>
>> Thanks,
>> Mark
>>
>> > library(biomaRt)
>> Loading required package: RCurl
>> > mart=useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
>> Checking attributes and filters ... ok
>> > mart
>> Object of class 'Mart':
>> Using the ensembl BioMart database
>> Using the hsapiens_gene_ensembl dataset
>> > b<-getBM(c("ensembl_gene_id","chromosome_name","sequence_biotype",
>> +   "sequence_exon_chrom_start","sequence_exon_chrom_end"),
>> +   filters="ensembl_gene_id",values="ENSG00000197530",mart=mart)
>> > dim(b)
>> [1] 25 5
>> > b[1:10,]
>>    ensembl_gene_id chromosome_name struct_biotype exon_chrom_start
>> 1   protein_coding         1542803        1542958  ENSG00000197530
>> 2   protein_coding         1548674        1548942  ENSG00000197530
>> 3   protein_coding         1549017        1549188  ENSG00000197530
>> 4   protein_coding         1550038        1550144  ENSG00000197530
>> 5   protein_coding         1550234        1550428  ENSG00000197530
>> 6   protein_coding         1550529        1550671  ENSG00000197530
>> 7   protein_coding         1551893        1551997  ENSG00000197530
>> 8   protein_coding         1552080        1552242  ENSG00000197530
>> 9   protein_coding         1552317        1552450  ENSG00000197530
>> 10  protein_coding         1552539        1552687  ENSG00000197530
>>    exon_chrom_end
>> 1               1
>> 2               1
>> 3               1
>> 4               1
>> 5               1
>> 6               1
>> 7               1
>> 8               1
>> 9               1
>> 10              1
>> > sessionInfo()
>> R version 2.7.1 (2008-06-23)
>> i386-apple-darwin8.10.1
>>
>> locale:
>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] biomaRt_1.14.0 RCurl_0.9-3
>>
>> loaded via a namespace (and not attached):
>> [1] XML_1.95-2
>>
>> ------------------------------
>> Mark Robinson
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robinson at garvan.org.au
>> e: mrobinson at wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
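For anyone who still has a mislabelled result in hand, the "reshuffle afterwards" workaround mentioned in the question can be sketched in base R without contacting the web service. This is only an illustration: the two-row data frame is rebuilt by hand from the output quoted in the question, and the `arrived` vector (the order in which the data columns actually came back) is an assumption read off that output, not something biomaRt reports.

```r
# First two rows of the mislabelled table from the question, rebuilt by hand.
# The column names are the ones the web service returned; the values under
# them belong to other columns.
b <- data.frame(
  ensembl_gene_id  = c("protein_coding", "protein_coding"),
  chromosome_name  = c(1542803L, 1548674L),
  struct_biotype   = c(1542958L, 1548942L),
  exon_chrom_start = c("ENSG00000197530", "ENSG00000197530"),
  exon_chrom_end   = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Column order as requested (this is what the header claims).
wanted <- names(b)

# ASSUMPTION: the order in which the data columns actually arrived,
# inferred by eye from the values (biotype, exon start, exon end,
# gene id, chromosome).
arrived <- c("struct_biotype", "exon_chrom_start", "exon_chrom_end",
             "ensembl_gene_id", "chromosome_name")

# Guard against typos: both vectors must name the same set of columns.
stopifnot(setequal(arrived, wanted))

# Relabel the columns with what they really contain, then put them
# back into the requested order.
names(b) <- arrived
b <- b[, wanted]
print(b)
```

Querying all attributes from the same attribute page, as suggested in the reply, avoids the need for this step; the relabelling above has to be re-checked by eye whenever the attribute list changes.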
@steffenstatberkeleyedu-2907
Looks like my reply on the biomaRt column order didn't make it to the bioc mailing list.

---------------------------- Original Message ----------------------------
Subject: Re: Fwd: [BioC] biomaRt column order
From: steffen@stat.berkeley.edu
Date: Thu, July 10, 2008 9:13 pm
To: "Mark Robinson" <mrobinson at wehi.edu.au>
Cc: bioconductor at stat.math.ethz.ch
--------------------------------------------------------------------------