---------------------------- Original Message ----------------------------
Subject: Re: Fwd: [BioC] biomaRt column order
From: steffen@stat.berkeley.edu
Date: Thu, July 10, 2008 9:13 pm
To: "Mark Robinson" <mrobinson at wehi.edu.au>
Cc: bioconductor at stat.math.ethz.ch
--------------------------------------------------------------------------
Hi Mark,
The main problem here is that the query retrieves attributes from
different attribute pages. The webservice does not support this, although
such queries are possible and useful, especially for what we do in
Bioconductor.
To get an idea of what attribute pages are, you could check out the
BioMart web interfaces, e.g. at http://www.ensembl.org
They group attributes of a similar type together and display them on one
web page. This makes less sense for command-line use such as biomaRt.
The column names are returned by the webservice, so this problem will have
to be solved there. However, if you take the chromosome_name and
ensembl_gene_id attributes from the sequence attribute page, the query
should return the column names correctly.
To see with biomaRt all attributes that belong to one page you could do:
listAttributes(mart, category="Sequences")
If you change your query as follows, the column names should be in the
correct order:
b <- getBM(c("sequence_gene_stable_id", "sequence_str_chrom_name",
             "sequence_biotype", "sequence_exon_chrom_start",
             "sequence_exon_chrom_end"),
           filters="ensembl_gene_id", values="ENSG00000197530", mart=mart)
You'll get:
   gene_stable_id str_chrom_name struct_biotype exon_chrom_start exon_chrom_end
1 ENSG00000197530              1 protein_coding          1540747        1540876
2 ENSG00000197530              1 protein_coding          1541751        1541857
3 ENSG00000197530              1 protein_coding          1548632        1548942
4 ENSG00000197530              1 protein_coding          1549017        1549188
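If you would rather keep your original query for now, relabelling the columns afterwards is a possible workaround. Below is a minimal sketch of that idea, not a fix in biomaRt itself. It assumes the data arrive in the order seen in your printout (biotype, exon start, exon end, gene ID, chromosome); that order is read off your output and is not guaranteed by the webservice, so check it before relying on it. The data frame b is typed in by hand from the first two rows of your output to keep the example self-contained, and the name actual_order is only illustrative.

```r
# First two rows of the mislabelled result, typed in by hand from the
# printout in the original post (the header does not match the data).
b <- data.frame(
  ensembl_gene_id  = c("protein_coding", "protein_coding"),
  chromosome_name  = c(1542803, 1548674),
  struct_biotype   = c(1542958, 1548942),
  exon_chrom_start = c("ENSG00000197530", "ENSG00000197530"),
  exon_chrom_end   = c("1", "1"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the order the data actually arrived in, read off the printout.
actual_order <- c("struct_biotype", "exon_chrom_start", "exon_chrom_end",
                  "ensembl_gene_id", "chromosome_name")

# Guard against a result of unexpected shape before relabelling.
stopifnot(length(actual_order) == ncol(b))
names(b) <- actual_order

# Put the columns back in the order the query asked for.
b <- b[, c("ensembl_gene_id", "chromosome_name", "struct_biotype",
           "exon_chrom_start", "exon_chrom_end")]
```

Because this relies on a column order observed in a single result, taking all attributes from one attribute page as shown above is the safer route.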
Cheers,
Steffen
>
>
> Begin forwarded message:
>
>> From: Mark Robinson <mrobinson at wehi.edu.au>
>> Date: 5 July 2008 9:13:48 AM
>> To: bioconductor at stat.math.ethz.ch
>> Subject: [BioC] biomaRt column order
>>
>> Dear list.
>>
>> I'm using biomaRt to do a fairly simple query against the Ensembl
>> human database. I get returned a table with column names that don't
>> match the data in the columns. See below.
>>
>> I can reshuffle them afterwards to make them match, but that's not ideal.
>>
>> Am I doing something wrong?
>>
>> Thanks,
>> Mark
>>
>>
>>
>>
>> > library(biomaRt)
>> Loading required package: RCurl
>> > mart=useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
>> Checking attributes and filters ... ok
>> > mart
>> Object of class 'Mart':
>> Using the ensembl BioMart database
>> Using the hsapiens_gene_ensembl dataset
>> > b <- getBM(c("ensembl_gene_id", "chromosome_name", "sequence_biotype",
>> +   "sequence_exon_chrom_start", "sequence_exon_chrom_end"),
>> +   filters="ensembl_gene_id", values="ENSG00000197530", mart=mart)
>> > dim(b)
>> [1] 25 5
>> > b[1:10,]
>>    ensembl_gene_id chromosome_name struct_biotype exon_chrom_start
>> 1   protein_coding         1542803        1542958  ENSG00000197530
>> 2   protein_coding         1548674        1548942  ENSG00000197530
>> 3   protein_coding         1549017        1549188  ENSG00000197530
>> 4   protein_coding         1550038        1550144  ENSG00000197530
>> 5   protein_coding         1550234        1550428  ENSG00000197530
>> 6   protein_coding         1550529        1550671  ENSG00000197530
>> 7   protein_coding         1551893        1551997  ENSG00000197530
>> 8   protein_coding         1552080        1552242  ENSG00000197530
>> 9   protein_coding         1552317        1552450  ENSG00000197530
>> 10  protein_coding         1552539        1552687  ENSG00000197530
>>    exon_chrom_end
>> 1               1
>> 2               1
>> 3               1
>> 4               1
>> 5               1
>> 6               1
>> 7               1
>> 8               1
>> 9               1
>> 10              1
>> > sessionInfo()
>> R version 2.7.1 (2008-06-23)
>> i386-apple-darwin8.10.1
>>
>> locale:
>> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_1.14.0 RCurl_0.9-3
>>
>> loaded via a namespace (and not attached):
>> [1] XML_1.95-2
>>
>>
>>
>> ------------------------------
>> Mark Robinson
>> Epigenetics Laboratory, Garvan
>> Bioinformatics Division, WEHI
>> e: m.robinson at garvan.org.au
>> e: mrobinson at wehi.edu.au
>> p: +61 (0)3 9345 2628
>> f: +61 (0)3 9347 0852
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>