Hi Pieta,
I am happy to hear that. The BioMart central portal at www.biomart.org is
the default server used by biomaRt and holds BioMart databases from
different projects. The Ensembl website (www.ensembl.org) holds the
Ensembl mart databases that we generate in the Ensembl project.
When you point your host to www.ensembl.org with:
mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",
                host="www.ensembl.org", path="/biomart/martservice")
you use the www.ensembl.org server instead of the biomart.org server.
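
A rough sketch of how the Ensembl host could be combined with the
per-chromosome retrieval discussed earlier in this thread. The helper
names connect_ensembl, genes_by_chromosome and count_missing are
hypothetical and only for illustration; useMart() and getBM() are the
biomaRt calls already used in this thread, and the `fetch` argument is
an assumption added so the logic can be tried without a network
connection. It has not been run against the live server.

```r
# Hypothetical helpers for illustration; only useMart() and getBM()
# come from biomaRt.

# Connect to the Ensembl-hosted mart instead of the biomart.org default.
connect_ensembl <- function() {
  biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl",
                   host = "www.ensembl.org",
                   path = "/biomart/martservice")
}

# Issue one request per chromosome and stack the results into one data frame.
# `fetch` defaults to getBM(); a stub can be passed in for offline checks.
genes_by_chromosome <- function(mart, chrs, fetch = biomaRt::getBM) {
  per_chr <- lapply(chrs, function(chr) {
    fetch(attributes = c("chromosome_name", "ensembl_gene_id"),
          filters = "chromosome_name",
          values = chr,
          mart = mart)
  })
  do.call(rbind, per_chr)
}

# Count, per chromosome, the gene IDs in `reference` that are absent
# from `candidate`. `chrs` is expected to be a character vector.
count_missing <- function(reference, candidate, chrs) {
  missing <- !(reference$ensembl_gene_id %in% candidate$ensembl_gene_id)
  vapply(chrs, function(chr) {
    sum(missing & reference$chromosome_name == chr)
  }, integer(1))
}

# Usage (needs network access and the biomaRt package):
# mart    <- connect_ensembl()
# chrs    <- c(1:22, "X", "Y", "MT")
# per_chr <- genes_by_chromosome(mart, chrs)
```

count_missing() can then be used to compare a multi-chromosome query
against the per-chromosome result, in the same spirit as the original
test script.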
Hope this helps,
Thomas
On 7 Feb 2014, at 16:35, Pieta Schofield <p.schofield@dundee.ac.uk>
wrote:
> Hullo Thomas
>
> That seems to have worked, no missing genes
>
> Loading required package: biomaRt
> Loading required package: methods
> [1] "Missing genes on chr 1 0 of individually 5363 all 5363"
> [2] "Missing genes on chr 2 0 of individually 4047 all 4047"
> [3] "Missing genes on chr 3 0 of individually 3101 all 3101"
> [4] "Missing genes on chr 4 0 of individually 2563 all 2563"
> [5] "Missing genes on chr 5 0 of individually 2859 all 2859"
> [6] "Missing genes on chr 6 0 of individually 2905 all 2905"
> [7] "Missing genes on chr 7 0 of individually 2876 all 2876"
> [8] "Missing genes on chr 8 0 of individually 2386 all 2386"
> [9] "Missing genes on chr 9 0 of individually 2323 all 2323"
> [10] "Missing genes on chr 10 0 of individually 2260 all 2260"
> [11] "Missing genes on chr 11 0 of individually 3208 all 3208"
> [12] "Missing genes on chr 12 0 of individually 2818 all 2818"
> [13] "Missing genes on chr 13 0 of individually 1217 all 1217"
> [14] "Missing genes on chr 14 0 of individually 2244 all 2244"
> [15] "Missing genes on chr 15 0 of individually 2080 all 2080"
> [16] "Missing genes on chr 16 0 of individually 2343 all 2343"
> [17] "Missing genes on chr 17 0 of individually 2903 all 2903"
> [18] "Missing genes on chr 18 0 of individually 1127 all 1127"
> [19] "Missing genes on chr 19 0 of individually 2910 all 2910"
> [20] "Missing genes on chr 20 0 of individually 1317 all 1317"
> [21] "Missing genes on chr 21 0 of individually 736 all 736"
> [22] "Missing genes on chr 22 0 of individually 1263 all 1263"
> [23] "Missing genes on chr X 0 of individually 2392 all 2392"
> [24] "Missing genes on chr Y 0 of individually 495 all 495"
> [25] "Missing genes on chr MT 0 of individually 37 all 37"
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] methods stats graphics grDevices utils datasets base
>
> other attached packages:
> [1] biomaRt_2.18.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.95-4.1 XML_3.95-0.2
> pieta-mba>
>
> Thanks. What is the difference?
>
> Pieta
>
> On 7 Feb 14, at 16:18, Thomas Maurel <maurel@ebi.ac.uk> wrote:
>
>> Hi Pieta,
>>
>> I've just noticed that you are querying against biomart.org. Could
>> you please try the same test but with your host pointing to
>> ensembl.org:
>>
>> mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",
>>                 host="www.ensembl.org", path="/biomart/martservice")
>>
>> Please let me know if this changes anything.
>> Cheers,
>> Thomas
>> On 7 Feb 2014, at 10:49, Pieta Schofield <p.schofield@dundee.ac.uk> wrote:
>>
>>> Hullo Thomas
>>>
>>> Thank you for the reply. The problem was not simply truncation
>>> when not using any filter; it occurred when the filter list included
>>> more than around 5 terms. But yes, I have no problem when I do the
>>> chromosomes individually. Since I have access to parallel cores and
>>> it makes sense to do downstream processing in parallel, I have
>>> amended my code so each core now just pulls the chromosome
>>> annotations it is going to work on.
>>>
>>> thanks again
>>>
>>> Pieta
>>> On 7 Feb 14, at 10:16, Thomas Maurel <maurel@ebi.ac.uk> wrote:
>>>
>>>> Dear Pieta,
>>>>
>>>> As you have noticed in your queries, there is a danger of getting
>>>> truncated results with biomart when querying a big organism such as
>>>> human without any filters. To be sure that you get all the genes
>>>> back, I would advise you to filter on each chromosome individually.
>>>>
>>>> Hope this helps,
>>>> Thomas
>>>> On 4 Feb 2014, at 17:45, "Pieta Schofield [guest]" <guest@bioconductor.org> wrote:
>>>>
>>>>>
>>>>> Hullo
>>>>>
>>>>> I am getting some inconsistent results with my attempts to
>>>>> filter genes by chromosome when retrieving them from
>>>>> "hsapiens_gene_ensembl".
>>>>>
>>>>> If I specify a list of chromosomes to retrieve from, I get a
>>>>> different number for some of the chromosomes than if I retrieve all
>>>>> the genes for all chromosomes or if I retrieve the genes from the
>>>>> chromosomes individually.
>>>>>
>>>>> This is the full code that reproduces the problem.
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> #!/usr/bin/Rscript --vanilla
>>>>> require(biomaRt)
>>>>>
>>>>> # Retrieve gene annotations, optionally restricted to some chromosomes.
>>>>> .getGenes <- function(chrs=list())
>>>>> { # use biomart to get ranges for feature
>>>>>   hg19 <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>>>>   if(length(chrs)>0)
>>>>>   {
>>>>>     getBM(attributes=c("chromosome_name",
>>>>>                        "start_position",
>>>>>                        "end_position",
>>>>>                        "strand",
>>>>>                        "ensembl_gene_id",
>>>>>                        "external_gene_id",
>>>>>                        "gene_biotype"),
>>>>>           filters="chromosome_name",
>>>>>           values=chrs,
>>>>>           mart=hg19)
>>>>>   }else{
>>>>>     getBM(attributes=c("chromosome_name",
>>>>>                        "start_position",
>>>>>                        "end_position",
>>>>>                        "strand",
>>>>>                        "ensembl_gene_id",
>>>>>                        "external_gene_id",
>>>>>                        "gene_biotype"),
>>>>>           mart=hg19)
>>>>>   }
>>>>> }
>>>>>
>>>>> # All genes, with no chromosome filter
>>>>> unfiltered<-.getGenes()
>>>>>
>>>>> # All genes, with every chromosome passed as one filter list
>>>>> chrs<-c(unlist(seq(1,22)),"X","Y","MT")
>>>>> filtered<-.getGenes(chrs)
>>>>>
>>>>> # Gene counts when each chromosome is queried on its own
>>>>> chronly<-lapply(chrs,
>>>>>                 function(x){
>>>>>                   length(.getGenes(x)$ensembl_gene_id)
>>>>>                 })
>>>>> names(chronly)<-chrs
>>>>>
>>>>> # Per-chromosome counts of unfiltered genes absent from the filtered set
>>>>> allgenes<-table(unfiltered$chromosome_name)
>>>>> missinggenes<-table(unfiltered$chromosome_name[which(
>>>>>                       !(unfiltered$ensembl_gene_id
>>>>>                         %in%
>>>>>                         filtered$ensembl_gene_id))])
>>>>>
>>>>> unlist(lapply(chrs,
>>>>>               function(x){
>>>>>                 totalmissing<-ifelse(is.na(missinggenes[x]),0,missinggenes[x])
>>>>>                 paste("Missing genes on chr",x,totalmissing,"of",
>>>>>                       "individually",chronly[[x]],"all",allgenes[x])
>>>>>               }))
>>>>>
>>>>> sessionInfo()
>>>>>
>>>>> -- output of sessionInfo():
>>>>>
>>>>> Loading required package: biomaRt
>>>>> Loading required package: methods
>>>>> [1] "Missing genes on chr 1 0 of individually 5321 all 5321"
>>>>> [2] "Missing genes on chr 2 0 of individually 3990 all 3990"
>>>>> [3] "Missing genes on chr 3 0 of individually 3043 all 3043"
>>>>> [4] "Missing genes on chr 4 0 of individually 2521 all 2521"
>>>>> [5] "Missing genes on chr 5 0 of individually 2856 all 2856"
>>>>> [6] "Missing genes on chr 6 0 of individually 2906 all 2906"
>>>>> [7] "Missing genes on chr 7 0 of individually 2818 all 2818"
>>>>> [8] "Missing genes on chr 8 607 of individually 2385 all 2385"
>>>>> [9] "Missing genes on chr 9 1892 of individually 2303 all 2303"
>>>>> [10] "Missing genes on chr 10 0 of individually 2216 all 2216"
>>>>> [11] "Missing genes on chr 11 0 of individually 3190 all 3190"
>>>>> [12] "Missing genes on chr 12 0 of individually 2819 all 2819"
>>>>> [13] "Missing genes on chr 13 0 of individually 1217 all 1217"
>>>>> [14] "Missing genes on chr 14 0 of individually 2237 all 2237"
>>>>> [15] "Missing genes on chr 15 0 of individually 2076 all 2076"
>>>>> [16] "Missing genes on chr 16 0 of individually 2360 all 2360"
>>>>> [17] "Missing genes on chr 17 0 of individually 2901 all 2901"
>>>>> [18] "Missing genes on chr 18 0 of individually 1113 all 1113"
>>>>> [19] "Missing genes on chr 19 0 of individually 2917 all 2917"
>>>>> [20] "Missing genes on chr 20 0 of individually 1322 all 1322"
>>>>> [21] "Missing genes on chr 21 0 of individually 720 all 720"
>>>>> [22] "Missing genes on chr 22 0 of individually 1208 all 1208"
>>>>> [23] "Missing genes on chr X 2028 of individually 2414 all 2414"
>>>>> [24] "Missing genes on chr Y 506 of individually 506 all 506"
>>>>> [25] "Missing genes on chr MT 37 of individually 37 all 37"
>>>>> R version 3.0.2 (2013-09-25)
>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] methods stats graphics grDevices utils datasets base
>>>>>
>>>>> other attached packages:
>>>>> [1] biomaRt_2.18.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] RCurl_1.95-4.1 XML_3.95-0.2
>>>>>
>>>>> --
>>>>> Sent via the guest posting facility at bioconductor.org.
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor@r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>> --
>>>> Thomas Maurel
>>>> Bioinformatician - Ensembl Production Team
>>>> European Bioinformatics Institute (EMBL-EBI)
>>>> European Molecular Biology Laboratory
>>>> Wellcome Trust Genome Campus
>>>> Hinxton
>>>> Cambridge CB10 1SD
>>>> United Kingdom
>>>>
>>>>
>>>> The University of Dundee is a registered Scottish Charity, No: SC015096
--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom