Question: Biomart exception using values filter
jarod_v6@libero.it (Italy) wrote, 15 months ago:
I use this code and it seems I can't filter for only protein-coding genes. If I use values = ret$ensembl on its own, the operation works well. How can I resolve this problem?

ensembl = useMart( host="dec2017.archive.ensembl.org", biomart="ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl" )

genemap <- getBM( attributes = c("ensembl_gene_id", "hgnc_symbol","family","gene_biotype","family_description"),
                  filters = "ensembl_gene_id",
                  values = list(ret$ensembl,"protein_coding"),
                  mart = ensembl )

 

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ComplexHeatmap_1.17.1      gplots_3.0.1               biomaRt_2.34.2             limma_3.34.9              
 [5] pheatmap_1.0.8             genefilter_1.60.0          reshape2_1.4.3             RColorBrewer_1.1-2        
 [9] ggplot2_2.2.1              DESeq2_1.18.1              SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[13] matrixStats_0.53.1         Biobase_2.38.0             GenomicRanges_1.30.3       GenomeInfoDb_1.14.0       
[17] IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] httr_1.3.1             bit64_0.9-7            splines_3.4.4          gtools_3.5.0           Formula_1.2-3         
 [6] assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1             GenomeInfoDbData_1.0.0 progress_1.1.2        
[11] pillar_1.2.2           RSQLite_2.1.1          backports_1.1.2        lattice_0.20-35        digest_0.6.15         
[16] XVector_0.18.0         checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6        Matrix_1.2-14         
[21] plyr_1.8.4             XML_3.98-1.11          GetoptLong_0.1.6       zlibbioc_1.24.0        xtable_1.8-2          
[26] scales_0.5.0           gdata_2.18.0           BiocParallel_1.12.0    htmlTable_1.11.2       tibble_1.4.2          
[31] annotate_1.56.2        nnet_7.3-12            lazyeval_0.2.1         survival_2.42-3        magrittr_1.5          
[36] memoise_1.1.0          foreign_0.8-70         tools_3.4.4            data.table_1.11.2      prettyunits_1.0.2     
[41] GlobalOptions_0.0.13   stringr_1.3.1          munsell_0.4.3          locfit_1.5-9.1         cluster_2.0.7-1       
[46] AnnotationDbi_1.40.0   compiler_3.4.4         caTools_1.17.1         rlang_0.2.0            RCurl_1.95-4.10       
[51] rstudioapi_0.7         circlize_0.4.3         rjson_0.2.18           htmlwidgets_1.2        bitops_1.0-6          
[56] base64enc_0.1-3        gtable_0.2.0           curl_3.2               DBI_1.0.0              R6_2.2.2              
[61] gridExtra_2.3          knitr_1.20             bit_1.1-12             Hmisc_4.1-1            shape_1.4.4           
[66] KernSmooth_2.23-15     stringi_1.2.2          Rcpp_0.12.16           geneplotter_1.56.0     rpart_4.1-13          
[71] acepack_1.4.1
Answer: Biomart exception using values filter
Mike Smith (EMBL Heidelberg / de.NBI) wrote, 15 months ago:

filters defines the fields in the dataset that you want to search, and values are the entries that you're looking for. As such, the length of the vector passed to filters needs to be the same as the length of the list provided to values. This isn't the case in your example, where you have only one filter but two sets of values you want to search for.
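
As a rough illustration of that length rule, a small check like the one below could be run before calling getBM. The helper name check_filter_values is made up for this sketch and is not part of biomaRt; it only compares lengths and does not contact Ensembl.

```r
# Hypothetical helper (not part of biomaRt): checks that each filter has
# exactly one matching entry in `values` before a query is sent.
check_filter_values <- function(filters, values) {
  # A single filter may take a plain vector; wrap it so both cases look alike.
  if (!is.list(values)) values <- list(values)
  if (length(filters) != length(values)) {
    stop(sprintf("%d filter(s) but %d set(s) of values supplied",
                 length(filters), length(values)))
  }
  invisible(TRUE)
}

ids <- c("ENSG00000228253", "ENSG00000276565")

# One filter, two sets of values: the mismatch from the question.
mismatch <- tryCatch(
  check_filter_values("ensembl_gene_id", list(ids, "protein_coding")),
  error = function(e) conditionMessage(e)
)
print(mismatch)

# Two filters, two sets of values: the check passes.
ok <- check_filter_values(c("ensembl_gene_id", "biotype"),
                          list(ids, "protein_coding"))
```

The first call reports the mismatch instead of sending a malformed query; the second passes because each filter has its own entry in the list.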

With your current query setup, there are two approaches you can take. First, we'll use four Ensembl gene IDs, since you don't provide any in your example:

ensembl_ids <- c('ENSG00000228253',
                 'ENSG00000276565', 
                 'ENSG00000274847',
                 'ENSG00000283029') ## this one is non-coding
  • You're already returning the gene_biotype field in your results, so you can skip asking Ensembl to return only protein-coding genes and simply do the filtering as a second step in R.
genemap <- getBM( attributes = c("ensembl_gene_id", 
                                 "hgnc_symbol",
                                 "family",
                                 "gene_biotype",
                                 "family_description"),
                  filters = "ensembl_gene_id",
                  values = ensembl_ids,
                  mart = ensembl )

## here we filter using the biotype column
genemap[ genemap$gene_biotype == 'protein_coding', ]
  ensembl_gene_id hgnc_symbol    family   gene_biotype family_description
1 ENSG00000228253     MT-ATP8 PTHR13722 protein_coding     ATP SYNTHASE 8
2 ENSG00000274847       MAFIP PTHR19960 protein_coding             TEKTIN
3 ENSG00000276565              TF352204 protein_coding          AMBIGUOUS
  • The alternative is to give two filters to your biomaRt query, both the gene IDs and the biotype
genemap2 <- getBM( attributes = c("ensembl_gene_id", 
                                 "hgnc_symbol",
                                 "family",
                                 "gene_biotype",
                                 "family_description"),
                  filters = c("ensembl_gene_id", "biotype"),
                  values = list(ensembl_ids, "protein_coding"),
                  mart = ensembl )
genemap2
  ensembl_gene_id hgnc_symbol    family   gene_biotype family_description
1 ENSG00000228253     MT-ATP8 PTHR13722 protein_coding     ATP SYNTHASE 8
2 ENSG00000274847       MAFIP PTHR19960 protein_coding             TEKTIN
3 ENSG00000276565              TF352204 protein_coding          AMBIGUOUS
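
One possible way to keep filters and values aligned when using several filters is to build them from a single named list, so that each set of values sits next to the filter it belongs to. This is only a sketch: the variable names query, filters and values are illustrative, and the getBM call is left commented out because it needs the ensembl mart object and a network connection.

```r
# Gene IDs taken from the answer above.
ensembl_ids <- c("ENSG00000228253", "ENSG00000276565",
                 "ENSG00000274847", "ENSG00000283029")

# Each list element is named after the filter it belongs to.
query <- list(ensembl_gene_id = ensembl_ids,
              biotype         = "protein_coding")

filters <- names(query)   # one filter name per list element
values  <- unname(query)  # value vectors, in the same order as the filters

# The pair can then be passed on together (assumes `ensembl` was created
# with useMart() as in the question):
# genemap2 <- getBM(attributes = c("ensembl_gene_id", "gene_biotype"),
#                   filters = filters, values = values, mart = ensembl)
```

Because both arguments come from the same list, adding or removing a filter changes them together and their lengths cannot drift apart.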