biomaRt exception when using the values filter
@jarod_v6liberoit-6654
Last seen 5.2 years ago
Italy
I use this code and it seems I can't filter for only protein-coding genes. If I use values = ret$ensembl on its own, the query works well. How can I resolve this problem?

ensembl = useMart( host="dec2017.archive.ensembl.org", biomart="ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl" )

genemap <- getBM( attributes = c("ensembl_gene_id", "hgnc_symbol","family","gene_biotype","family_description"),
                  filters = "ensembl_gene_id",
                  values = list(ret$ensembl,"protein_coding"),
                  mart = ensembl )

 

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ComplexHeatmap_1.17.1      gplots_3.0.1               biomaRt_2.34.2             limma_3.34.9              
 [5] pheatmap_1.0.8             genefilter_1.60.0          reshape2_1.4.3             RColorBrewer_1.1-2        
 [9] ggplot2_2.2.1              DESeq2_1.18.1              SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[13] matrixStats_0.53.1         Biobase_2.38.0             GenomicRanges_1.30.3       GenomeInfoDb_1.14.0       
[17] IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] httr_1.3.1             bit64_0.9-7            splines_3.4.4          gtools_3.5.0           Formula_1.2-3         
 [6] assertthat_0.2.0       latticeExtra_0.6-28    blob_1.1.1             GenomeInfoDbData_1.0.0 progress_1.1.2        
[11] pillar_1.2.2           RSQLite_2.1.1          backports_1.1.2        lattice_0.20-35        digest_0.6.15         
[16] XVector_0.18.0         checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6        Matrix_1.2-14         
[21] plyr_1.8.4             XML_3.98-1.11          GetoptLong_0.1.6       zlibbioc_1.24.0        xtable_1.8-2          
[26] scales_0.5.0           gdata_2.18.0           BiocParallel_1.12.0    htmlTable_1.11.2       tibble_1.4.2          
[31] annotate_1.56.2        nnet_7.3-12            lazyeval_0.2.1         survival_2.42-3        magrittr_1.5          
[36] memoise_1.1.0          foreign_0.8-70         tools_3.4.4            data.table_1.11.2      prettyunits_1.0.2     
[41] GlobalOptions_0.0.13   stringr_1.3.1          munsell_0.4.3          locfit_1.5-9.1         cluster_2.0.7-1       
[46] AnnotationDbi_1.40.0   compiler_3.4.4         caTools_1.17.1         rlang_0.2.0            RCurl_1.95-4.10       
[51] rstudioapi_0.7         circlize_0.4.3         rjson_0.2.18           htmlwidgets_1.2        bitops_1.0-6          
[56] base64enc_0.1-3        gtable_0.2.0           curl_3.2               DBI_1.0.0              R6_2.2.2              
[61] gridExtra_2.3          knitr_1.20             bit_1.1-12             Hmisc_4.1-1            shape_1.4.4           
[66] KernSmooth_2.23-15     stringi_1.2.2          Rcpp_0.12.16           geneplotter_1.56.0     rpart_4.1-13          
[71] acepack_1.4.1
Mike Smith ★ 6.5k
@mike-smith
Last seen 44 minutes ago
EMBL Heidelberg

filters defines the fields in the dataset that you want to search, and values are the entries that you're looking for. As such, the length of the vector passed to filters needs to be the same as the length of the list provided to values. This isn't the case in your example, where you have only one filter but two sets of values you want to search for.
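To make the mismatch concrete, here is a minimal sketch of a check you could run before sending a query. The helper check_filter_values() is hypothetical and not part of biomaRt; it only compares the number of filters with the number of value sets, on the assumption that a bare vector counts as the values for a single filter.

```r
# Hypothetical helper (not a biomaRt function): returns TRUE when every
# filter has exactly one matching set of values.
check_filter_values <- function(filters, values) {
  # A bare vector is treated as the values for a single filter
  if (!is.list(values)) values <- list(values)
  length(filters) == length(values)
}

ids <- c("ENSG00000228253", "ENSG00000283029")

# One filter but two sets of values: the situation in the question
one_filter <- check_filter_values("ensembl_gene_id",
                                  list(ids, "protein_coding"))

# Two filters and two sets of values
two_filters <- check_filter_values(c("ensembl_gene_id", "biotype"),
                                   list(ids, "protein_coding"))

one_filter   # FALSE
two_filters  # TRUE
```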

With your current query setup, there are two approaches you can take. First, we'll define four Ensembl gene IDs to work with, since you don't provide any in your example:

ensembl_ids <- c('ENSG00000228253',
                 'ENSG00000276565', 
                 'ENSG00000274847',
                 'ENSG00000283029') ## this one is non-coding
  •  You're already returning the gene_biotype field in your results, so you can ignore trying to get Ensembl to return only protein coding genes, and simply do this as a second step in R.
genemap <- getBM( attributes = c("ensembl_gene_id", 
                                 "hgnc_symbol",
                                 "family",
                                 "gene_biotype",
                                 "family_description"),
                  filters = "ensembl_gene_id",
                  values = ensembl_ids,
                  mart = ensembl )

## here we filter using the biotype column
genemap[ genemap$gene_biotype == 'protein_coding', ]
  ensembl_gene_id hgnc_symbol    family   gene_biotype family_description
1 ENSG00000228253     MT-ATP8 PTHR13722 protein_coding     ATP SYNTHASE 8
2 ENSG00000274847       MAFIP PTHR19960 protein_coding             TEKTIN
3 ENSG00000276565              TF352204 protein_coding          AMBIGUOUS
  • The alternative is to give two filters to your biomaRt query, both the gene IDs and the biotype
genemap2 <- getBM( attributes = c("ensembl_gene_id", 
                                 "hgnc_symbol",
                                 "family",
                                 "gene_biotype",
                                 "family_description"),
                  filters = c("ensembl_gene_id", "biotype"),
                  values = list(ensembl_ids, "protein_coding"),
                  mart = ensembl )
genemap2
  ensembl_gene_id hgnc_symbol    family   gene_biotype family_description
1 ENSG00000228253     MT-ATP8 PTHR13722 protein_coding     ATP SYNTHASE 8
2 ENSG00000274847       MAFIP PTHR19960 protein_coding             TEKTIN
3 ENSG00000276565              TF352204 protein_coding          AMBIGUOUS
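Because the second approach pairs each filter with its values purely by position, one way to keep the two aligned is to build a single named list and derive both arguments from it. This is only a sketch of that idea: the name query is an illustrative choice, and the getBM() call is left commented out because it assumes the ensembl mart object created earlier in this thread.

```r
ensembl_ids <- c("ENSG00000228253",
                 "ENSG00000276565",
                 "ENSG00000274847",
                 "ENSG00000283029")

# Each list element is named after the filter it belongs to
query <- list(ensembl_gene_id = ensembl_ids,
              biotype         = "protein_coding")

filters <- names(query)   # c("ensembl_gene_id", "biotype")
values  <- unname(query)  # plain list, in the same order as filters

# The two lengths agree by construction
stopifnot(length(filters) == length(values))

# Assumes `ensembl` was created with useMart() as shown in the question:
# genemap2 <- getBM(attributes = c("ensembl_gene_id", "gene_biotype"),
#                   filters = filters,
#                   values = values,
#                   mart = ensembl)
```

Adding a further filter then means adding one more named element to query, so the filter name and its values cannot drift apart.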