Retrieve VGNC gene name or VGNC gene symbol from BioMart after differential analysis
cagenet34 ▴ 20
@cagenet34-10910
Last seen 2.6 years ago
Toulouse, France, INRA

I'm using the biomaRt package after a differential expression analysis with the DESeq2 package, in which I used Ensembl IDs. After the differential analysis I would like to obtain the HGNC symbol or VGNC symbol from the Ensembl ID. The following function works for the HGNC symbol, but I failed to retrieve the VGNC gene name.

add.anns <- function(df, mart, ...)
{
  nm <- rownames(df)
  anns <- getBM(
    attributes = c("ensembl_gene_id", "hgnc_symbol","vgnc_genename", "description"),
    filters = "ensembl_gene_id", values = nm, mart = mart)
  anns <- anns[match(nm, anns[, 1]), ]
  colnames(anns) <- c("ID", "Gene Symbol", "Gene Description")
  df <- cbind(anns, df[, 2:ncol(df)])
  rownames(df) <- nm
  df
  }

 

Can you tell me whether there is an attribute for the VGNC gene name? I could not find one with listAttributes(mart).

Thank you in advance.

By the way your url retrieves  "404 not found error".

Regards

Carine

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.36.1              org.Bt.eg.db_3.6.0          AnnotationDbi_1.42.1       
 [4] ReportingTools_2.20.0       bindrcpp_0.2.2              DESeq2_1.20.0              
 [7] dplyr_0.7.6                 factoextra_1.0.5            ggplot2_3.0.0              
[10] ade4_1.7-11                 knitr_1.20                  SummarizedExperiment_1.10.1
[13] DelayedArray_0.6.2          BiocParallel_1.14.2         matrixStats_0.54.0         
[16] Biobase_2.40.0              GenomicRanges_1.32.6        GenomeInfoDb_1.16.0        
[19] IRanges_2.14.10             S4Vectors_0.18.3            BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
  [1] backports_1.1.2          GOstats_2.46.0           Hmisc_4.1-1             
  [4] plyr_1.8.4               lazyeval_0.2.1           GSEABase_1.42.0         
  [7] splines_3.5.1            digest_0.6.15            BiocInstaller_1.30.0    
 [10] ensembldb_2.4.1          htmltools_0.3.6          GO.db_3.6.0             
 [13] magrittr_1.5             checkmate_1.8.5          memoise_1.1.0           
 [16] BSgenome_1.48.0          cluster_2.0.7-1          limma_3.36.2            
 [19] Biostrings_2.48.0        annotate_1.58.0          R.utils_2.6.0           
 [22] ggbio_1.28.4             prettyunits_1.0.2        colorspace_1.3-2        
 [25] apeglm_1.2.0             ggrepel_0.8.0            blob_1.1.1              
 [28] crayon_1.3.4             RCurl_1.95-4.11          graph_1.58.0            
 [31] genefilter_1.62.0        bindr_0.1.1              VariantAnnotation_1.26.1
 [34] survival_2.42-3          glue_1.3.0               gtable_0.2.0            
 [37] zlibbioc_1.26.0          XVector_0.20.0           Rgraphviz_2.24.0        
 [40] scales_0.5.0             GGally_1.4.0             DBI_1.0.0               
 [43] edgeR_3.22.3             Rcpp_0.12.18             emdbook_1.3.10          
 [46] htmlTable_1.12           xtable_1.8-2             progress_1.2.0          
 [49] foreign_0.8-70           bit_1.1-14               OrganismDbi_1.22.0      
 [52] Formula_1.2-3            AnnotationForge_1.22.1   httr_1.3.1              
 [55] htmlwidgets_1.2          RColorBrewer_1.1-2       acepack_1.4.1           
 [58] R.methodsS3_1.7.1        reshape_0.8.7            pkgconfig_2.0.1         
 [61] XML_3.98-1.12            nnet_7.3-12              locfit_1.5-9.1          
 [64] reshape2_1.4.3           tidyselect_0.2.4         rlang_0.2.1             
 [67] munsell_0.5.0            tools_3.5.1              RSQLite_2.1.1           
 [70] stringr_1.3.1            yaml_2.2.0               bit64_0.9-7             
 [73] purrr_0.2.5              AnnotationFilter_1.4.0   RBGL_1.56.0             
 [76] R.oo_1.22.0              compiler_3.5.1           rstudioapi_0.7          
 [79] curl_3.2                 PFAM.db_3.6.0            geneplotter_1.58.0      
 [82] tibble_1.4.2             stringi_1.1.7            GenomicFeatures_1.32.0  
 [85] lattice_0.20-35          ProtGenerics_1.12.0      Matrix_1.2-14           
 [88] pillar_1.3.0             data.table_1.11.4        bitops_1.0-6            
 [91] rtracklayer_1.40.3       hwriter_1.3.2            R6_2.2.2                
 [94] latticeExtra_0.6-28      gridExtra_2.3            dichromat_2.0-0         
 [97] MASS_7.3-50              assertthat_0.2.0         Category_2.46.0         
[100] withr_2.1.2              GenomicAlignments_1.16.0 Rsamtools_1.32.2        
[103] GenomeInfoDbData_1.1.0   hms_0.4.2                grid_3.5.1              
[106] rpart_4.1-13             coda_0.19-1              biovizBase_1.28.1       
[109] bbmle_1.0.20             numDeriv_2016.8-1        base64enc_0.1-3      
biomart mart getbm • 1.1k views
@james-w-macdonald-5106
Last seen 1 hour ago
United States

You don't say what species you are querying, but note that there is no VGNC name or ID for human, because VGNC covers non-human species. If I try, say, bovine, I have no problems:

> library(biomaRt)
> library(org.Bt.eg.db)
> mart <- useMart("ensembl","btaurus_gene_ensembl")

> ids <- head(keys(org.Bt.eg.db, "ENSEMBL"))
> ids
[1] "ENSBTAG00000022255" "ENSBTAG00000017280" "ENSBTAG00000037533"
[4] "ENSBTAG00000004371" "ENSBTAG00000007148" "ENSBTAG00000007101"

> grep("gnc", listAttributes(mart)[,1], value = TRUE)
[1] "hgnc_id"         "hgnc_symbol"     "hgnc_trans_name" "vgnc"           
[5] "vgnc_trans_name"
> getBM(c("ensembl_gene_id","vgnc","vgnc_trans_name"), "ensembl_gene_id", ids, mart)
     ensembl_gene_id       vgnc vgnc_trans_name
1 ENSBTAG00000004371 VGNC:27748          CS-201
2 ENSBTAG00000007101 VGNC:28687          F3-201
3 ENSBTAG00000007148 VGNC:28682          F2-201
4 ENSBTAG00000017280 VGNC:26638          C3-201
5 ENSBTAG00000022255 VGNC:26053          AR-201
6 ENSBTAG00000037533                          

Also, what do you mean by

By the way your url retrieves  "404 not found error".?

It's not clear what you mean by 'your url'.
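
For reference, here is a minimal sketch of how the add.anns function from the question might be adapted. Note that the original supplies three column names for four requested attributes, so the names no longer line up, and that "vgnc_genename" does not appear in the attribute list above. The helper name add_anns_from_table and the log2FoldChange values are hypothetical, introduced only for illustration; the IDs and VGNC values are copied from the output above. The lookup step is kept separate from getBM so the example runs without a network connection.

```r
# Hypothetical helper: attach an annotation table (such as one returned by
# getBM) to a results data frame whose row names are Ensembl gene IDs.
add_anns_from_table <- function(df, anns, id_col = "ensembl_gene_id") {
  nm <- rownames(df)
  # Reorder annotations to the row order of df; if an ID occurs several
  # times in anns, match() keeps only its first row.
  anns <- anns[match(nm, anns[[id_col]]), , drop = FALSE]
  out <- cbind(anns, df)
  rownames(out) <- nm
  out
}

# Toy results table: the fold changes are made up for illustration.
res <- data.frame(
  log2FoldChange = c(1.2, -0.7, 0.3),
  row.names = c("ENSBTAG00000022255", "ENSBTAG00000037533",
                "ENSBTAG00000004371")
)

# Annotation table with values copied from the getBM output above.
# With a live mart this could instead come from something like:
# anns <- getBM(c("ensembl_gene_id", "vgnc", "vgnc_trans_name"),
#               "ensembl_gene_id", rownames(res), mart)
anns <- data.frame(
  ensembl_gene_id = c("ENSBTAG00000004371", "ENSBTAG00000022255",
                      "ENSBTAG00000037533"),
  vgnc = c("VGNC:27748", "VGNC:26053", ""),
  vgnc_trans_name = c("CS-201", "AR-201", ""),
  stringsAsFactors = FALSE
)

annotated <- add_anns_from_table(res, anns)
annotated
```

Because the column names come straight from the getBM result, there is no separate colnames() assignment to fall out of step with the attribute list.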


Hello James

Thank you for your reply. I would like the VGNC name for bovine. From your example I deduced that there is still no VGNC name attribute in BioMart. Am I right?

As for the URL: in a first attempt I was responding to the news post "Ensembl 93 is out!" from Amonida, in which the URL is not correct. Then I realized that I was in the wrong place and simply copied and pasted my comment here.

Sorry

Regards

 

 


Why do you think the vgnc_trans_name isn't the VGNC name?

> ids <- head(keys(org.Bt.eg.db, "ENSEMBL"), 20)

> library(biomaRt)
> mart <- useMart("ensembl","btaurus_gene_ensembl")
> z <- getBM(c("ensembl_gene_id","vgnc_trans_name"), "ensembl_gene_id",ids, mart)
> z[match(ids, z$ensembl_gene_id),]
      ensembl_gene_id vgnc_trans_name
20 ENSBTAG00000022255          AR-201
13 ENSBTAG00000017280          C3-201
22 ENSBTAG00000037533                
3  ENSBTAG00000004371          CS-201
8  ENSBTAG00000007148          F2-201
7  ENSBTAG00000007101          F3-201
14 ENSBTAG00000017722          F5-201
1  ENSBTAG00000004003          F9-201
23 ENSBTAG00000039812                
6  ENSBTAG00000006354                
5  ENSBTAG00000005333          MB-201
15 ENSBTAG00000018843    SERPINA1-201
11 ENSBTAG00000007836                
9  ENSBTAG00000007273                
10 ENSBTAG00000007823          TG-201
21 ENSBTAG00000026768          TH-201
18 ENSBTAG00000021364        TNK2-201
12 ENSBTAG00000010182         ACR-201
4  ENSBTAG00000005280         ADA-201
16 ENSBTAG00000021048         ADM-201
> select(org.Bt.eg.db, ids, "SYMBOL","ENSEMBL")
'select()' returned 1:many mapping between keys and columns
              ENSEMBL    SYMBOL
1  ENSBTAG00000022255        AR
2  ENSBTAG00000017280        C3
3  ENSBTAG00000037533       C4A
4  ENSBTAG00000037533 LOC617696
5  ENSBTAG00000004371        CS
6  ENSBTAG00000007148        F2
7  ENSBTAG00000007101        F3
8  ENSBTAG00000017722        F5
9  ENSBTAG00000004003        F9
10 ENSBTAG00000039812        H4
11 ENSBTAG00000039812 LOC526789
12 ENSBTAG00000039812 LOC527645
13 ENSBTAG00000006354        HP
14 ENSBTAG00000005333        MB
15 ENSBTAG00000018843  SERPINA1
16 ENSBTAG00000007836      PPA1
17 ENSBTAG00000007273        TF
18 ENSBTAG00000007823        TG
19 ENSBTAG00000026768        TH
20 ENSBTAG00000021364      TNK2
21 ENSBTAG00000010182       ACR
22 ENSBTAG00000005280       ADA
23 ENSBTAG00000021048       ADM

Other than the trailing -201, those look the same to me.
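
If a bare symbol is wanted, the trailing suffix can be stripped with a regular expression. This is only a sketch: it assumes the suffix always has the form of a hyphen followed by digits, as in the output above, and it treats empty strings as missing. The input values are copied from the vgnc_trans_name column shown earlier.

```r
# A few values copied from the vgnc_trans_name column above
trans_names <- c("AR-201", "C3-201", "", "CS-201", "SERPINA1-201")

# Drop a trailing hyphen plus digits (the transcript suffix)
symbols <- sub("-[0-9]+$", "", trans_names)

# Genes without a VGNC entry come back as "", so mark them as NA
symbols[symbols == ""] <- NA_character_

symbols
# [1] "AR"       "C3"       NA         "CS"       "SERPINA1"
```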


Oops... OK, that means I need two cups of coffee or a holiday. Sorry again, and thank you so much for your help!
 
