pathview returns incorrect mappings for yeast
@4db1a7b6

I am having trouble getting pathview to map RefSeq systematic IDs (locus_tag) to the correct gene symbols. For example, pathview maps YLR174W to IDP1 when it should be IDP2, and YOL126C to MDH3 when it should be MDH2. Any suggestions?

> x[[1]]$plot.data.gene %>% filter(kegg.names %in% c('YLR174W', 'YOL126C'))
   kegg.names labels all.mapped type   x   y width height log2FoldChange mol.col
39    YLR174W   IDP1    YLR174W gene 718 510    46     17       1.161110 #FF0000
41    YLR174W   IDP1    YLR174W gene 718 405    46     17       1.161110 #FF0000
47    YOL126C   MDH3    YOL126C gene 253 349    46     17       1.898154 #FF0000

This is how I am calling pathview:

library(dplyr)    # %>%, filter(), select()
library(purrr)    # map()
library(pathview)

mapKEGGpathway = function(name, res, pathway_id, lfc_thres, padj_thres = .05, species = 'sce'){

  fltr_res = res %>%
    as.data.frame() %>%
    filter(abs(log2FoldChange) > lfc_thres &
           padj < padj_thres) %>%
    select(log2FoldChange)


  pathview(
    gene.data = fltr_res,
    gene.idtype = 'kegg', # per the documentation
    kegg.native = FALSE,
    map.symbol = TRUE,
    expand.node = TRUE,
    pathway.id = pathway_id,
    species = species,
    out.suffix = paste0(name, "_", names(pathways[pathways == pathway_id]))
  )

}

x = map(names(shrunken_res_lists$minus_lys),
  ~mapKEGGpathway(., shrunken_res_lists$minus_lys[[.]], 
     pathway_id = pathways$tca_cycle, lfc_thres = 1))

shrunken_res_lists$minus_lys is a list of DESeq2 results tables that look like this:

> head(shrunken_res_lists$minus_lys$EDS1)
log2 fold change (MMSE): aminoAcid_HisMetLeuUra_vs_LysHisMetLeuUra vs genotypeEDS1.aminoAcidHisMetLeuUra 
Wald test p-value: aminoAcid_HisMetLeuUra_vs_LysHisMetLeuUra vs genotypeEDS1.aminoAcidHisMetLeuUra 
DataFrame with 6 rows and 5 columns
       baseMean log2FoldChange     lfcSE      pvalue       padj
      <numeric>      <numeric> <numeric>   <numeric>  <numeric>
Q0020   2717.37      -2.132892  1.267952 0.000346399 0.00453428
Q0045   5007.13      -0.335857  0.534820 0.158333978 0.37525617
Q0050   1229.19      -1.111804  0.877374 0.005074406 0.03406794
Q0055   2514.75      -1.081450  0.843239 0.005293675 0.03513808
Q0060    435.58      -0.987048  0.835864 0.007845219 0.04631870
Q0065   1044.41      -0.709558  0.750734 0.024276562 0.10358936
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pathview_1.34.0             ggVennDiagram_1.2.0         here_1.0.1                  pheatmap_1.0.12             DT_0.20                     forcats_0.5.1              
 [7] stringr_1.4.0               dplyr_1.0.7                 purrr_0.3.4                 readr_2.1.1                 tidyr_1.1.4                 tibble_3.1.6               
[13] ggplot2_3.3.5               tidyverse_1.3.1             patchwork_1.1.1             gprofiler2_0.2.1            DESeq2_1.34.0               SummarizedExperiment_1.24.0
[19] Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0          GenomicRanges_1.46.1        GenomeInfoDb_1.30.0         IRanges_2.28.0             
[25] S4Vectors_0.32.3            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         rjson_0.2.21             class_7.3-20             ellipsis_0.3.2           rprojroot_2.0.2          XVector_0.34.0          
  [7] fs_1.5.2                 proxy_0.4-26             rstudioapi_0.13          farver_2.1.0             bit64_4.0.5              AnnotationDbi_1.56.2    
 [13] fansi_1.0.2              lubridate_1.8.0          xml2_1.3.3               splines_4.1.2            cachem_1.0.6             geneplotter_1.72.0      
 [19] knitr_1.37               jsonlite_1.7.3           Rsamtools_2.10.0         broom_0.7.11             annotate_1.72.0          dbplyr_2.1.1            
 [25] png_0.1-7                graph_1.72.0             compiler_4.1.2           httr_1.4.2               backports_1.4.1          assertthat_0.2.1        
 [31] Matrix_1.4-0             fastmap_1.1.0            lazyeval_0.2.2           cli_3.1.1                htmltools_0.5.2          tools_4.1.2             
 [37] gtable_0.3.0             glue_1.6.1               GenomeInfoDbData_1.2.7   Rcpp_1.0.8               cellranger_1.1.0         vctrs_0.3.8             
 [43] Biostrings_2.62.0        rtracklayer_1.54.0       xfun_0.29                rvest_1.0.2              lifecycle_1.0.1          restfulr_0.0.13         
 [49] XML_3.99-0.8             org.Hs.eg.db_3.14.0      zlibbioc_1.40.0          scales_1.1.1             org.Sc.sgd.db_3.14.0     hms_1.1.1               
 [55] KEGGgraph_1.54.0         parallel_4.1.2           RColorBrewer_1.1-2       yaml_2.2.1               memoise_2.0.1            stringi_1.7.6           
 [61] RSQLite_2.2.9            genefilter_1.76.0        BiocIO_1.4.0             e1071_1.7-9              BiocParallel_1.28.3      rlang_0.4.12            
 [67] pkgconfig_2.0.3          bitops_1.0-7             evaluate_0.14            lattice_0.20-45          sf_1.0-5                 labeling_0.4.2          
 [73] GenomicAlignments_1.30.0 htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1         magrittr_2.0.1           R6_2.5.1                
 [79] generics_0.1.1           DelayedArray_0.20.0      DBI_1.1.2                pillar_1.6.4             haven_2.4.3              withr_2.4.3             
 [85] units_0.7-2              survival_3.2-13          KEGGREST_1.34.0          RCurl_1.98-1.5           modelr_0.1.8             crayon_1.4.2            
 [91] KernSmooth_2.23-20       utf8_1.2.2               plotly_4.10.0            RVenn_1.1.0              tzdb_0.2.0               rmarkdown_2.11          
 [97] locfit_1.5-9.4           grid_4.1.2               readxl_1.3.1             data.table_1.14.2        Rgraphviz_2.38.0         blob_1.2.2              
[103] classInt_0.4-3           reprex_2.0.1             digest_0.6.29            xtable_1.8-4             munsell_0.5.0            viridisLite_0.4.0
@james-w-macdonald-5106

This looks like a problem with the KGML file for that pathway. If you download that file, you will see this:

    <entry id="39" name="sce:YDL066W sce:YLR174W sce:YNL009W" type="gene" reaction="rn:R00268"
        link="https://www.kegg.jp/dbget-bin/www_bget?sce:YDL066W+sce:YLR174W+sce:YNL009W">
        <graphics name="IDP1..." fgcolor="#000000"

Here YDL066W is IDP1, YLR174W is IDP2, and YNL009W is IDP3. pathview parses the KGML file for the gene symbols, and since the node provides only one symbol (IDP1...) for all three genes, you get mismatches.
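
To make the mismatch explicit, the entry quoted above can be unpacked by hand. This is only a minimal sketch in base R. It assumes the fragment is closed off as written here (the post truncates the `<graphics>` element, and the `link` attribute is left out for brevity). It uses a regular expression rather than a real XML parser purely to avoid extra dependencies. `get_attr` is a hypothetical helper for illustration, not a pathview function, and the symbol vector simply restates the ORF-to-symbol pairs given above.

```r
# KGML entry from the answer. The closing "/>" and "</entry>" are an assumption,
# added only so the fragment is self-contained.
entry <- paste0(
  '<entry id="39" name="sce:YDL066W sce:YLR174W sce:YNL009W" ',
  'type="gene" reaction="rn:R00268">',
  '<graphics name="IDP1..." fgcolor="#000000"/></entry>'
)

# Hypothetical helper: return the value of `attr` on the first <tag ...> element.
get_attr <- function(xml, tag, attr) {
  pattern <- sprintf('<%s[^>]*?\\s%s="([^"]*)"', tag, attr)
  m <- regmatches(xml, regexec(pattern, xml, perl = TRUE))[[1]]
  m[2]
}

# All ORFs that share this single pathway node
orfs <- sub("^sce:", "", strsplit(get_attr(entry, "entry", "name"), " ")[[1]])

# The one display label KGML provides for the whole node
label <- get_attr(entry, "graphics", "name")

# ORF -> symbol pairs as stated in the answer
symbols <- c(YDL066W = "IDP1", YLR174W = "IDP2", YNL009W = "IDP3")

# Side-by-side view: the node label versus the symbol each ORF should carry
comparison <- data.frame(
  orf        = orfs,
  kgml_label = sub("\\.\\.\\.$", "", label),
  expected   = unname(symbols[orfs]),
  stringsAsFactors = FALSE
)
comparison
```

Every ORF in the node ends up with the single label IDP1, which agrees with only the first gene listed; YLR174W and YNL009W inherit the wrong symbol.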

For the sake of completeness, that was the TCA cycle pathway, sce00020. Thank you for checking that. For what it is worth, I did write to KEGG.
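
Until the KGML file is corrected, one possible stopgap is to overwrite the labels in the returned `plot.data.gene` table from a separate ORF-to-symbol lookup. The sketch below is only an illustration: `relabel_nodes` and `symbol_map` are hypothetical names, the rows are copied from the output in the question, and the lookup holds just the two pairs mentioned in this thread. A fuller map could presumably come from org.Sc.sgd.db, but that is not tested here. This changes only the data frame, not the image pathview has already rendered.

```r
# Rows copied from the plot.data.gene output shown in the question
plot_data <- data.frame(
  kegg.names     = c("YLR174W", "YLR174W", "YOL126C"),
  labels         = c("IDP1", "IDP1", "MDH3"),
  log2FoldChange = c(1.161110, 1.161110, 1.898154),
  row.names      = c("39", "41", "47"),
  stringsAsFactors = FALSE
)

# ORF -> symbol pairs stated in this thread (assumed lookup, deliberately tiny)
symbol_map <- c(YLR174W = "IDP2", YOL126C = "MDH2")

# Hypothetical helper: replace the label of every row whose ORF is in the lookup,
# leaving all other rows untouched.
relabel_nodes <- function(plot_data, symbol_map) {
  hit <- plot_data$kegg.names %in% names(symbol_map)
  plot_data$labels[hit] <- unname(symbol_map[plot_data$kegg.names[hit]])
  plot_data
}

fixed <- relabel_nodes(plot_data, symbol_map)
fixed[, c("kegg.names", "labels")]
```

Keying the replacement on `kegg.names` rather than on the old label avoids touching nodes where the original label is already correct.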
