Output from pathview does not seem to match input
1
0
Entering edit mode
bdp • 0
@bdp-8406
Last seen 8.7 years ago
United States

Greetings Bioconductor community, 

This is my first time ever posting and I apologize ahead of time if I don't get the code embedded/formatted correctly.  

I have a question regarding the output from the pathview.  I'm a relatively new to pathway analysis and visualization, but from what I understand, the pathview function maps the gene, or metabolite, data that is inputted into the gene.data argument.  However, I noticed on my data that there are several genes in the pathview output that do not match the input data.  I've used an adapted workflow described in the vignette (Code Chunk 37) to duplicate my workflow and highlight the mismatched output. 

## Adapted from pathview vignette Code Chunk #37
library(gage)
data(gse16873)
cn <- colnames(gse16873)
hn <- grep('HN',cn, ignore.case =TRUE)
dcis <- grep('DCIS',cn, ignore.case =TRUE)
data(kegg.gs)
#prepare the differential expression data
## Vignette did not include compare argument.  Believe it defaulted to "paired"
gse16873.d <- gagePrep(gse16873, ref = hn, samp = dcis,compare="as.group")
library(pathview)
pvresults <- pathview(gene.data = gse16873.d, 
    pathway.id = "hsa00500", species = "hsa")
    
## Capturing gene mapping output    
pv_pdgres <- pvresults[["plot.data.gene"]]
## Selecting genes that were mapped in from gagePrep data 
selected_gageprep <- gse16873.d[which(rownames(gse16873.d) %in% unique(pv_pdgres$kegg.names)),]
## Viewing data from pathview and gagePrep output
pv_pdgres[,c("kegg.names","exp1")];selected_gageprep

> pv_pdgres[,c("kegg.names","exp1")];selected_gageprep
    kegg.names        exp1
86       57733 -0.03602881
87       57733 -0.03602881
88        3098  0.83039966
95        2548  0.32890221
97        6476  0.01425503
98       57733 -0.03602881
104      80146  0.16358451
111       5167  0.38402892
116     283209          NA
119       7358  0.93451774
120       7360  0.31232441
122       7363  0.62364563
123       2990  0.36034810
125       2997  0.06405324
127       5236 -0.07510014
129       2538  0.14398629
130       3098  0.83039966
131       2645 -0.02486297
132       2821  0.54499094
133        178  0.16979765
134       2632  0.21139016
137      11181 -0.01405438
144        276          NA
145       8972  0.16845517
146       2548  0.32890221
152       5834  0.20479492
153        178  0.16979765
171       6476  0.01425503
172       8972  0.16845517
173       6476  0.01425503
        11181           178                      2538          2548 
-0.0140543792  0.1697976464 -0.0002384661  0.1604470369 
         2632          2645                       2821          2990 
 0.2113901626 -0.0248629701  0.5449909414  0.4510192780 
         2997          3098                      5167          5236 
 0.0707799289  0.1950114770  0.3840289168 -0.0751001356 
        57733          5834                     6476          7358 
-0.0360288070 -0.1675698656  0.0142550297  0.9345177389 
         7360          7363                    80146          8972 
 0.3123244073 -0.0050854818  0.1635845085  0.1684551748 

 

round(selected_gageprep, 6) %in% round(unique(pv_pdgres[-which(pv_pdgres$exp1 %in% NA),"exp1"]),6)

> round(selected_gageprep, 6) %in% round(unique(pv_pdgres[-which(pv_pdgres$exp1 %in% NA),"exp1"]),6)
 [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE
[11]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

So, gene numbers 2538,2548,2990,2997,3098,5834, and 7363 do not match the gagePrep input.  

Am I missing something?  I don't typically work with genomic data, so I readily admit that I may be interpreting these results incorrectly.  However, I don't feel that there is anything in the help page that would suggest that the input will be modified.  

Greatly appreciate anyone's help!!!

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] pathview_1.8.0       org.Hs.eg.db_3.1.2   RSQLite_1.0.0       
 [4] DBI_0.3.1            AnnotationDbi_1.30.1 GenomeInfoDb_1.4.1  
 [7] IRanges_2.2.4        S4Vectors_0.6.0      Biobase_2.28.0      
[10] BiocGenerics_0.14.0  KEGGgraph_1.26.0     graph_1.46.0        
[13] XML_3.98-1.2         gage_2.18.0         

loaded via a namespace (and not attached):
 [1] Biostrings_2.36.1 png_0.1-7         grid_3.2.1       
 [4] magrittr_1.5      httr_0.6.1        zlibbioc_1.14.0  
 [7] stringi_0.4-1     XVector_0.8.0     Rgraphviz_2.12.0 
[10] tools_3.2.1       stringr_1.0.0     KEGGREST_1.8.0 

pathview gage pathway analysis • 1.7k views
ADD COMMENT
2
Entering edit mode
Luo Weijun ★ 1.6k
@luo-weijun-1783
Last seen 9 months ago
United States
In page 10 of pathview tutorial: Note in native KEGG view, a gene node may represent multiple genes/proteins with similar or redundant functional role. The number of member genes range from 1 up to several tens. They are intentionally put together as a single node on pathway graphs for better clarity and readability. Therefore, we do not split node and mark each member genes separately by default. But rather we visualize the node-wise data by summarize gene-wise data, users may specify the summarization method using node.sum arguement. in your pathview output data, each row is a KEGG node (rownames are node numbers). kegg.names column is the most representative gene ID for each node, the data columns are summarized data for each node, not each gene. Therefore, you see some difference.
ADD COMMENT
0
Entering edit mode

Luo,

Thank you for the help and clarification.  

ADD REPLY

Login before adding your answer.

Traffic: 785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6