Question: Pathview with non-KEGG organisms?
6.4 years ago, Luo Weijun • 1.5k wrote:
Hi Iain,

I am attaching the KO gene set data you need for your analysis. This data was generated the same way as the gene set data in the gage and gageData packages. For other users with similar needs, I will also provide this KO gene set data in the development version of the pathview or gageData package soon.

For your analysis, you may do something like:

    # load and check the KO gene set data
    load('/path/to/dir/ko.sets.RData')
    ls()
    lapply(ko.sets[1:3], head)

    # GAGE analysis
    library(gage)
    your.data.log2 <- log2(your.data)
    gage.res <- gage(your.data.log2, ref = 1, samp = 2, gsets = ko.sets[sigmet.idx])

Here I did a log2 transformation on your data, as is common in array and NGS data analysis, although data in the original scale will most likely also give you sensible results. I did not use all KO pathways, only the signaling and metabolic pathways (hence excluding the disease pathways, which may be less relevant for your analysis). Note that I only have 1 control and 1 experiment sample in the above analysis. Once you get the significant pathway list, you may plug that with your.data.log2 into pathview for visualization.

The KEGG gene sets in the gage package are for human. I also provided KEGG and GO gene sets for several major research species in the gageData package. You may want to take a look there if you need to work on other species.

For viewing multiple experiments on a KEGG pathway, I've been wondering whether I should provide that function in pathview. But I didn't, because it looks quite messy whether you divide each node into multiple pieces or draw bar or line plots beside the nodes. I may still provide that function in a future release if I see enough interest in the user community. Currently, you may always generate one graph for each experimental condition.

Weijun

--------------------------------------------
On Tue, 7/30/13, Iain wrote:

Subject: Re: Pathview with non-KEGG organisms?
Date: Tuesday, July 30, 2013, 1:56 PM

Hey Weijun,

Thanks for the clarification on graphviz.
I'm trying to run gage right now, but am running into a couple of hurdles. I have my expression data in the proper format:

            LO2L1        LC1        LN1
K00005 1584.06859 1595.09485 1437.64499
K00012  143.25284  239.21267  237.12022
K00013  222.57466  227.87069  104.46555
K00014   40.25286   28.87049   34.82185
K00018  268.74706  182.50277  192.34927
K00020  113.65515   77.33168   94.51645

However, I'm a little stuck with the gsets. I downloaded a KEGG gset according to the instructions and got something like this:

KEGG_GLYCOLYSIS_GLUCONEOGENESIS 55902 2645 5232 5230 5162 5160 5161 55276 7167 84532 2203 125 3099 126 3098 3101 127 5224 128 5223 124 230 501 92483 5313 160287 2023 5315 5214 669 5106 5105 219 217 218 10327 8789 5213 5211 3948 2597 2027 2026 441531 131 130 3945 220 221 222 223 224 130589 226 1738 1737 229 57818 3939 2538 5236 2821 ...

but now it seems like I have to map these numeric ids to the KO ids in my expression set? Is it not possible to use a similar approach as before (species="ko") to have gage simply recognize the KOs? Otherwise, I'm not quite sure how to map KO ids to Entrez IDs, because the id2eg function doesn't have a KO id option.

With regard to the node attributes that I'd like... I'm not really sure. I have multiple experimental conditions to represent on the pathway (treatment 1, treatment 2, treatment 3). I'd like to somehow visually compare expression under each treatment on the pathway. I suppose this could be represented by bar graphs next to the nodes, but this will be messy. Another idea would be to scale edges according to their expression values: have three different colored lines connecting nodes (one for each treatment) and scale line thickness according to expression. The problem with the current color scale is that it can only represent one value (a p-val, or a log fold change...). If I have 3 expression values (say for a gene, treatment 1 = 100, treatment 2 = 100, treatment 3 = 10000), I'm trying to think of a way to compare these visually on the pathway.
Your help and comments are much appreciated.

Iain

[...] wrote:
> Iain,
>
> I agree, you will need gage or similar tools to pinpoint the significantly perturbed pathways first. The results can be easily piped into Pathview for automatic visualization. With your input data ready, you may finish the whole workflow in about 10 lines of code. Please check the "Integrated workflow with pathway analysis" section on page 15 of the Pathview vignette.
>
> It is always a good idea to keep molecular (gene, compound) IDs unique in your data, as you've already done by summing over the KO ids. GAGE (or similar pathway/gene-set analysis tools) requires unique gene/molecule IDs for sensible enrichment tests. In addition, R may force your data IDs (names for vectors and rownames for matrix-like objects) to be unique by adding suffixes to your duplicated IDs.
>
> The Graphviz view looks quite different from the KEGG view. The Graphviz view lays out the pathway topology automatically, and users have little control over that. The KEGG view uses the native KEGG pathway graph, which was designed and drawn fully by humans. I am curious what types of node attributes you want to manipulate? What do you mean by "plot actual data next to nodes"? Using discrete legends rather than a color scale?
>
> Weijun
>
> --------------------------------------------
> On Fri, 7/26/13, Iain wrote:
>
> Subject: Re: Pathview with non-KEGG organisms?
> Date: Friday, July 26, 2013, 7:16 PM
>
> Hi Weijun,
>
> Thanks for your email. I ended up summing over my KO ids to add duplicates instead of using the mol.sum function (which I think does the same thing). I did this because I had instances where my custom ids had the same KO id. I got things working, but I actually think I should start with your gage package first to try to narrow down what I visualize.
>
> Another quick question - is it possible to have the graphviz option (kegg.native = F) maintain the general graph structure that kegg.native = T displays?
> I would like to access the functionality of graphviz by being able to manipulate node attributes, while still keeping the canonical flow of a metabolic pathway. Also, is it possible to plot actual data next to nodes instead of using the color scale to represent values?
>
> Thanks again for your help,
> Iain
>
> On Fri, Jul 26, 2013 at 10:48 AM, Luo Weijun wrote:
> > Hi Iain,
> >
> > Yes, pathview can work with your problem. First map your genes to KEGG Orthology, and retrieve the KEGG ortholog IDs (gene IDs in the format of Kxxxxx) (as you may have done). Just label your genes using these KEGG ortholog IDs (instead of Entrez Gene IDs or gene symbols). Then supply your data as gene.data, and set species="ko" when calling the pathview function. Otherwise it would be the same as working with KEGG species data. Please check the help info for the pathview function within R:
> >
> > ?pathview
> >
> > And look at the Arguments section (gene.data, species) and the Details section.
> >
> > Pathview can also be used directly to visualize metagenomic or microbiome data when the data are mapped to KEGG ortholog IDs. In fact, pathview can visualize various types of molecular data as long as the data can be mapped onto pathways. Pathview automatically maps common gene/protein/compound IDs to KEGG molecular IDs for common species. For less used IDs or other species, pathview will also work if the user provides the ID mapping manually. Please check pages 13-14 in the package vignette for pathview's ID mapping functions and solutions. HTH.
> >
> > Weijun
> >
> > --------------------------------------------
> > On Fri, 7/26/13, Iain wrote:
> >
> > Subject: Pathview with non-KEGG organisms?
> > Date: Friday, July 26, 2013, 1:42 AM
> >
> > Hey Weijun,
> >
> > I've been looking for tools that allow RNA-seq data to be overlaid on KEGG pathways. The problem is that the bacterium I work on is not a KEGG organism.
> > I have a draft genome and I have used KAAS to find KEGG Orthology assignments for each of the genes. Is it possible, somehow, to still use Pathview? For example, instead of calling the pathview function with species = "hsa", would it be possible to provide a custom set of KO assignments?
> >
> > Cheers,
> > Iain
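
For readers following the thread, here is a minimal base-R sketch of the data preparation discussed above: collapsing duplicated KO ids by summing, taking log2, and computing one log2 fold change per treatment so that one graph per condition can be drawn. The matrix values, the sample names (ctrl, trt1, trt2) and the object names (expr, ko.ids, ko.data, ko.lfc) are made up for illustration and are not Iain's data; rowsum() is used here as a stand-in for the manual summing he describes, not as what mol.sum does internally.

```r
# Toy expression matrix: one row per gene, one column per sample.
# All values are invented for illustration.
expr <- matrix(
  c(100, 200, 400,
     50,  50,  50,
     28,  56,   7,
      4,   8,  16),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("ctrl", "trt1", "trt2"))
)

# KO id assigned to each gene; the first two genes share the same KO.
ko.ids <- c("K00005", "K00005", "K00012", "K00013")

# Sum rows that share a KO id, so that every row name is unique.
ko.data <- rowsum(expr, group = ko.ids)

# log2 transform, then log2 fold change of each treatment vs. the control.
# The control vector has one entry per row, so it is subtracted row-wise.
ko.log2 <- log2(ko.data)
ko.lfc  <- ko.log2[, c("trt1", "trt2"), drop = FALSE] - ko.log2[, "ctrl"]

print(ko.lfc)  # K00012: trt1 = 1, trt2 = -2

# One graph per condition could then be drawn along these lines
# (needs the pathview package and network access, so it is left commented out;
# the pathway id is only an example):
# library(pathview)
# pathview(gene.data = ko.lfc[, "trt1"], pathway.id = "00010",
#          species = "ko", out.suffix = "trt1")
```

Each column of ko.lfc is a named vector keyed by KO id, which is the shape gene.data expects when species = "ko" is set.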