Search
Question: gage pathways enrichment analysis
0
gravatar for francesca.defilippis
3.0 years ago by
European Union
francesca.defilippis40 wrote:

Hello! I'd like to do a pathway enrichment analysis on metagenomic data. I already have the KO IDs for my genes. Does gage support bacteria genomes? than, I saw that you need to specify the organism name, but I'm not interested in a specific bacterial species. Is it possible to use it with KO ids from different species?

 

Thanks

Francesca

ADD COMMENTlink modified 3.0 years ago by Luo Weijun1.4k • written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

Yes, both gage and pathview work with KO. You may generate the pathway gene set data using function kegg.gsets (with species="ko"). Then you can follow the examples in the quick start and basic analysis sections of the gage tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/gage.pdf

 

you may also work with any individual species (including bacteria) as long as it is included in KEGG. For details:

library(pathview)

data(korg)

head(korg)

 

you can also check section 7.5 of the pathview tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/pathview/inst/doc/pathview.pdf

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k
0
gravatar for francesca.defilippis
3.0 years ago by
European Union
francesca.defilippis40 wrote:

Hi followed the tutorial starting from DESeq output.

> require(gage)

> datakegg.gs)

> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
> sel <- fc.kegg.p$greater[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$greater[, "q.val"])
> path.ids <- rownames(fc.kegg.p$greater)[sel]
> sel.l <- fc.kegg.p$less[, "q.val"] < 0.1 &
+ !is.na(fc.kegg.p$less[,"q.val"])
> path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
> path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)

 

But how can I get a sort of list o up- or down-regulated pathways?

 

ADD COMMENTlink written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

In your code above, path.ids and path.ids.l are the list of up and down-regulated pathways selected based on q-val<0.1. the full analysis results tables are stored in fc.kegg.p$greater and fc.kegg.p$less. To take a look:

head(fc.kegg.p$greater)

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k

My fc.kegg.p table is all NAs... Moreover, I have all pathways from human genome (i think, the IDs start with hsa). In the code above, where should I specify the organism? And what di I need to use, since I have mixed bacteria population?

ADD REPLYlink written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

When you call gage function in your code above, you specify gsets = kegg.gs, which is the gene set data. However, you used the human gene set data by following the tutorial example exactly. As I mentioned above, you should create your own gene set data using kegg.gsets function (with species="ko"). Please follow the suggested links/docs above.

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k

I'm still having troubles. I created my gset

> myg.sets=kegg.gsets(species = "ko")

And then used it in the gage function

fc.kegg.p <- gage(exp.fc, gsets = myg.sets, ref = NULL, samp = NULL)

this is my exp.fc (fold changes from deseq)

head(exp.fc)
   K01866    K01740    K01892    K01583    K00147    K00290 
-5.155880 -5.086753 -3.897103 -3.808646 -3.787984 -3.112827 

 

But this still resulted in a NAs table

fc.kegg.p$greater
           p.geomean stat.mean p.val q.val set.size exp1
kg.sets           NA       NaN    NA    NA        0   NA
sigmet.idx        NA       NaN    NA    NA        0   NA
sig.idx           NA       NaN    NA    NA        0   NA
met.idx           NA       NaN    NA    NA        0   NA
dise.idx          NA       NaN    NA    NA        0   NA

 

Where I'm doing wrong?

ADD REPLYlink written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

You can create your KO gene set like:

kg.ko=kegg.gsets(species = "ko")

kegg.ko.gs=kg.ko$kg.sets[kg.ko$sigmet.idx]

 

kegg.ko.gs but not kg.ko is the one you use directly, you can check:

lapplykegg.ko.gs[1:3], head, 4)

names(kg.ko)

 

all info is documented in page 6 of the gage tutorial and the function help info:

?kegg.gsets

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k

I'm sorry to bother you again, but I still have a table of NAs, except for setsize where there are some values

                                        p.geomean stat.mean p.val q.val
ko00970 Aminoacyl-tRNA biosynthesis            NA       NaN    NA    NA
ko02010 ABC transporters                       NA       NaN    NA    NA
ko02020 Two-component system                   NA       NaN    NA    NA
ko02030 Bacterial chemotaxis                   NA       NaN    NA    NA
ko02040 Flagellar assembly                     NA       NaN    NA    NA
ko02060 Phosphotransferase system (PTS)        NA       NaN    NA    NA

Where I'm doing wrong?

 

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

Can I see what you data look like? You may do:

str(exp.fc)

head(exp.fc)

 

and post the output here.

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k

Thesev are the output:

str(exp.fc)
 Named num [1:238] -5.16 -5.09 -3.9 -3.81 -3.79 ...
 - attr(*, "names")= chr [1:238] "K01866" "K01740" "K01892" "K01583" ...

 

 head(exp.fc)
   K01866    K01740    K01892    K01583    K00147    K00290 
-5.155880 -5.086753 -3.897103 -3.808646 -3.787984 -3.112827 

 

ADD REPLYlink written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

Okay, you only have 238 KO genes. These seems to be a selected significant gene list, since all are heavily up or down regulated based on log2 fold change values.

Gene set analysis like GAGE requires the full list of genes (proteins, molecules etc), usually thousands of them, instead of a preselected short list. Otherwise, there is no background to compare to in the statistical test. Therefore, you should include data for all KO genes output (from DEseq or other analysis) as described in gage tutorials.

BTW, you saw NA or NaN in your output because each pathways gets none or only a few genes mapped as the list is very short with 238 genes.

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k

Hi! I'm trying to apply the gage workflow to the full count matrix (raw counts, not the results from deseq). 

 

I normalized the counts as described in the documentation and created my kegg gsets as above. Then I tried:

counts.kegg.p <- gage(metag.normOV, gsets =kegg.ko.gs, ref = metag.normOV[,names(which(clD == 'O'))], samp = metag.normOV[,names(which(clD == 'V'))])

This is my normalized counts matrix

head(metag.normOV)

         15.TO.V   37.TO.O  02.BA.O   32.TO.O  35.TO.V  33.TO.O  25.TO.V  30.BO.O  10.BO.V   24.TO.O   27.TO.V  13.BA.V
K01361  3.000000  3.000000 3.000000  3.000000 3.000000 3.000000 3.000000 3.474514 3.000000  3.000000  3.000000 3.284227
K01362 10.510925 10.180883 9.508538 10.747639 9.783315 9.620501 9.132543 9.513370 9.918586 10.406776 10.114846 9.444985
K05841  3.000000  3.000000 3.000000  3.000000 3.000000 3.000000 3.000000 3.000000 3.000000  3.000000  3.000000 3.000000
K05844  3.000000  3.648295 3.410819  3.000000 3.134276 3.153423 3.000000 3.333004 3.000000  3.000000  3.000000 3.000000
K05845  3.473451  4.992189 4.573684  4.758411 3.664711 3.292087 3.247012 4.738421 5.297531  3.855768  3.131843 3.725325
K05846  7.614699  8.596064 7.457748  6.973650 7.457559 7.209767 7.663303 7.752926 7.837494  8.267429  7.530133 7.813055

 

and clD is just a class vector in order to select controls vs treated samples (O and V)

 

But I got this error

Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare,  : 
  wrong reference column index

Any clue about where the problem could be?

 

ADD REPLYlink written 3.0 years ago by francesca.defilippis40
0
gravatar for Luo Weijun
3.0 years ago by
Luo Weijun1.4k
United States
Luo Weijun1.4k wrote:

ref and samp should be column numbers like ref=1:3, samp=4:6 etc. obviously yours are not. You may check the function info:

?gage

In addition, names(which(clD == 'O')) will yeild NULL. Please get yourself familiar with basic R syntax.

 

You should do something like:

cns=colnames(metag.normOV)

ref1=grep(".O$", cns)

samp1=grep(".V$", cns)

counts.kegg.p = gage(metag.normOV, gsets =kegg.ko.gs, ref=ref1, samp=samp1)

 

ADD COMMENTlink written 3.0 years ago by Luo Weijun1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 318 users visited in the last hour