Question

GOseq package: Which genes are included in the "numDEInCat"?

Xavier ▴ 10

@xavier-12181

Last seen 7.3 years ago

Hi,

I've performed KEGG pathway analyses using the goseq package, and I would like to know which genes are counted in "numDEInCat". See the example below:

Category	Over_represented_pvalue	Under_represented_pvalue	numDEInCat	numInCat	pathway
01100	..	...	7	1179	...
00785	...	...	1	3	...
00450	...	...	1	17	...
00230	...	...	2	172	...

In this example output, category 01100 contains 7 DEGs. How can I find the Ensembl IDs of these genes? I can't find anything about this in the manual (http://bioconductor.org/packages/release/bioc/vignettes/goseq/inst/doc/goseq.pdf).

Does anyone have an idea?

Best,

Xavier

GOseq KEGG • 2.4k views

ADD COMMENT • link 7.3 years ago Xavier ▴ 10

score 1 · Answer 1 · 2017-01-16

You can try the following:

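# Map each Ensembl gene to its KEGG pathways
# (en2eg, eg2kegg and grepKEGG are assumed to be defined as in the goseq vignette)
kegg <- lapply(en2eg,grepKEGG,eg2kegg)
KEGG <- goseq(pwf, gene2cat=kegg)

# Pathways that remain significant after Benjamini-Hochberg correction
enriched.kegg <- KEGG$category[p.adjust(KEGG$over_represented_pvalue, method="BH")<.05]

# Long table of gene/pathway pairs: 'values' holds the KEGG ID, 'ind' the gene ID
allKEGGs <- stack(kegg)
allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg,]

# Keep only the pairs whose gene is differentially expressed
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]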

Where sig_genes are your significant genes.

Does this work?
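
To make the stack-and-filter idea above easier to follow, here is a minimal sketch on toy data. It uses only base R; the gene IDs, the DE gene list, and the set of enriched pathways are all made up for illustration and are not goseq output.

```r
# Toy gene-to-pathway mapping (hypothetical gene IDs; in a real analysis this
# would be the 'kegg' list passed to goseq() as gene2cat)
kegg <- list(
  ENSG_A = c("01100", "00230"),
  ENSG_B = "01100",
  ENSG_C = "00785",
  ENSG_D = "00450"
)
sig_genes     <- c("ENSG_A", "ENSG_C")   # hypothetical DE genes
enriched.kegg <- c("01100", "00785")     # hypothetical enriched pathways

# stack() turns the named list into a two-column data frame:
# 'values' = pathway ID, 'ind' = gene ID (taken from the list names)
allKEGGs <- stack(kegg)

# Keep rows where the pathway is enriched AND the gene is DE
hits <- allKEGGs[allKEGGs$values %in% enriched.kegg &
                 allKEGGs$ind %in% sig_genes, ]

# One character vector of DE gene IDs per pathway
de_by_pathway <- split(as.character(hits$ind), as.character(hits$values))
```

With real data, the length of each element of de_by_pathway should correspond to the numDEInCat value of that pathway, which is a convenient sanity check.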

score 1 · Answer 2 · 2017-01-17

I see now, everything seems to work. Again, thank you so much for your help!

ADD COMMENT • link 7.3 years ago Xavier ▴ 10

score 0 · Answer 3 · 2017-01-17

Unfortunately this doesn't work. The 5th command ("allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg,]") returns an empty result. Are there any other solutions?

ADD COMMENT • link 7.3 years ago Xavier ▴ 10

Do you have enriched kegg pathways?

Try:

head(enriched.kegg)

You can also skip the enriched.kegg step, of course, but then the result will include pathways that are not significantly enriched:

kegg <- lapply(en2eg,grepKEGG,eg2kegg)
KEGG <- goseq(pwf, gene2cat=kegg)

allKEGGs <- stack(kegg)
allKEGG_sig <- allKEGGs[allKEGGs$values %in% KEGG$category,]

allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]

Does this work? If not, please show some code and the errors given by R. It's hard to code blind here.

ADD REPLY • link 7.3 years ago b.nota ▴ 360

# This was the "original" code:
kegg <- lapply(en2eg, grepKEGG, eg2kegg)
KEGG=goseq(pwf, gene2cat=kegg)
KEGG.sig <- KEGG[KEGG$over_represented_pvalue<0.05,]

# This is the code adjusted based on your input:
kegg <- lapply(en2eg, grepKEGG, eg2kegg)
KEGG=goseq(pwf, gene2cat=kegg)
allKEGGs <- stack(kegg)
KEGG.sig <- KEGG[KEGG$over_represented_pvalue<0.05,]
allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg,]
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]

# These are the significant KEGGs (So yes, there are significant categories):

> KEGG.sig
   category over_represented_pvalue under_represented_pvalue numDEInCat numInCat
86    01100             0.004357014                0.9992979          7     1179
74    00785             0.005852647                0.9999894          1        3
35    00450             0.033955040                0.9994927          1       17
17    00230             0.048094216                0.9952265          2      172

# Everything seems to work up to command #4 (KEGG.sig). However, the 5th command (allKEGG_sig) gives an empty data frame when I inspect the result:

> allKEGG_sig
[1] values ind
<0 rows> (or 0-length row.names)

Thank you so far for your help/efforts.

ADD REPLY • link 7.3 years ago Xavier ▴ 10

All right, I see two problems...

First, you'll need to adjust your p-values for multiple testing. The p-values you are filtering on now are still uncorrected.

Second, even if you ignore that (which is not wise), line 5 of your new script still refers to enriched.kegg, an object your script never creates. Replace it with KEGG.sig$category.
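
As a rough illustration of why the correction matters, the sketch below applies Benjamini-Hochberg adjustment to a toy table shaped like a goseq result. The p-values and the two extra categories are invented for the example; only the column names follow the output shown in this thread.

```r
# Toy stand-in for a goseq() result (p-values are made up, not real output)
KEGG <- data.frame(
  category = c("01100", "00785", "00450", "00230", "00010", "00020"),
  over_represented_pvalue = c(0.001, 0.002, 0.03, 0.04, 0.5, 0.9),
  stringsAsFactors = FALSE
)

# Adjust across ALL tested categories first, then filter on the adjusted value
KEGG$padj <- p.adjust(KEGG$over_represented_pvalue, method = "BH")
KEGG.sig  <- KEGG[KEGG$padj < 0.05, ]

# For comparison: how many categories pass the uncorrected threshold
raw_hits <- sum(KEGG$over_represented_pvalue < 0.05)
```

In this toy table four categories pass the raw 0.05 cutoff, but only two remain after adjustment. KEGG.sig$category can then be used in place of enriched.kegg in the filtering step.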

ADD REPLY • link 7.3 years ago b.nota ▴ 360

score 0 · Answer 4 · 2017-01-17

# Thanks, command #5 now works:
allKEGG_sig <- allKEGGs[allKEGGs$values %in% KEGG.sig$category,]

# However, the last command does not work:
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]

# It gives the following error:
> allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]
Error in allKEGG_sig$ind %in% sig_genes : object 'sig_genes' not found

ADD COMMENT • link 7.3 years ago Xavier ▴ 10

Yes, that's why I said in my first post:

"Where sig_genes are your significant genes."

I don't know what you named the list of significant genes that you used to make your pwf or gene.vector object.
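
If no separate list of significant genes was kept, it can usually be recovered from the objects that were already built. The sketch below assumes a named 0/1 vector (called gene.vector here, a hypothetical name) of the kind passed to nullp(), and a pwf data frame with gene IDs as row names and a DEgenes column; the toy values are made up.

```r
# Hypothetical named 0/1 vector: 1 = differentially expressed, 0 = not
gene.vector <- c(ENSG_A = 1, ENSG_B = 0, ENSG_C = 1, ENSG_D = 0)

# Names of the genes flagged as DE
sig_genes <- names(gene.vector)[gene.vector == 1]

# Alternative, assuming pwf has gene IDs as row names and a 'DEgenes'
# column (toy data frame built here only to mimic that layout)
pwf <- data.frame(DEgenes = unname(gene.vector), row.names = names(gene.vector))
sig_genes_from_pwf <- rownames(pwf)[pwf$DEgenes == 1]
```

Either vector can then be used as sig_genes in the last filtering command.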

ADD REPLY • link 7.3 years ago b.nota ▴ 360