GOseq package: Which genes are included in the "numDEInCat"?
Xavier ▴ 10
@xavier-12181
Last seen 7.3 years ago

Hi,

I've performed a KEGG pathway analysis using the GOseq package, and I would like to know which genes are counted in "numDEInCat". For example:

Category Over_represented_pvalue Under_represented_pvalue numDEInCat numInCat pathway
01100 .. ... 7 1179 ...
00785 ... ... 1 3 ...
00450 ... ... 1 17 ...
00230 ... ... 2 172 ...

In this example output, category 01100 contains 7 DEGs. How can I find the Ensembl IDs of these genes? I can't find anything about this in the manual (http://bioconductor.org/packages/release/bioc/vignettes/goseq/inst/doc/goseq.pdf).

Does anyone have an idea?

Best,

Xavier

 

b.nota ▴ 360
@bnota-7379
Last seen 3.6 years ago
Netherlands

You can try the following:

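# Map each gene to its KEGG pathways and run goseq on the pwf object
kegg <- lapply(en2eg,grepKEGG,eg2kegg)
KEGG <- goseq(pwf, gene2cat=kegg)

# Pathways that remain significant after BH correction
enriched.kegg <- KEGG$category[p.adjust(KEGG$over_represented_pvalue, method="BH")<.05]

# Long table of gene/pathway pairs: 'values' = pathway, 'ind' = gene
allKEGGs <- stack(kegg)
allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg,]

# Keep only the significant genes within those pathways
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]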

Here sig_genes is the vector of IDs of your significant genes.

Does this work?
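
To make the steps above concrete, here is a minimal toy sketch that needs no goseq objects. The gene IDs, the pathway assignments, and the contents of enriched.kegg and sig_genes are all made up for illustration; only the stack() and %in% logic mirrors the answer. The final split() step is an extra, not part of the original answer, that groups the DE genes per pathway.

```r
# Toy stand-in for the gene-to-category list given to goseq(gene2cat = ...):
# names are gene IDs, values are KEGG pathway IDs (all hypothetical).
kegg <- list(
  geneA = c("01100", "00230"),
  geneB = c("01100"),
  geneC = c("00450"),
  geneD = c("00230", "00785")
)

# Hypothetical inputs: the enriched pathways and the DE gene IDs.
enriched.kegg <- c("01100", "00230")
sig_genes <- c("geneA", "geneD")

# stack() turns the named list into a two-column data frame:
# 'values' holds the pathway ID, 'ind' holds the gene ID.
allKEGGs <- stack(kegg)

# Keep rows for enriched pathways, then rows for DE genes only.
allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg, ]
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes, ]

# Group the DE gene IDs by pathway.
de_by_pathway <- split(as.character(allKEGG_sig2$ind), allKEGG_sig2$values)
print(de_by_pathway)
```

If sig_genes is the same set of DE genes that was used to build pwf, the number of genes listed per pathway would be expected to match numDEInCat for that category.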

 

 

 

Xavier ▴ 10
@xavier-12181
Last seen 7.3 years ago

I see now, everything seems to work. Again, thank you so much for your help!

Xavier ▴ 10
@xavier-12181
Last seen 7.3 years ago

Unfortunately this doesn't work. The 5th command ("allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg,]") returns no rows. Are there any other solutions?


Do you have enriched kegg pathways?

Try:

head(enriched.kegg)

You can also skip the enriched.kegg step of course, but you'll get pathways that are not significantly enriched...

kegg <- lapply(en2eg,grepKEGG,eg2kegg)
KEGG <- goseq(pwf, gene2cat=kegg)

allKEGGs <- stack(kegg)
allKEGG_sig <- allKEGGs[allKEGGs$values %in% KEGG$category,]

allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]

Does this work? If not, please show some code and the errors given by R. It's hard to code blind here.

 

# This was the "original" code:
kegg <- lapply(en2eg, grepKEGG, eg2kegg)
KEGG=goseq(pwf, gene2cat=kegg)
KEGG.sig <- KEGG[KEGG$over_represented_pvalue<0.05,]

# This is the code adjusted based on your input:
kegg <- lapply(en2eg, grepKEGG, eg2kegg)
KEGG=goseq(pwf, gene2cat=kegg)
allKEGGs <- stack(kegg)
KEGG.sig <- KEGG[KEGG$over_represented_pvalue<0.05,]
allKEGG_sig <- allKEGGs[allKEGGs$values %in% enriched.kegg,]
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]

# These are the significant KEGGs (So yes, there are significant categories):

> KEGG.sig
   category over_represented_pvalue under_represented_pvalue numDEInCat numInCat
86    01100             0.004357014                0.9992979          7     1179
74    00785             0.005852647                0.9999894          1        3
35    00450             0.033955040                0.9994927          1       17
17    00230             0.048094216                0.9952265          2      172

# Up to command 4 (KEGG.sig) everything seems to work. However, the 5th command (allKEGG_sig) returns an empty result when I inspect it:

> allKEGG_sig
[1] values ind   
<0 rows> (or 0-length row.names)

Thank you so far for your help/efforts.


All right, I see two problems...

First, you'll need to adjust your p-values for multiple testing. The p-values that you use now are not corrected for this yet.

However, even if you ignore this (which is not wise), line 5 of your new script still refers to enriched.kegg, which your script never creates. Replace it with KEGG.sig$category.
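
To illustrate the difference between the two filters, here is a small sketch on a made-up results table. The p-values are invented for the example and are not Xavier's real results, and the padj column name is my own choice. It compares the raw p < 0.05 cut with the BH-adjusted cut from my first post.

```r
# Toy goseq-style results table (categories and p-values are made up).
KEGG <- data.frame(
  category = c("01100", "00785", "00450", "00230", "00010", "00020"),
  over_represented_pvalue = c(0.004, 0.006, 0.034, 0.048, 0.40, 0.75),
  stringsAsFactors = FALSE
)

# Raw cut, as in the "original" code: no multiple-testing correction.
KEGG.sig <- KEGG[KEGG$over_represented_pvalue < 0.05, ]

# Benjamini-Hochberg adjustment over all tested categories.
KEGG$padj <- p.adjust(KEGG$over_represented_pvalue, method = "BH")
enriched.kegg <- KEGG$category[KEGG$padj < 0.05]

print(KEGG.sig$category)  # categories passing the raw cut
print(enriched.kegg)      # categories passing the adjusted cut
```

In this toy table fewer categories survive the adjusted cut than the raw one. With real data it can happen that none survive, in which case enriched.kegg is empty and any subsequent %in% filter returns zero rows.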

Xavier ▴ 10
@xavier-12181
Last seen 7.3 years ago

# Thanks, command #5 now works:
allKEGG_sig <- allKEGGs[allKEGGs$values %in% KEGG.sig$category,]

# However, the last command does not work:
allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]

# It gives the following error:
> allKEGG_sig2 <- allKEGG_sig[allKEGG_sig$ind %in% sig_genes,]
Error in allKEGG_sig$ind %in% sig_genes : object 'sig_genes' not found


Yes, that's why I said in my first post:

"Where sig_genes are your significant genes."

I don't know what you called the list of significant genes that you used to make your pwf or gene.vector object.
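
As a rough sketch of how sig_genes could be recovered, assume your gene.vector is a named 0/1 vector (1 = DE) as in the goseq vignette, and that pwf is a data frame with gene IDs as row names and a DEgenes column. Both are assumptions about your session, and the pwf below is only a mock built by hand, not real nullp() output.

```r
# Hypothetical named 0/1 vector: 1 marks a differentially expressed gene.
gene.vector <- c(geneA = 1L, geneB = 0L, geneC = 0L, geneD = 1L)

# Option 1: take the names of the genes flagged as DE.
sig_genes <- names(gene.vector)[gene.vector == 1]

# Option 2: the same from a pwf-like data frame (mocked here; assumes
# row names are gene IDs and a 'DEgenes' column holds the 0/1 flags).
pwf <- data.frame(DEgenes = gene.vector, row.names = names(gene.vector))
sig_genes_from_pwf <- rownames(pwf)[pwf$DEgenes == 1]

print(sig_genes)
```

Either vector can then be used in the last command (allKEGG_sig$ind %in% sig_genes), provided the IDs are of the same type as the names of the kegg list.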
