Question

About goseq GO and KEGG enrichment analysis

jackipchiho

Hong Kong

Hi all,

I am working on a non-model species, and I now want to use goseq to perform GO and KEGG enrichment analysis.

I have obtained a list of DEGs from DESeq2 (FDR <= 0.05, |log2FC| >= 1). However, some of those DEGs (say 50%) have no associated GO or KEGG annotation. Will this affect the results much?

Also, the GO annotation file contains GO levels 1 to 15. Do I need to input all of those levels, or can I input only the relatively general ones (say levels 2-6)? Some GO terms have only one or two associated genes; is it meaningful to include them? I have tried GOEAST and GAGE for enrichment analysis, and both set a cut-off on the number of genes associated with a GO term. I am not sure whether goseq has a gene-set size option. I am also not sure whether having many GO terms affects the enrichment analysis and makes the p-values larger.

Many thanks,

Jack


score 1 · Answer 1 · 2015-08-06

some of those DEGs (say 50%) have no associated GO or KEGG annotation. Will this affect the results much?

Well, yes, of course it would be preferable to have annotation for all genes, but you can't do anything about it if the annotation doesn't exist. goseq will still perform a valid enrichment analysis for the genes that do have annotation.
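To make this concrete, here is a minimal Python sketch of a single-term over-representation test that, as described above, only uses genes that have annotation. It is a plain hypergeometric test, not goseq's default Wallenius method with gene-length bias correction, and the gene and term IDs are toy placeholders.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K genes in term, n DE genes)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_pvalue(de_genes, term, gene2terms):
    """Over-representation p-value for one term, using annotated genes only."""
    # Genes with no annotation at all are dropped from the universe,
    # so they count neither as DE nor as non-DE.
    annotated = {g for g, terms in gene2terms.items() if terms}
    de = set(de_genes) & annotated
    in_term = {g for g in annotated if term in gene2terms[g]}
    N, K, n = len(annotated), len(in_term), len(de)
    k = len(de & in_term)
    return hypergeom_upper_tail(N, K, n, k)


# Toy annotation: g5 and g6 have no GO terms.
gene2terms = {
    "g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:B"}, "g4": {"GO:B"},
    "g5": set(), "g6": set(),
}
de_genes = ["g1", "g2", "g5"]  # g5 is DE but unannotated
print(term_pvalue(de_genes, "GO:A", gene2terms))  # 1/6
```

The unannotated DE gene g5 has no effect here: dropping it from `de_genes` gives the same p-value, because it is removed from the universe before counting.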

Some GO terms have only one or two associated genes; is it meaningful to include them?

It doesn't do any harm, except possibly from the point of view of multiple testing, since every extra term adds one more test to correct for.
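If you still want a GOEAST- or GAGE-style size cut-off, one option is to filter the term-to-gene annotation before running the enrichment test. The sketch below does this on a toy mapping; `filter_terms_by_size` is a hypothetical helper, not part of goseq, and the default bounds of 5 and 500 genes are arbitrary placeholders rather than recommended values.

```python
from collections import Counter


def filter_terms_by_size(gene2terms, min_size=5, max_size=500):
    """Drop terms annotated to fewer than min_size or more than max_size genes."""
    # Count how many genes carry each term.
    sizes = Counter(t for terms in gene2terms.values() for t in terms)
    keep = {t for t, size in sizes.items() if min_size <= size <= max_size}
    # Genes whose terms are all removed end up with an empty set.
    return {g: terms & keep for g, terms in gene2terms.items()}


toy = {"g1": {"GO:A", "GO:B"}, "g2": {"GO:A"}, "g3": {"GO:A"}, "g4": {"GO:C"}}
filtered = filter_terms_by_size(toy, min_size=2)
print(filtered)
```

Fewer terms means fewer tests to correct for. Note that a gene left with no terms after filtering (g4 here) effectively becomes unannotated.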

I am also not sure whether having many GO terms affects the enrichment analysis and makes the p-values larger.

The p-value that is obtained for each GO term depends only on that term. Including other GO terms will make no difference to the p-values.
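The raw p-value for a term does not depend on other terms, but an FDR-adjusted value does: the Benjamini-Hochberg adjustment scales each p-value by the number of tests. This is one plausible contributor to larger FDR values when all GO categories are tested together rather than separately. A minimal sketch of the standard BH procedure (the same adjustment R's `p.adjust(method = "BH")` computes), with made-up p-values:

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.01, 0.04]
few = benjamini_hochberg(raw)                      # 3 terms tested
many = benjamini_hochberg(raw + [0.5, 0.6, 0.9])   # 3 extra terms added
print(few)
print(many[:3])
```

The first three raw p-values are identical in both calls, but their adjusted values double (from about 0.003, 0.015, 0.04 to 0.006, 0.03, 0.08) once three more non-significant terms join the family.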

score 0 · Answer 2 · 2015-08-06

jackipchiho

Hong Kong

Dear Gordon,

Thanks for your explanation.

If the non-annotated DEGs affect the overall results, should I remove them? Or should I separate the up- and down-regulated genes and run goseq again, since all DEGs are simply marked with "1" in the DEG input file?

Regarding the p-values: I tried running the analysis separately for the three GO categories (BP, CC and MF) and also for ALL GO terms at once. Analysing ALL GO terms gave higher p-values and FDRs than the separate GO categories. That is why I asked this question.

Many thanks,

Jack


I believe goseq will ignore unannotated genes by default.

As for separating up- and down-regulated genes, I would try both ways. There is some evidence (e.g. Guo et al., 2014) that analyzing the up- and down-regulated genes separately can be beneficial.
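A minimal sketch of building the 0/1 DEG indicators from DESeq2 output and splitting them into up- and down-regulated sets, using the cut-offs from the question (FDR <= 0.05, |log2FC| >= 1). It assumes the DESeq2 results have been exported as a mapping from gene ID to `(padj, log2FoldChange)`, with `None` standing in for NA; that input format and the function name are hypothetical.

```python
def deg_indicator_vectors(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Build 0/1 indicators for all, up- and down-regulated DEGs.

    results maps gene ID -> (padj, log2FoldChange); padj may be None (NA).
    """
    all_de, up, down = {}, {}, {}
    for gene, (padj, lfc) in results.items():
        # Genes with NA padj are never called DE.
        is_sig = padj is not None and padj <= fdr_cutoff and abs(lfc) >= lfc_cutoff
        all_de[gene] = int(is_sig)
        up[gene] = int(is_sig and lfc > 0)
        down[gene] = int(is_sig and lfc < 0)
    return all_de, up, down


results = {
    "g1": (0.01, 2.0),    # significant, up
    "g2": (0.01, -1.5),   # significant, down
    "g3": (0.2, 3.0),     # large change, not significant
    "g4": (None, 0.0),    # padj is NA
    "g5": (0.04, 0.5),    # significant, but |log2FC| < 1
}
all_de, up, down = deg_indicator_vectors(results)
print(all_de, up, down)
```

All three indicator maps cover the same gene universe; only which genes are marked 1 changes, so each can be fed to the enrichment step in the same way.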

Keith Hughitt

Thanks Keith. Yes, I also read that paper yesterday. I will run the analysis again today.

jackipchiho