Question: About GOSeq GO and KEGG enrichment analysis
jackipchiho (Hong Kong) wrote 3.3 years ago:

Hi all,

I am working on a non-model species, and I want to use goseq to perform GO and KEGG enrichment analysis.

I have found a list of DEGs using DESeq2 (FDR <= 0.05, |log2FC| >= 1). However, some of those DEGs (say 50%) are not associated with any GO or KEGG annotation. Will this affect the results a lot?

Besides, the GO annotation file contains GO levels from 1 to 15. Do I need to input all of those levels, or can I input only the relatively general ones (say levels 2-6)? Some GO terms have only one or two associated genes; is it meaningful to include them? I have tried GOEAST and GAGE for enrichment analysis, and both of them set a cutoff on the number of genes associated with a GO term. I am not sure whether goseq has a gene set size option. I am also not sure whether including lots of GO terms affects the enrichment analysis and makes the p-values larger.

Many thanks,

Jack

 

Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote 3.3 years ago:

> some of those DEGs (say 50%) are not associated with any GO or KEGG annotation. Will this affect the results a lot?

Well, yes, of course it would be preferable to have annotation for all genes, but you can't do anything about it if the annotation doesn't exist. goseq will still perform a valid enrichment analysis for the genes that do have annotation.
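
To make the point above concrete, here is a minimal sketch of an enrichment test restricted to annotated genes. It is an illustration only, not goseq's implementation: it uses a plain hypergeometric test, whereas goseq by default also corrects for gene-length bias. The gene names, GO identifiers and helper functions (`hypergeom_upper_tail`, `term_p_value`) are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): n genes drawn without replacement from N, of which K belong to the term."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total

# Hypothetical annotation: g7 and g8 have no GO terms at all.
gene2go = {
    "g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:A", "GO:B"},
    "g4": {"GO:B"}, "g5": {"GO:B"}, "g6": {"GO:B"},
    "g7": set(), "g8": set(),
}
de_genes = {"g1", "g2", "g3", "g7"}  # g7 is a DEG without annotation

# Restrict both the background and the DE list to annotated genes.
annotated = {g for g, terms in gene2go.items() if terms}
de_annotated = de_genes & annotated

def term_p_value(term):
    """Over-representation p-value for one term, using annotated genes only."""
    in_term = {g for g in annotated if term in gene2go[g]}
    k = len(in_term & de_annotated)
    return hypergeom_upper_tail(k, len(in_term), len(de_annotated), len(annotated))

p_a = term_p_value("GO:A")
p_b = term_p_value("GO:B")
print(p_a, p_b)
```

The unannotated DEG (`g7`) simply drops out of the calculation, so the test remains valid for the genes that do carry annotation.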

> Some GO terms have only one or two associated genes; is it meaningful to include them?

It doesn't do any harm, except possibly from the point of view of multiple testing.
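
If one does want a size cutoff like the one GOEAST and GAGE apply, a possible approach is to filter the gene-to-term mapping before running the enrichment. The thread does not say whether goseq offers such an option, so the sketch below is a generic pre-filtering step with hypothetical names (`filter_terms_by_size`, `min_size`, `max_size`), not a goseq feature.

```python
from collections import Counter

def filter_terms_by_size(gene2go, min_size=5, max_size=500):
    """Drop terms whose number of annotated genes falls outside [min_size, max_size]."""
    sizes = Counter(term for terms in gene2go.values() for term in terms)
    keep = {term for term, n in sizes.items() if min_size <= n <= max_size}
    # Genes keep only their surviving terms; some may end up with none.
    return {gene: terms & keep for gene, terms in gene2go.items()}

# Hypothetical mapping: GO:B and GO:C each annotate a single gene.
gene2go = {"g1": {"GO:A", "GO:B"}, "g2": {"GO:A"}, "g3": {"GO:C"}}
filtered = filter_terms_by_size(gene2go, min_size=2)
print(filtered)
```

Note that filtering can leave some genes with no terms at all (`g3` here), which changes the set of annotated genes if unannotated genes are excluded from the background.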

> I am also not sure whether including lots of GO terms affects the enrichment analysis and makes the p-values larger.

The p-value that is obtained for each GO term depends only on that term. Including other GO terms will make no difference to the p-values.
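
The multiple-testing caveat can be illustrated with a short sketch. The raw p-value of a term does not change when other terms are added, but an adjusted value such as a Benjamini-Hochberg FDR depends on how many terms are tested together. The function `bh_adjust` and the p-values below are made up for illustration and assume BH is the adjustment in use, which the thread does not state.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for three BP terms.
bp_only = [0.001, 0.02, 0.04]
# The same BP terms tested together with four unremarkable CC/MF terms.
all_terms = bp_only + [0.3, 0.5, 0.7, 0.9]

fdr_bp_only = bh_adjust(bp_only)
fdr_all = bh_adjust(all_terms)[:3]  # the same three BP terms
print(fdr_bp_only)
print(fdr_all)
```

In this toy case every BP term gets a larger adjusted value in the combined run, even though its raw p-value is identical. That would account for a higher FDR when all GO categories are analysed at once, but not for a change in the raw p-values.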

jackipchiho (Hong Kong) wrote 3.3 years ago:

Dear Gordon,

Thanks for your explanation. 

If the unannotated DEGs will affect the overall results, should I remove them? Or should I separate the up- and down-regulated genes and run goseq again, since all DEGs are simply marked with "1" in the DEG input file?

As for the p-values, I tried running the analysis on the three GO categories (BP, CC and MF) separately and on all GO terms at once. Analyzing all GO terms at once gave higher p-values and FDRs than analyzing the categories separately. That's why I asked this question.

Many thanks,

Jack

 


I believe goseq will ignore unannotated genes by default.

As for separating up- and down-regulated genes, I would try both ways. There is some evidence (e.g. Guo et al, 2014) that analyzing the up- and down-regulated genes separately can be beneficial.
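
One way to prepare the two separate runs is to build one 0/1 indicator per direction instead of a single indicator that marks every DEG with "1". The sketch below assumes a table of per-gene log2 fold changes and adjusted p-values, with the thresholds taken from the original question. The function name `split_by_direction` and the values are hypothetical, and a missing adjusted p-value is represented as `None`.

```python
def split_by_direction(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return two 0/1 indicator dicts (up, down) keyed by gene."""
    up, down = {}, {}
    for gene, (log2fc, padj) in results.items():
        # Genes without an adjusted p-value are never called significant.
        significant = padj is not None and padj <= padj_cutoff
        up[gene] = int(significant and log2fc >= lfc_cutoff)
        down[gene] = int(significant and log2fc <= -lfc_cutoff)
    return up, down

# Hypothetical results: gene -> (log2 fold change, adjusted p-value).
results = {
    "g1": (2.3, 0.001),
    "g2": (-1.7, 0.01),
    "g3": (0.4, 0.2),
    "g4": (3.0, None),
}
up, down = split_by_direction(results)
print(up)
print(down)
```

Each indicator keeps every gene in the background and differs only in which genes are marked "1", so the up and down analyses share the same gene universe.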

Reply written 3.3 years ago by Keith Hughitt

Thanks Keith. Yes, I also read that paper yesterday. I will run the analysis again today.

Reply written 3.3 years ago by jackipchiho