About goseq GO and KEGG enrichment analysis
jackipchiho ▴ 10
@jackipchiho-8282
Last seen 9.3 years ago
Hong Kong

Hi all,

I am working on a non-model species, and I want to use goseq to perform GO and KEGG enrichment analysis.

I have found a list of DEGs using DESeq2 (FDR <= 0.05, |log2FC| >= 1). However, some of those DEGs (say 50%) are not associated with any GO or KEGG annotation. Will that affect the results much?

Besides, the GO annotation file contains GO levels from 1 to 15. Do I need to input all of those levels, or can I input only the relatively general ones (say levels 2-6)? Some GO terms have only one or two associated genes; is it meaningful to include them? I have tried GOEAST and GAGE for enrichment analysis, and both set a cutoff on the number of genes associated with a GO term. I am not sure whether goseq has a gene set size option. I am also not sure whether having lots of GO terms affects the enrichment analysis and makes the p-values larger.

Many thanks,

Jack

 

@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

some of those DEGs (say 50%) are not associated with any GO or KEGG annotation. Will that affect the results much?

Well, yes, of course it would be preferable to have annotation for all genes, but you can't do anything about it if the annotation doesn't exist. goseq will still perform a valid enrichment analysis for the genes that do have annotation.

Some GO terms have only one or two associated genes; is it meaningful to include them?

It doesn't do any harm, except possibly from the point of view of multiple testing.
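If the multiple-testing burden from very small terms is a concern, one option is to filter the gene-to-category mapping by set size before running the enrichment. The following is only a minimal Python sketch of that idea, not goseq code: the function name `filter_by_set_size`, the `min_size` and `max_size` parameters, and the toy gene and term IDs are all hypothetical.

```python
from collections import defaultdict

def filter_by_set_size(gene2cat, min_size=5, max_size=None):
    """Drop categories whose gene count falls outside [min_size, max_size].

    gene2cat maps a gene ID to a list of category IDs (hypothetical layout).
    """
    # Invert the mapping so each category can be sized.
    cat2genes = defaultdict(set)
    for gene, cats in gene2cat.items():
        for cat in cats:
            cat2genes[cat].add(gene)

    keep = {
        cat for cat, genes in cat2genes.items()
        if len(genes) >= min_size and (max_size is None or len(genes) <= max_size)
    }

    # Every gene stays in the mapping; a gene whose terms were all dropped
    # ends up with an empty list, i.e. it becomes effectively unannotated.
    return {gene: [c for c in cats if c in keep] for gene, cats in gene2cat.items()}

# Toy annotation: GO:A has three genes, GO:B and GO:C have one each.
toy = {"g1": ["GO:A", "GO:B"], "g2": ["GO:A"], "g3": ["GO:A", "GO:C"]}
filtered = filter_by_set_size(toy, min_size=2)
```

With `min_size=2`, only `GO:A` survives in the toy mapping, so fewer terms would be tested afterwards.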

I am also not sure whether having lots of GO terms affects the enrichment analysis and makes the p-values larger.

The p-value that is obtained for each GO term depends only on that term. Including other GO terms will make no difference to the p-values.
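The raw p-value of a term is unaffected by the other terms, but a multiplicity-adjusted value such as a Benjamini-Hochberg FDR does depend on how many terms are tested together. A minimal Python sketch of the standard Benjamini-Hochberg adjustment (not goseq code; the p-values below are made up for illustration) shows the same raw p-value receiving a larger adjusted value when more terms are included:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

term_p = 0.001                      # raw p-value of the term of interest
few_terms = [term_p, 0.2, 0.5]      # tested alongside 2 other terms
many_terms = few_terms + [0.9] * 97 # tested alongside 99 other terms

adj_few = bh_adjust(few_terms)[0]   # about 0.003
adj_many = bh_adjust(many_terms)[0] # about 0.1
```

The raw value 0.001 is identical in both runs; only the adjusted value changes with the number of terms tested.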

jackipchiho ▴ 10
@jackipchiho-8282
Last seen 9.3 years ago
Hong Kong

Dear Gordon,

Thanks for your explanation. 

If the unannotated DEGs affect the overall results, should I remove them? Or should I separate the up- and down-regulated genes and run goseq again? At present, all DEGs are simply marked with "1" in the DEG input file.

As for the p-values, I ran the analysis both with the three GO categories (BP, CC and MF) separately and with all GO terms at once. Analysing all GO terms at once gave higher p-values and FDRs than analysing the categories separately. That's why I asked this question.

Many thanks,

Jack

 


I believe goseq will ignore unannotated genes by default.
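To see what ignoring unannotated genes means for the test, here is a minimal Python sketch using a plain hypergeometric upper tail. This is a deliberate simplification, not goseq's method (goseq also corrects for length bias), and it assumes that "ignoring" means dropping unannotated genes from both the DE list and the background. All counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical counts: 2000 genes in total, 1000 of them annotated;
# 200 DEGs in total, 100 of them annotated; a category with 50 genes, 12 of them DE.
p_annotated_only = hypergeom_upper_tail(k=12, K=50, n=100, N=1000)
p_all_genes = hypergeom_upper_tail(k=12, K=50, n=200, N=2000)
```

The two calls differ only in the background and DE-list sizes, which is the choice that decides how unannotated genes enter the calculation.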

As far as separating up- and down-regulated genes, I would try both ways. There is some evidence (e.g. Guo et al, 2014) that analyzing the up- and down-regulated genes separately can be beneficial.
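Separating the directions amounts to building two 0/1 indicator vectors instead of one. A minimal Python sketch, assuming the DESeq2 results have been exported as gene ID mapped to a (log2 fold change, adjusted p-value) pair; the function name `de_indicators`, the data layout, and the toy values are hypothetical:

```python
def de_indicators(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) dicts mapping gene ID -> 1 if DE in that direction, else 0.

    results maps gene ID -> (log2 fold change, adjusted p-value or None).
    """
    up, down = {}, {}
    for gene, (log2fc, padj) in results.items():
        # A missing adjusted p-value (None) is treated as not significant.
        significant = padj is not None and padj <= fdr_cutoff
        up[gene] = int(significant and log2fc >= lfc_cutoff)
        down[gene] = int(significant and log2fc <= -lfc_cutoff)
    return up, down

# Toy results: g1 is up, g2 is down, g3 is not significant, g4 has no adjusted p-value.
toy_results = {
    "g1": (2.3, 0.001),
    "g2": (-1.5, 0.01),
    "g3": (0.4, 0.2),
    "g4": (3.0, None),
}
up, down = de_indicators(toy_results)
```

Each vector covers all assayed genes, so the up and down analyses share the same background and differ only in which genes are flagged with 1.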


Thanks Keith, yes, I also read that paper yesterday. I will run the analysis again today.
