Search
Question: topGO enrichment using ensembl gene list
0
10.6 years ago by
Hi Julien, On Mon, Mar 31, 2008 at 6:53 AM, Julien Roux <julien.roux at="" unil.ch=""> wrote: > Hello list, > > I am using the package "topGO" to analyse GO enrichment of gene sets: > > My genes are ensembl IDs and are not taken from a microarray, so I had > to feed "topGOdata" with a gene2GO list. > (see > http://thread.gmane.org/gmane.science.biology.informatics.conductor /14627) > I construct that list by mapping all ensembl IDs to GO IDs using the > package "biomaRt". > Then I proceed with my analysis: > > > GOdata <- new("topGOdata", ontology = "MF", allGenes = selectedList, > description = "Ensembl GO enrichment", annot = annFUN.gene2GO, gene2GO = > gene2GO) > > Do you confirm this approach is correct? > to make things clear. To perform the analysis with topGO you need the followings: 1. A list of genes/proteins/probe-set and their score. In the case you don't have a score for the proteins, as in your case, a list of genes which are of interest can be specified. The list of interesting genes must be a subset of the whole list of genes (the univers). 2. A mapping between genes and GO terms. This can be provided either as "genes to GO trems" or as "GO terms to genes". The mapping needs to be provided in form of a list (indexed either by the genes or by GO terms respectively). topGO provides you with three annotation functions: annFUN.hgu, annFUN.gene2GO, annFUN.GO2genes to work with the different type of annotations. For example if you have annotation from GO terms to genes, the list should look like: > str(myGO2genes) List of 852 $GO:0006629: chr [1:2] "71845" "4259"$ GO:0008104: chr [1:2] "52" "2575" $GO:0019673: chr [1:2] "3339" "4736"$ GO:0015662: chr [1:3] "232839" "3697" "06092" ........................................................ Then you can build a topGOdata object with: > GOdata <- new("topGOdata", ontology = "MF", allGenes = selectedList, annot = annFUN.GO2genes, ## the new annotation function GO2genes = myGO2genes) ## the GO to gene ID's dataset If you have annotations from genes to GOs, then the list should look like: > str(myGenes2GO) List of 24354 $35214: chr [1] "GO:0006329"$ 118104: chr [1:3] "GO:0006329", "GO:0045463", "GO:0035564" $15273: chr [1:2] "GO:0006329", "GO:0045463"$ 15662: chr [1:2] "GO:0005726", "GO:0057523" ........................................................ It is sufficient for the mapping from genes to GO to contain only the most specific GO annotations. Then build a topGOdata object with: GOdata <- new("topGOdata", ontology = "MF", allGenes = selectedList, annot = annFUN.gene2GO, ## the new annotation function gene2GO = myGenes2GO) ## the gene ID to GOs dataset > I also had several question concerning topGO: > - Are the p-value in topGO corrected for multiple testing (FDR...)? My > guess is that they are not due to a problem of independence... No, the p-values returned by the getSigGroups() function are not corrected for multiple testing. > - Are there some differences between Fisher exact test (topGO) and > Hypergeometric test (GOstats). If yes, why did the two packages make > different choices? No, they are basically the same test. > - It is not clear to me what the Kolmogorov-Smirnov is testing? > Especially in my case where I don't provide scores associated to my genes... You can't use Kolmogorov-Smirnov test if you don't have a ranking of the genes. The test is present in the package since there many cases in which people have a ranking of the genes and want to perform such a test. Also in the new version of the package you will find many other tests that can be used, like globaltest, t-test, etc. > - Is there a way to test separately over/under representation of GO > categories? Yes, one can test the underrepresentation (deplition?) of GO terms. What one needs to do is the change the direction of the test. You can define the test statistic as follows: ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- -------------------------- ## define the test statistic which will detect underrepresentation if(!isGeneric("GOFisherTestUnder")) setGeneric("GOFisherTestUnder", function(object) standardGeneric("GOFisherTestUnder")) setMethod("GOFisherTestUnder", "classicCount", function(object) { contMat <- contTable(object) if(all(contMat == 0)) p.value <- 1 else p.value <- fisher.test(contMat, alternative = "less")\$p.value ## "greater" is for over-, "less" for under-, and "two-sided" is for both alternatives return(p.value) }) ## run the algorithms test.stat <- new("classicCount", testStatistic = GOFisherTestUnder, name ="Fisher test underrepresentation") resultFis <- getSigGroups(GOdata, test.stat) test.stat <- new("elimCount", testStatistic = GOFisherTestUnder, name ="Fisher test underrepresentation") resultElimFis <- getSigGroups(GOdata, test.stat) GenTable(GOdata, Fis = resultFis, Elim = resultElimFis) GO.ID Term Annotated Significant Expected Rank in Elim Fis Elim 1 GO:0006793 phosphorus metabolic process 983 1 8.29 518 0.0016 1.0000 2 GO:0006796 phosphate metabolic process 983 1 8.29 1 0.0016 0.0016 3 GO:0043687 post-translational protein modification 1253 3 10.56 2 0.0046 0.0046 4 GO:0048523 negative regulation of cellular process 1116 3 9.41 3 0.0119 0.0119 5 GO:0048519 negative regulation of biological proces... 1162 4 9.79 4 0.0262 0.0262 6 GO:0043412 biopolymer modification 1476 6 12.44 237 0.0262 0.8485 7 GO:0006464 protein modification process 1436 6 12.10 299 0.0327 0.9107 8 GO:0007242 intracellular signaling cascade 1307 6 11.02 5 0.0641 0.0641 9 GO:0019538 protein metabolic process 2690 17 22.67 121 0.1002 0.6519 10 GO:0006091 generation of precursor metabolites and ... 446 1 3.76 6 0.1047 0.1047 ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- -------------------------------------------- Hope this helps, Adrian