Hello,
I encounter a problem using goSlim function in the package GSEAbase.
I have a set of annotations (pasted below). When I use the goSlim function in order to have the counts of appearance of each Go Slim plant ontology, it seems to me that the counts are off.
I noticed it because I am interested in one perticular ontology which is "GO:0005618" (cell wall), appearing two times in my annotation file but which is counted only one. Is there something I am missing here?
My annotation file and code are below:
Here is my annotation file, named annot7:
--------------------------------------------------------------------------------------------
Contig2390 GO:0000902 expansin-A1 isoform X1 Contig2390 GO:0005618 Contig2390 GO:0071554 Contig2390 GO:0005576 Contig2390 GO:0040007 Contig7915 GO:0007010 villin-3-like isoform X1 Contig7915 GO:0022607 Contig7915 GO:0008092 Contig10367 GO:0005975 endoglucanase 2 Contig10367 GO:0009056 Contig10367 GO:0005575 Contig10367 GO:0016798 Contig10367 EC:3.2.1.4 Contig21476 GO:0016746 O-acyltransferase WSD1-like Contig21476 GO:0006629 Contig21476 GO:0009058 Contig21476 GO:0005886 Contig21476 EC:2.3.1.20 Contig21476 EC:2.3.1.75 Contig71769 GO:0005975 plant invertase pectin methylesterase inhibitor superfamily Contig71769 GO:0009056 Contig71769 GO:0005618 Contig71769 GO:0071554 Contig71769 GO:0030234 Contig71769 EC:3.1.1.11 Contig71976 GO:0005575 glucomannan 4-beta-mannosyltransferase 9-like Contig71976 GO:0003674 Contig74515 GO:0005975 probable rhamnogalacturonate lyase B Contig74515 GO:0003674
-----------------------------------------------------------------------------------------------------------------
Here is my code:
--------------------------------------------------------------------------------------------------------------------
colnames(annot7)=c("contig","GO","gene")
library("GSEABase")
#database
fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase")
slim <- getOBOCollection(fl)
setDEGOs7=as.character(annot7$GO)
currentCollecSet7=GOCollection(setDEGOs7)
# Here, length(which(currentCollecSet7=="GO:0005618")) is 2
freqGO_7_MF=goSlim(currentCollecSet7, slim, "MF")#or BP or CC
freqGO_7_BP=goSlim(currentCollecSet7, slim, "BP")
freqGO_7_CC=goSlim(currentCollecSet7, slim, "CC")
#and this is what I get:
> freqGO_7_CC
Count Percent Term
GO:0005575 4 36.363636 cellular_component
GO:0005576 1 9.090909 extracellular region
GO:0005578 0 0.000000 proteinaceous extracellular matrix
GO:0005615 0 0.000000 extracellular space
GO:0005618 1 9.090909 cell wall
GO:0005622 0 0.000000 intracellular
GO:0005623 2 18.181818 cell
GO:0005634 0 0.000000 nucleus
GO:0005635 0 0.000000 nuclear envelope
GO:0005654 0 0.000000 nucleoplasm
GO:0005730 0 0.000000 nucleolus
GO:0005737 0 0.000000 cytoplasm
GO:0005739 0 0.000000 mitochondrion
GO:0005764 0 0.000000 lysosome
GO:0005768 0 0.000000 endosome
GO:0005773 0 0.000000 vacuole
GO:0005777 0 0.000000 peroxisome
GO:0005783 0 0.000000 endoplasmic reticulum
GO:0005794 0 0.000000 Golgi apparatus
GO:0005829 0 0.000000 cytosol
GO:0005840 0 0.000000 ribosome
GO:0005856 0 0.000000 cytoskeleton
GO:0005886 1 9.090909 plasma membrane
GO:0009536 0 0.000000 plastid
GO:0009579 0 0.000000 thylakoid
GO:0016020 1 9.090909 membrane
GO:0030312 1 9.090909 external encapsulating structure
----------------------------------------------------------------------------------------------------------------------------
Thanks for reading me!

Thanks a lot! It works now and the results seems more logical.